ralhei / pyRserve

A python client for Rserve (network layer to remote R server)

TypeError with objects from the R survey package #11

Open rjmorris opened 8 years ago

rjmorris commented 8 years ago

I'm trying to use pyRserve to retrieve objects returned by functions in R's survey package. However, for some objects I get the following error:

...
  File "c:\programs\Anaconda3\lib\site-packages\pyRserve\rparser.py", line 547, in xt_array
    data.attr[tag] = value
TypeError: list indices must be integers, not str
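The last line of the traceback shows that `data.attr` was a list, not a dict, when the parser tried to store an attribute under a string tag. The toy sketch below only illustrates that mechanism. `ParsedArray` and `try_set_attr` are hypothetical names and do not reflect how pyRserve's `rparser.py` is actually written.

```python
class ParsedArray:
    """Hypothetical stand-in for a parsed R object (not pyRserve's class)."""

    def __init__(self, attr_container):
        # In this sketch, attributes live in whatever container is passed in.
        self.attr = attr_container


def try_set_attr(attr_container, tag, value):
    """Mimic the failing line `data.attr[tag] = value` from the traceback.

    Returns None on success, or the TypeError raised when the attribute
    container is a list, which cannot be indexed by a string tag.
    """
    data = ParsedArray(attr_container)
    try:
        data.attr[tag] = value
    except TypeError as err:
        return err
    return None


# A dict accepts string tags such as "statistic" ...
assert try_set_attr({}, "statistic", "mean") is None
# ... but a list does not, which reproduces the kind of TypeError reported.
err = try_set_attr([], "statistic", "mean")
print(type(err).__name__, err)
```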

I can successfully retrieve other survey objects from my R program, so I don't think it's an error in how I'm writing the code. Here's an example of the structure of one of the objects I'm having trouble with:

> str(rowper_marg)
Class 'svystat'  atomic [1:4] 0.0216 0.0504 0.2164 0.7116
  ..- attr(*, "var")= num [1:4, 1:4] 1.14e-06 -2.25e-07 -4.29e-08 -8.77e-07 -2.25e-07 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:4] "RSKPKCIG(1) No risk" "RSKPKCIG(2) Slight risk" "RSKPKCIG(3) Moderate risk" "RSKPKCIG(4) Great risk"
  .. .. ..$ : chr [1:4] "RSKPKCIG(1) No risk" "RSKPKCIG(2) Slight risk" "RSKPKCIG(3) Moderate risk" "RSKPKCIG(4) Great risk"
  ..- attr(*, "statistic")= chr "mean"

I can work around the error by using accessor functions provided by survey, which return simpler objects that pyRserve handles without problems. For example, the coef function returns just the coefficients stored in the svystat object. The structure of that object in R looks like:

> str(coef(rowper_marg))
 Named num [1:4] 0.0216 0.0504 0.2164 0.7116
 - attr(*, "names")= chr [1:4] "RSKPKCIG(1) No risk" "RSKPKCIG(2) Slight risk" "RSKPKCIG(3) Moderate risk" "RSKPKCIG(4) Great risk"

And when I pull this in through pyRserve, the resulting Python object looks like:

TaggedArray([ 0.02155724,  0.05039401,  0.21641401,  0.71163474], key=['RSKPKCIG(1) No risk', 'RSKPKCIG(2) Slight risk', 'RSKPKCIG(3) Moderate risk', 'RSKPKCIG(4) Great risk'])
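Extending the same accessor workaround, the remaining parts of a `svystat` object can be fetched as separate, simpler R objects. This is a minimal sketch, assuming `conn` is a pyRserve connection exposing `eval(expr)`. The helper `fetch_svystat` is hypothetical, and whether `unname(attr(x, "var"))` transfers cleanly is an untested assumption: it strips the nested `dimnames` so that a plain numeric matrix is sent, with the labels fetched separately.

```python
def fetch_svystat(conn, name):
    """Fetch the pieces of an R 'svystat' object one at a time (hypothetical helper).

    `conn` is assumed to behave like a pyRserve connection with an
    eval(expr) method; `name` must be a valid R variable name.
    """
    return {
        # Named numeric vector; reported in this thread to arrive as a TaggedArray.
        "coef": conn.eval("coef({0})".format(name)),
        # Variance matrix with dimnames removed (assumption: avoids nested attributes).
        "var": conn.eval("unname(attr({0}, 'var'))".format(name)),
        # Row/column labels of the matrix, sent as a plain character vector.
        "labels": conn.eval("names(coef({0}))".format(name)),
        # The kind of statistic, e.g. "mean".
        "statistic": conn.eval("attr({0}, 'statistic')".format(name)),
    }
```

With a live server this might be called as `fetch_svystat(pyRserve.connect(), "rowper_marg")`; fetching each piece separately keeps every transferred object no more complex than the coef result that already works.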

I'm using pyRserve 0.8.4 on Windows 7 with R 3.1.2. I could test on Linux if you think it would make a difference. I'm happy to provide additional information if necessary.

ralhei commented 8 years ago

It seems your data structure is a case I haven't come across before. I doubt that running it on Linux would help. Could you send me a piece of R code that sets up such a structure, so that I can replicate the problem on my development machine? Otherwise I will have a hard time tracking the problem down.

rjmorris commented 8 years ago

Thanks for taking a look at this. The following code should produce an object with that structure:

install.packages("survey")  ## only needed once
library(survey)

set.seed(237686251)

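## Create a sample dataset: 100 rows with random sampling weights.
data = data.frame(wt = runif(100, 1, 1000))
## 10 strata of 10 consecutive rows each (1:10 is recycled, then sorted).
data$stratum = 1:10
data$stratum = sort(data$stratum)
## Two clusters per stratum (1:2 is recycled, alternating 1, 2, ...).
data$cluster = 1:2
## A 4-level categorical variable to tabulate.
data$group = factor(sample(4, size = nrow(data), replace = TRUE))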

## Create the survey "design" object.
design = svydesign(
    data = data,
    strata = ~stratum,
    id = ~cluster,
    weights = ~wt,
    nest = TRUE)

## Compute the frequency distribution for the 'group' variable.
distrib = svymean(
    x = ~group,
    design = design)

## Examine the resulting object.
print(distrib)
str(distrib)
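To check the reproduction from the Python side, the R setup can be run on the server and the resulting object then fetched, which is the step where the parser error would surface. This is a sketch, assuming a pyRserve connection object with its `voidEval` (run code without returning a result) and `eval` (run code and transfer the result) methods; `try_fetch` is a hypothetical helper name.

```python
def try_fetch(conn, setup_code, expr):
    """Run setup_code on the R server, then try to transfer expr to Python.

    Returns (value, None) on success, or (None, error) if the transfer
    raises a TypeError such as the one reported in this issue.
    """
    # voidEval executes the R code without sending a result back.
    conn.voidEval(setup_code)
    try:
        # eval sends the result back, so pyRserve's parser runs here.
        return conn.eval(expr), None
    except TypeError as err:
        return None, err
```

With a running Rserve, a call might look like `try_fetch(pyRserve.connect(), setup_code, "distrib")`, where `setup_code` holds the R lines above without the `install.packages` call.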