melff / memisc

Tools for Managing Survey Data, Creating Tables of Estimates and Data Summaries
https://melff.github.io/memisc

Problem importing string variables from SPSS (.sav) #26

Closed d4ff closed 6 years ago

d4ff commented 7 years ago

Hi,

I'm only experiencing this issue on my Linux PC; the same script runs without trouble on Windows.

I'm importing a survey dataset that contains a couple of string variables. The string variables mostly contain numbers, e.g. "1", "2", "-66", but some of them contain text-based answers as well, e.g. "Please don't send me more surveys".

I've now noticed that only the string variables with absolutely no text seem to work properly in memisc on Linux. For those, is.character(ds$var) returns TRUE, and the variable can be coerced to numeric without errors. The variables with values containing text, on the other hand, give errors:

> ds$problemvar
Item 'blablabla variable label blablabla' (measurement: nominal, type: character, length = 1729)

Error in if (any(xw > width)) { : missing value where TRUE/FALSE needed
> str(ds$problemvar)
 Nmnl. item  chr [1:1729] "-66                                                  "| __truncated__ ...

It appears that some form of truncation is happening. Here is what it looks like when indexing the column:

> ds[1]

Data set with 1729 observations and 1 variables

   ...
 1 ...
 2 ...
 3 ...
 4 ...
 5 ...
 6 ...
 7 ...
 8 ...
 9 ...
10 ...

While another variable, based on a near identical survey question, works fine:

> str(ds$noproblemvar)
 Nmnl. item  chr [1:1729] "-66" "-66" "-66" "-66" ...

I have been comparing the above variables every which way, both in SPSS and R; the only discernible difference is that one of them, while being exported from the survey software as a string variable because text input was allowed, only contains numbers.
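One further comparison that may be worth running is a per-value check of string width and encoding. The following is a minimal sketch in base R only, not a confirmed diagnosis of this issue. Everything in it is an assumption for illustration: vals is a made-up stand-in for the values of a string variable (in practice it might come from something like as.character(ds$problemvar)), the 60-character padding merely simulates a fixed-width string field, and check_strings is a hypothetical helper name.

```r
# Made-up stand-in values; NOT taken from the real dataset.
vals <- c("-66", "1", "Please don't send me more surveys")

# Simulate a fixed-width string field by right-padding with spaces
# (the width of 60 is an arbitrary assumption).
vals <- formatC(vals, width = 60, flag = "-")

# Hypothetical helper: for each value, report the raw width in bytes,
# the width after stripping trailing whitespace, and UTF-8 validity.
check_strings <- function(x) {
  data.frame(
    raw_bytes   = nchar(x, type = "bytes"),
    trimmed_len = nchar(trimws(x, which = "right"), type = "bytes"),
    valid_utf8  = validUTF8(x),
    stringsAsFactors = FALSE
  )
}

report <- check_strings(vals)
print(report)
```

Values whose raw width far exceeds their trimmed width, or that fail the UTF-8 check, would be candidates for closer inspection when comparing the working and non-working variables.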

I'm importing the data from .sav files like so:

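library(memisc)

# use_dir and sav_file are assumed to be defined earlier in the script
in_file <- suppressWarnings(
    spss.system.file(file.path(use_dir, sav_file)))

# Convert the importer object into a data set
ds <- as.data.set(in_file)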

Anyway, thanks for making memisc. My script works on Windows, so I can still make use of it, but it would be nice to figure out a workaround so I can handle these datasets on Linux as well. I can send you a .sav dataset to troubleshoot with if that helps.

melff commented 7 years ago

Thanks for reporting the bug and sorry that I did not come back to it earlier. I was a bit busy the last couple of weeks and also somewhat reluctant to react to anonymous requests.

Can you please send me a .sav file to troubleshoot? Thanks!

siames3 commented 6 years ago

Hello,

First of all, thank you very much for memisc, it is a great tool.

Any news on this issue? It is happening to me too: string variables in the original SPSS file multiply into several dataset items:

 $ q7_other         : Nmnl. item w/ 2 labels for 0,1 num 0 0 0 0 0 0 0 0 0 0 ...
 $ q7_other_specify : Nmnl. item chr " "| __truncated__ " "| __truncated__ " "| __truncated__ " "| __truncated__ ...
 $ q7_ot0           : Nmnl. item chr " "| __truncated__ " "| __truncated__ " "| __truncated__ " "| __truncated__ ...
 $ q7_ot1           : Nmnl. item chr " "| __truncated__ " "| __truncated__ " "| __truncated__ " "| __truncated__ ...
 $ q7_ot2           : Nmnl. item chr " "| __truncated__ " "| __truncated__ " "| __truncated__ " "| __truncated__ ...

Thank you very much!

melff commented 6 years ago

This is likely to be a bug. Could you send me an example .sav file? This would help to identify and fix it. Thanks!

melff commented 6 years ago

Neither of the people who reported the bug provided me with example data. I created an SPSS data file (with PSPP) that somewhat corresponds to the data described above, but was not able to reproduce the problem described. Given that the symptoms only appear under specific OSes, I suspect that they are the effect of some pointer mishandling that was corrected in one of the recent releases.

Since I did not receive any responses to my requests for example data, I declare this issue closed.