tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

Unable to allocate memory — haven 1.1.1 #342

Closed BERENZ closed 6 years ago

BERENZ commented 6 years ago

I have the following problem with read_spss/read_sav. The code worked perfectly with the previous version of the haven package, but after updating I get an error indicating a problem with allocating memory.

I attach the code, the file, and other information.

I tested this on two R versions (R MRAN & vanilla 3.4.0, and R 3.3.2, both on macOS Sierra 10.12.1). The problem may be related only to .sav files; I got no errors when reading sas7bdat files.

EDIT: I tested on Windows 7 x64 (build 7601) Service Pack 1 with R 3.3.2 and got the same error.

> library(haven)
> unzip('BKL_oferty_pracy_1ed.sav.zip')
> dt <- read_spss('BKL_oferty_pracy_1ed.sav')
Error in 'df_parse_sav_file(spec, user_na)':
  Failed to parse BKL_oferty_pracy_1ed.sav: Unable to allocate memory.
> traceback()
5: stop(list(message = "Failed to parse BKL_oferty_pracy_1ed.sav: Unable to allocate memory.", 
       call = df_parse_sav_file(spec, user_na), cppstack = list(
           file = "", line = -1L, stack = c("1   haven.so                            0x000000010c588fc4 _ZN4Rcpp9exceptionC2EPKcb + 276", 
           "2   haven.so                            0x000000010c58b523 _ZN4Rcpp4stopIJNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEEPKcEEEvS9_DpOT_ + 83", 
           "3   haven.so                            0x000000010c5847a1 _Z13df_parse_spssI17DfReaderInputFileEN4Rcpp6VectorILi19ENS1_15PreserveStorageEEES4_bb + 833", 
           "4   haven.so                            0x000000010c5843fe _Z17df_parse_sav_fileN4Rcpp6VectorILi19ENS_15PreserveStorageEEEb + 46", 
           "5   haven.so                            0x000000010c596054 _haven_df_parse_sav_file + 148", 
           "6   libR.dylib                          0x0000000109525bb3 do_dotcall + 387", 
           "7   libR.dylib                          0x000000010955788f Rf_eval + 1823", 
           "8   libR.dylib                          0x00000001095976ab do_begin + 475", 
           "9   libR.dylib                          0x0000000109557571 Rf_eval + 1025", 
           "10  libR.dylib                          0x00000001095954f6 R_execClosure + 870", 
           "11  libR.dylib                          0x000000010955770f Rf_eval + 1439", 
           "12  libR.dylib                          0x00000001094e153e do_switch + 1502", 
           "13  libR.dylib                          0x0000000109557571 Rf_eval + 1025", 
           "14  libR.dylib                          0x00000001095976ab do_begin + 475", 
           "15  libR.dylib                          0x0000000109557571 Rf_eval + 1025", 
           "16  libR.dylib                          0x00000001095954f6 R_execClosure + 870", 
           "17  libR.dylib                          0x000000010955770f Rf_eval + 1439", 
           "18  libR.dylib                          0x00000001094e153e do_switch + 1502", 
           "19  libR.dylib                          0x0000000109557571 Rf_eval + 1025", 
           "20  libR.dylib                          0x00000001095976ab do_begin + 475", 
           "21  libR.dylib                          0x0000000109557571 Rf_eval + 1025", 
           "22  libR.dylib                          0x00000001095954f6 R_execClosure + 870", 
           "23  libR.dylib                          0x000000010955770f Rf_eval + 1439", 
           "24  libR.dylib                          0x0000000109597b79 do_set + 153", 
           "25  libR.dylib                          0x0000000109557571 Rf_eval + 1025", 
           "26  libR.dylib                          0x00000001095ce128 Rf_ReplIteration + 792", 
           "27  libR.dylib                          0x00000001095cf725 R_ReplConsole + 149", 
           "28  libR.dylib                          0x00000001095cf652 run_Rmainloop + 82", 
           "29  R                                   0x00000001094a1f5b main + 27", 
           "30  libdyld.dylib                       0x00007fffc488d255 start + 1", 
           "31  ???                                 0x0000000000000001 0x0 + 1"
           ))))
4: .Call(`_haven_df_parse_sav_file`, spec, user_na)
3: df_parse_sav_file(spec, user_na)
2: read_sav(file, user_na = user_na)
1: read_spss("BKL_oferty_pracy_1ed.sav")

SessionInfo

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] haven_1.1.1

loaded via a namespace (and not attached):
 [1] readr_1.1.1      compiler_3.4.0   R6_2.2.2         magrittr_1.5    
 [5] RevoUtils_10.0.4 hms_0.4.0        tools_3.4.0      pillar_1.1.0    
 [9] tibble_1.4.2     Rcpp_0.12.15     forcats_0.2.0    pkgconfig_2.0.1 
[13] rlang_0.1.6  

Also tested on

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin11.4.2 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] en_US/UTF-8/en_US/C/en_US/en_US

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] haven_1.1.1

loaded via a namespace (and not attached):
[1] readr_1.1.1   R6_2.2.0      magrittr_1.5  hms_0.3       tools_3.3.2  
[6] tibble_1.3.1  Rcpp_0.12.11  forcats_0.2.0 rlang_0.1.1 

EDIT: Windows machine

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C                   LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] haven_1.1.1

loaded via a namespace (and not attached):
 [1] readr_1.1.1     R6_2.2.2        magrittr_1.5    hms_0.4.0       tools_3.3.2     pillar_1.1.0   
 [7] tibble_1.4.2    Rcpp_0.12.15    forcats_0.2.0   pkgconfig_2.0.1 rlang_0.1.6 
> dane <- read_spss("BKL_oferty_pracy_1ed.sav")
Error in df_parse_sav_file(spec, user_na) : 
  Failed to parse BKL_oferty_pracy_1ed.sav: Unable to allocate memory.
> traceback()
5: stop(list(message = "Failed to parse BKL_oferty_pracy_1ed.sav: Unable to allocate memory.", 
       call = df_parse_sav_file(spec, user_na), cppstack = list(
           file = "", line = -1L, stack = "C++ stack not available on this system")))
4: .Call(`_haven_df_parse_sav_file`, spec, user_na)
3: df_parse_sav_file(spec, user_na)
2: read_sav(file, user_na = user_na)
1: read_spss(fname)
strengejacke commented 6 years ago

I have had a similar problem since the latest haven update; here's my error message:

 Error in df_parse_sav_file(spec, user_na) : 
  Failed to parse C:/Users/Daniel/Documents/Projekte/2014 DAVID2/Datenerhebung/Dateneingabe/Patienten/Pat_Dateneingabe20170913_pa.sav: Unable to allocate memory. 
5.
stop(structure(list(message = "Failed to parse C:/Users/Daniel/Documents/Projekte/2014 DAVID2/Datenerhebung/Dateneingabe/Patienten/Pat_Dateneingabe20170913_pa.sav: Unable to allocate memory.", 
    call = df_parse_sav_file(spec, user_na), cppstack = structure(list(
        file = "", line = -1L, stack = "C++ stack not available on this system"), .Names = c("file", 
    "line", "stack"), class = "Rcpp_stack_trace")), .Names = c("message",  ... 
4.
df_parse_sav_file(spec, user_na) 
3.
read_sav(file, user_na = user_na) 
2.
haven::read_spss(file = path, user_na = tag.na) at read_write.R#55
1.
read_spss("../Pat_Dateneingabe20170913_pa.sav") 
dicorynia commented 6 years ago

Same issue with 1.1.1 (R 3.4.0 on Win7). Reverting to 1.1.0 solves the problem. (I found the old haven_1.1.0.zip at https://mran.microsoft.com/snapshot/2017-11-01/bin/windows/contrib/3.5/.)
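
On Windows, installing the downloaded binary directly should work with something like this (untested; the full file URL is assumed from the snapshot link above):

install.packages(
  "https://mran.microsoft.com/snapshot/2017-11-01/bin/windows/contrib/3.5/haven_1.1.0.zip",
  repos = NULL, type = "win.binary"
)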

strengejacke commented 6 years ago

Yes, downgrading to 1.1.0 works. Older package versions can also be found in the CRAN archive: https://cran.r-project.org/src/contrib/Archive/haven/
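
Installing from the archive builds from source, so something like this should work (an untested sketch; the tar.gz name follows the usual archive naming, and building needs Rtools on Windows):

install.packages(
  "https://cran.r-project.org/src/contrib/Archive/haven/haven_1.1.0.tar.gz",
  repos = NULL, type = "source"
)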

dicorynia commented 6 years ago

Yes, thanks. I found them, but they are not compiled and so require Rtools, which may not be convenient (and install_version has issues with our proxy)... :-) The *.zip binaries are easier to install on Windows.

philmikejones commented 6 years ago

I also have the same problem. Unfortunately I can't share the dataset that caused the issue, and I'm hunting for a similar open data set that replicates the problem, but downgrading to haven v1.1.0 solved the issue for me:

devtools::install_version("haven", version = "1.1.0")

Update: it seems to be a problem with the size of the file. While halving the data set to work out which 'half' the problem related to, I found I could load the file, however I sliced it, once I had removed enough columns or cases, which fits with the error message. It is very strange that the same file loads with v1.1.0.
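
Roughly, the bisection can be done with something like the sketch below (file names are placeholders; it assumes the file still opens under v1.1.0, so the halves can be written back out with write_sav() and re-tested after upgrading):

library(haven)
full <- read_sav("problem_file.sav")                 # placeholder path; readable under v1.1.0
half_cols <- ncol(full) %/% 2
write_sav(full[, 1:half_cols], "half_1.sav")         # first half of the columns
write_sav(full[, -(1:half_cols)], "half_2.sav")      # remaining columns
# reinstall v1.1.1, then try read_sav("half_1.sav") and read_sav("half_2.sav")
# to see which half (if either) still triggers the error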

JoachimTanner commented 6 years ago

I have the same problem (R 3.4.3 on Win10). read_sav leads to an error (see below) in haven 1.1.1; downgrading to v1.1.0 solves the issue. In my scenario, the error only shows up when the database has more than roughly 2000 variables.

Error in df_parse_sav_file(spec, user_na) : Failed to parse J:/Database_XY-Study_II.sav: Unable to allocate memory.

strengejacke commented 6 years ago

I don't think it's an issue of very large files. My dataset had ~600 observations and ~300 variables. That said, it would be very strange if you didn't run into this issue for datasets with fewer than 2000 columns.

JoachimTanner commented 6 years ago

I don't have any idea about the exact cause of the issue, but reducing my dataset from 5845 to 1970 variables (160 observations in each case) made my script run again. I thought this information might be helpful. It doesn't seem to be an issue based only on the number of variables; the script running with only 1970 variables could also be related to which variables were deleted, the length of variable names or contents, specific characters, etc.

EricGoldsmith commented 6 years ago

I'll add my voice here. haven v1.1.1 generates a memory allocation error, while v1.1.0 does not, for the same file.

maxwell8888 commented 6 years ago

Use this to revert to the previous working version:

remove.packages("haven")
devtools::install_version("haven", version = "1.1.0", repos = "http://cran.us.r-project.org")

dylanjm commented 6 years ago

Any update on this issue? I'm having the same problem with a .sav file that isn't very large at all. I downgraded to 1.1.0 and it works now.

oloverm commented 6 years ago

I have the exact same issue, also solved it by reverting to 1.1.0 (downloading from CRAN didn't work, but the link @dicorynia provided did).

bdemeshev commented 6 years ago

We have exactly the same issue with the homework in the Econometrics course on Coursera (in Russian)! Students are required to use RLMS data, which cannot currently be handled by haven 1.1.1.

Our solutions:

hadley commented 6 years ago

Can you please try the latest development version? I can read the file attached to the initial issue without problems.
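
For reference, the development version can be installed with the same devtools call used in the reprex further down:

devtools::install_github("tidyverse/haven")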

EricGoldsmith commented 6 years ago

Works for me. 👍

ajdamico commented 6 years ago

Sorry, please re-open this issue; I'm not sure it's fixed in the development version. Here's a minimal reproducible block that works on v1.1.0 and fails with both CRAN and the current dev version:

tf <- tempfile()
download.file( "https://assets.pewresearch.org/wp-content/uploads/sites/5/datasets/Sept07.zip" , tf , mode = 'wb' )
z <- unzip( tf , exdir = tempdir() )
x <- haven::read_sav( grep( "\\.sav$" , z , value = TRUE ) )

Same failure in dev; it doesn't fail with v1.1.0:

# Error in df_parse_sav_file(spec, encoding, user_na) : 
  # Failed to parse C:/Users/AnthonyD/AppData/Local/Temp/Rtmpao8ZKs/Sept07c.sav: Unable to allocate memory.

sessionInfo for run that works:

devtools::install_github("tidyverse/haven", ref = "v1.1.0")
library(haven)
sessionInfo()
# R version 3.4.3 (2017-11-30)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

# Matrix products: default

# locale:
# [1] LC_COLLATE=English_United States.1252 
# [2] LC_CTYPE=English_United States.1252   
# [3] LC_MONETARY=English_United States.1252
# [4] LC_NUMERIC=C                          
# [5] LC_TIME=English_United States.1252    

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] haven_1.1.0

# loaded via a namespace (and not attached):
# [1] compiler_3.4.3 magrittr_1.5   pillar_1.1.0   tibble_1.4.2   Rcpp_0.12.15  
# [6] forcats_0.2.0  rlang_0.1.4   

sessionInfo for version that fails:

devtools::install_github("tidyverse/haven")
library(haven)
sessionInfo()
# R version 3.4.3 (2017-11-30)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

# Matrix products: default

# locale:
# [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
# [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
# [5] LC_TIME=English_United States.1252    

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] haven_1.1.1.9000

# loaded via a namespace (and not attached):
 # [1] Rcpp_0.12.15    digest_0.6.15   withr_2.1.1     R6_2.2.2        git2r_0.21.0    magrittr_1.5    pillar_1.1.0   
 # [8] httr_1.3.1      rlang_0.1.4     curl_3.1        devtools_1.13.4 forcats_0.2.0   tools_3.4.3     compiler_3.4.3 
# [15] memoise_1.1.0   knitr_1.19      tibble_1.4.2   
hadley commented 6 years ago

@evanmiller can you please take another look?

@ajdamico in the future, please don't include sessionInfo() unless it's specifically requested — it's not usually useful.

ajdamico commented 6 years ago

Will do, thanks. Here are more examples of the problem: same failure on CRAN and dev, but success with 1.1.0.

tf <- tempfile()
download.file( "http://assets.pewresearch.org/wp-content/uploads/sites/5/datasets/Iraq2003-2.zip" , tf , mode = 'wb' )
z <- unzip( tf , exdir = tempdir() )
x <- haven::read_sav( grep( "\\.sav$" , z , value = TRUE ) )

tf <- tempfile()
download.file( "http://assets.pewresearch.org/wp-content/uploads/sites/5/datasets/Oct01NII.zip" , tf , mode = 'wb' )
z <- unzip( tf , exdir = tempdir() )
x <- haven::read_sav( grep( "\\.sav$" , z , value = TRUE ) )

tf <- tempfile()
download.file( "http://assets.pewresearch.org/wp-content/uploads/sites/5/datasets/april01nii.zip" , tf , mode = 'wb' )
z <- unzip( tf , exdir = tempdir() )
x <- haven::read_sav( grep( "\\.sav$" , z , value = TRUE ) )
evanmiller commented 6 years ago

Fixed in https://github.com/WizardMac/ReadStat/commit/073132322a581942b7df8f1613d4d3070e1c4929

hadley commented 6 years ago

Those files now all open successfully for me - thanks for the bug report!

jflournoy commented 6 years ago

I am still having an issue with 1.1.1.9000. I think my file might be quite a bit bigger than the examples: 231.7 MB. Unfortunately, I can't distribute it. Reverting to 1.1.0 is fine.

EDIT: I did look for a bigger public sav file, but the 30.9 MB one from GEM (http://www.gemconsortium.org/data/sets?id=aps) worked fine in both versions.

Error:

> packageVersion('haven')
[1] ‘1.1.1.9000’
> lntItemDF <- haven::read_sav('/data/jflournoy/lnt_pxvx/LT_wideAGT1234.sav')
Error in df_parse_sav_file(spec, encoding, user_na) : 
  Failed to parse /data/jflournoy/lnt_pxvx/LT_wideAGT1234.sav: Unable to allocate memory.

Okay:

> packageVersion('haven')
[1] ‘1.1.0’
> lntItemDF <- haven::read_sav('/data/jflournoy/lnt_pxvx/LT_wideAGT1234.sav')
antaldaniel commented 6 years ago

The issue persists for me, too. I tried downgrading to 1.1.0, but in that case I get another error:

devtools::install_version("haven", version = "1.1.0") spss <- haven::read_spss("data-raw/[filename].sav") Error in .Call("haven_df_parse_sav_file", PACKAGE = "haven", spec, user_na) : "haven_df_parse_sav_file" not available for .Call() for package "haven"

And with 1.1.1

Error in df_parse_sav_file(spec, user_na) : 
  Failed to parse C:/ [...]_package/surveyreader/data-raw/[filename].sav: Unable to allocate memory.

The file is large because it is a SurveyMonkey dump, which is not very efficiently structured (1483 columns).

ajdamico commented 6 years ago

Hi @jflournoy and @antaldaniel, they might need a public file :/ File size alone isn't the culprit; these huge .sav files all import without issue:

timss_spss <- xml2::read_html( "https://timssandpirls.bc.edu/timss2015/international-database/" )

spss_links <- grep( "SPSSData" , rvest::html_attr( rvest::html_nodes(timss_spss,'a') , 'href' )  ,value = TRUE )

big_zips <- 
    c( 
        paste0( "https://timssandpirls.bc.edu/timss2015/international-database/" , spss_links ) ,
        "http://vs-web-fs-1.oecd.org/pisa/PUF_SPSS_COMBINED_CMB_STU_COG.zip",
        "http://vs-web-fs-1.oecd.org/pisa/PUF_SPSS_COMBINED_CMB_STU_QQQ.zip",
        "http://vs-web-fs-1.oecd.org/pisa/PUF_SPSS_COMBINED_CMB_STU_COG.zip" 
    )

tf <- tempfile()

for( this_zip in big_zips ){
    download.file(this_zip,tf,mode='wb')
    z <- unzip( tf , exdir = tempdir() )
    for( this_sav in grep( '\\.sav' , z , value = TRUE , ignore.case = TRUE ) ) x <- haven::read_spss( this_sav )
}
darkdoudou commented 6 years ago

I have exactly the same problem as @antaldaniel. It's now worse than ever, as I can't read my data with either haven 1.1.1 or 1.1.0. Has anyone found a solution? Thank you for your help.

ajdamico commented 6 years ago

@darkdoudou they probably need you to provide an example file that triggers the error..

darkdoudou commented 6 years ago

Unfortunately, I'm not able to share the original data (which is indeed large), and I don't have SPSS, so I'm not able to provide a minimal working example!

darkdoudou commented 6 years ago

OK, so after removing and reinstalling haven 1.1.0 three times (!?) and exiting R (probably the most important step), it now works perfectly.

dl7631 commented 6 years ago

I'm having exactly the same issue with read_spss on an RStudio Server.

bednarowska commented 6 years ago

Hi, I encountered a similar problem (I have a 1.74 GB SPSS file which I cannot share either, and I cannot delete any variables or cases). I tried to install the version of haven from the website @dicorynia shared: https://mran.microsoft.com/snapshot/2017-11-01/bin/windows/contrib/3.5/

It doesn't work though; I get this error message:

Error: package or namespace load failed for ‘haven’: package ‘haven’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

batpigandme commented 6 years ago

@bednarowska

Error: package or namespace load failed for ‘haven’: package ‘haven’ was installed by an R version with different internals; it needs to be reinstalled for use with this R version

This is unrelated. You are either running a version of R < 3.5 and trying to install a package built for R >= 3.5, or vice versa. Because it's a major version change with different internals, 3.5 requires that you reinstall all packages. See more: http://blog.revolutionanalytics.com/2018/04/r-350.html
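
One rough way to find and rebuild packages compiled under an older R (a sketch, not an official recipe):

# list packages built under a pre-3.5 R and reinstall them
ip <- installed.packages()
stale <- rownames(ip)[package_version(ip[, "Built"]) < package_version("3.5.0")]
install.packages(stale)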

bednarowska commented 6 years ago

Thanks for coming back. Well, it is related, in the sense that I tried the same solution other people mentioned in this thread and it wasn't working for me. I realize there are tremendous changes in the new R version, but my question is: did others who tried to reinstall a dev version of haven have to reinstall all their packages as well? I am working on 3.5.1.

ghost commented 5 years ago

Hi @batpigandme, is there an email address (or any other method, such as FTP) to which I can submit an SPSS file in confidence? I'm also having a similar issue with v2.0, and I was wondering what feature of the data file is causing the error.

batpigandme commented 5 years ago

Since this issue is closed, you should probably open a new one. The guidance below is from the readxl repo, and at least a couple of these options should work for SPSS files too. The last one is least preferable, since it means the issue is only reproducible for the individual with the file.

How to provide your own xls/xlsx file? In order of preference:

  1. Attach the file directly to your issue. Instructions are always at the bottom of the issue or comment box. .xlsx is a supported file type. You'll need to zip or gzip .xls so it appears as .zip or .gz.
  2. Share via DropBox or Google Drive and provide the link in your issue.
  3. Explain that you absolutely cannot provide a relevant file via github.com and offer to provide it privately.
lock[bot] commented 5 years ago

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/