tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

haven::read_sas unable to allocate memory for 16MB SAS page sizes #697

Closed inpowell closed 1 year ago

inpowell commented 1 year ago

haven::read_sas cannot read a SAS data file with a page size of 16 MiB (16777216 bytes). Some data files with page sizes slightly under 16 MiB also fail to read.

I would expect each of the attached sas7bdat files (zipped to keep the file size under 10 MB) to be read by haven::read_sas and give a 10,000-row tibble with a single column, empty, consisting only of empty strings.

20221123 - haven bug report.zip

haven::read_sas('test_16766976.sas7bdat') # Succeeds
# # A tibble: 10,000 x 1
#    empty
#    <chr>
#  1 ""
#  2 ""
#  3 ""
#  4 ""
#  5 ""
#  6 ""
#  7 ""
#  8 ""
#  9 ""
# 10 ""
# # ... with 9,990 more rows

haven::read_sas('test_16776192.sas7bdat') # Fails
# Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding,  :
#   Failed to parse <snip>/test_16776192.sas7bdat: Unable to allocate memory.

haven::read_sas('test_16777216.sas7bdat') # Fails
# Error in df_parse_sas_file(spec_data, spec_cat, encoding = encoding, catalog_encoding = catalog_encoding,  :
#   Failed to parse <snip>/test_16777216.sas7bdat: Unable to allocate memory.
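
To make the three test cases easier to compare, here is a small sketch that expresses each page size as its distance below 16 MiB. It uses only the three byte counts from the file names above; the variable names are illustrative.

```r
# Page sizes (in bytes) of the three test files, labelled by read_sas outcome.
page_sizes <- c(succeeds = 16766976, fails_a = 16776192, fails_b = 16777216)

# Distance below 16 MiB, expressed in KiB.
mib <- 1024^2
gap_kib <- (16 * mib - page_sizes) / 1024

print(gap_kib)
```

On these three files alone, the file that reads successfully is 10 KiB under 16 MiB, while the failing files are 1 KiB under and exactly 16 MiB, so the cutoff appears to lie somewhere in that 9 KiB window. The exact value is not established by this test.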

I generated these files in SAS using

* libname out "appropriate/path/here";

data out.test_16777216 (bufsize=16777216 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
* PROC CONTENTS to verify page size;
proc contents data=out.test_16777216 varnum; run;

data out.test_16776192 (bufsize=16776192 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
proc contents data=out.test_16776192 varnum; run;

data out.test_16766976 (bufsize=16766976 compress=no);
length empty $3000;
do i = 1 to 10000; output; end;
drop i;
run;
proc contents data=out.test_16766976 varnum; run;

Workaround: set the default page size in SAS to 8 MB with -BUFSIZE 8M, or set a smaller bufsize= on a case-by-case basis when writing the data set. The default page size for my operating environment is 16M.

gorcha commented 1 year ago

Hi @inpowell, thanks for the bug report.

There's a hard limit to SAS page size in ReadStat, the underlying C library, to avoid memory allocation issues with malformed SAS input (see WizardMac/ReadStat#249 for details).

I've opened an issue over at ReadStat to see if we can get the maximum size increased.
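
Until the limit changes, a user-side wrapper could at least make the failure easier to diagnose. The following is a minimal sketch, not part of haven: read_sas_with_hint and its reader argument are hypothetical names, and the only thing it relies on is the "Unable to allocate memory" text shown in the error output above.

```r
# Hypothetical helper (not part of haven): calls a reader and, if the error
# message contains "Unable to allocate memory", appends a hint about the
# SAS page size workaround described in this thread.
read_sas_with_hint <- function(path, reader = haven::read_sas, ...) {
  tryCatch(
    reader(path, ...),
    error = function(e) {
      msg <- conditionMessage(e)
      if (grepl("Unable to allocate memory", msg, fixed = TRUE)) {
        stop(
          msg,
          "\nHint: the file may exceed ReadStat's SAS page size limit; ",
          "consider re-exporting from SAS with a smaller BUFSIZE (for example 8M).",
          call. = FALSE
        )
      }
      # Any other error is re-raised unchanged.
      stop(e)
    }
  )
}
```

Calling read_sas_with_hint('test_16777216.sas7bdat') would then be expected to fail with the original message plus the hint, assuming the error text matches the one reported here. The reader argument exists only so the wrapper can be exercised without a real file.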