tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

`catalog_file` ignored in `read_sas()` when file created on Unix system #696

Closed. maike2011 closed this issue 1 year ago.

maike2011 commented 2 years ago

Working with SAS files created on a Unix system (SAS 9.4), we observed the following when using `read_sas()` with a `catalog_file` (haven 2.5.1, R 4.2.1):

The catalog file (`.sas7bcat`) appears to be silently ignored (no message, no error) if it was created on Unix, whereas `read_sas()` works as expected with catalog files created on Windows, regardless of which system the corresponding data file was created on.

The attached zip contains a reproducible example with recreations of haven's example SAS data sets `hadley.sas7bdat` and `formats.sas7bcat` in different variations: both were recreated twice using either

All data sets have wlatin1 encoding.

Both data files can be read with the Windows catalog file, with formats applied as expected, but no formats are available when the Unix catalog file is used.
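Because the failure is silent, one way to check whether a catalog was actually applied is to test whether any column of the result carries value labels. The following is a minimal sketch, not part of the original report: the helpers `has_labels()` and `catalog_applied()` are hypothetical, the file paths passed to them would be placeholders for the recreated files in the attached zip, and it assumes that formats taken from the catalog show up as `haven_labelled` columns, which is how `read_sas()` represents value labels.

```r
library(haven)

# Hypothetical helper: TRUE if at least one column is a haven_labelled
# vector, i.e. picked up value labels from the catalog's formats.
has_labels <- function(df) {
  any(vapply(df, is.labelled, logical(1)))
}

# Hypothetical helper: read a data file together with a catalog file and
# report whether any formats were applied to the resulting columns.
catalog_applied <- function(data_file, catalog_file) {
  df <- read_sas(data_file, catalog_file = catalog_file)
  has_labels(df)
}
```

If the behaviour described above reproduces, calling `catalog_applied()` on the same data file once with the Windows catalog and once with the Unix catalog would be expected to return `TRUE` and `FALSE` respectively.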

gorcha commented 1 year ago

Hi @maike2011, thanks for the bug report and for the example files!

This is likely an issue in ReadStat, the underlying C library. SAS doesn't publish any information on its file formats, so all open-source SAS readers rely on reverse engineering to support the various file structures.

I'll have a look and see if there's an obvious cause, but it might take a while for any necessary changes to be made to ReadStat and flow downstream to haven.