tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

Error in read_sas using catalog file #680

Closed: ValValetl closed this issue 1 year ago.

ValValetl commented 2 years ago

Hi, I am getting the error message "Error: Failed to parse formats.sas7bcat: Invalid file, or file has unsupported features." when importing SAS data with a catalog file. This is the same error as in the closed issue #34. Importing the data without the catalog file works.

I am using the latest haven version (2.5.0) and also tested it with the development version on GitHub.

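# Placeholder paths for the (non-public) data file and its format catalog
fpath    <- "path/to/sas/data/file"
catalog  <- "formats.sas7bcat"
# Reading with catalog_file attaches value labels from the catalog; this call fails
sas_data <- haven::read_sas(fpath, catalog_file = catalog)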

Error: Failed to parse formats.sas7bcat: Invalid file, or file has unsupported features.
gorcha commented 2 years ago

Hi @ValValetl, thanks for the bug report.

Can you please share the catalog file and also some example data if possible? Without the catalog file it's not possible to track down the error.

ValValetl commented 2 years ago

Hi @gorcha, unfortunately this is not possible, as it is non-public data. I thought the issue report might still be of interest, since issue #34 was closed a while ago without the problem being resolved.

gorcha commented 2 years ago

Even if not the data, are you able to share the catalog file?

ValValetl commented 2 years ago

I need to check with the owner. I will get back to you later. Thanks for your quick responses!

ValValetl commented 2 years ago

Sorry for the long delay. Here is the catalog file that produces the error message: sas_catalog_file.zip

gorcha commented 2 years ago

No worries at all, thanks!

joshuaborn commented 2 years ago

Was this ever diagnosed? I'm running into the same issue.

gorcha commented 2 years ago

Hi @joshuaborn, unfortunately I haven't had a chance to look at this yet, but hopefully I will over the next few weeks.

There's no guarantee that this is the same issue affecting you. Would you be able to provide an example file that I can test by any chance?

joshuaborn commented 2 years ago

Hi @gorcha. The file where I first encountered the issue was a restricted-use file, but I've seen it with at least one other data set since then. I should have some time this weekend to try it with public-use data files, and if I can replicate it, I'll share them.

gorcha commented 2 years ago

Thanks @joshuaborn, much appreciated!

joshuaborn commented 1 year ago

NSFG_example.zip

I neglected to follow up on this back in September, but I was using Haven today and found a good example of this issue with public-use data. Attached are four files from the National Survey of Family Growth 2017-2019 public-use data. The d2017_2019femresp.sas7bdat and d2017_2019femresp.sas7bcat pair loads with read_sas just fine, but using read_sas with the d2017_2019fempreg.sas7bdat and d2017_2019fempreg.sas7bcat pair leads to an error message of the form

Error: Failed to parse .../d2017_2019fempreg.sas7bcat: Invalid file, or file has unsupported features.

Using read_sas on just d2017_2019fempreg.sas7bdat without the catalog file works.

I'm using R version 4.2.2 on Windows 11 with Haven version 2.5.1.

The interesting thing about this example is that the pregnancy data table (d2017_2019fempreg) is ultimately derived from the female respondents table (d2017_2019femresp). I tried examining the two catalog files in SAS using PROC CATALOG, but didn't see anything obvious that was present in one and not the other.

As an aside, since these parse errors seem to happen with catalog files more often than with regular SAS data files, maybe it would be worth adding to Haven the ability to side-load value labels from a sas7bdat file or even a CSV file? It seems pretty straightforward to load another table and call labelled() as needed, and SAS can easily export its value labels to a regular data table (for example with the CNTLOUT= option of PROC FORMAT, plus PROC CONTENTS to see which format each variable uses). I would be willing to work on this, since it would save me time in the long run.
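A minimal sketch of the side-loading idea, assuming the value labels have already been exported to a table shaped roughly like SAS PROC FORMAT CNTLOUT= output (one row per format/value pair) and that you know which format each column uses. The column names `fmtname`, `start`, `label`, the `format_map` vector, and the helper `apply_value_labels()` are all hypothetical; only `haven::labelled()` and `haven::as_factor()` are real haven functions. It only handles numeric codes; character (`$`) formats would need character codes instead.

```r
library(haven)

# Hypothetical label lookup table. In practice this would come from read_sas()
# or read.csv() on an exported table; START is stored as text, so it is converted below.
formats <- data.frame(
  fmtname = c("SEXF", "SEXF", "YESNO", "YESNO"),
  start   = c("1", "2", "1", "5"),
  label   = c("Male", "Female", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from data column name to the format it uses
format_map <- c(sex = "SEXF", smoker = "YESNO")

# Attach value labels to each mapped column with haven::labelled()
apply_value_labels <- function(data, formats, format_map) {
  for (col in names(format_map)) {
    rows <- formats[formats$fmtname == format_map[[col]], ]
    # labelled() expects a named vector: values are the codes, names are the labels,
    # and the codes must have the same type as the column (double here)
    labels <- stats::setNames(as.numeric(rows$start), rows$label)
    data[[col]] <- labelled(data[[col]], labels = labels)
  }
  data
}

df <- data.frame(sex = c(1, 2, 2), smoker = c(5, 1, 5))
df <- apply_value_labels(df, formats, format_map)

# Convert to a factor to check that the labels were applied
as_factor(df$sex)
```

Keeping the format lookup as a plain table means it can come from any source (a sas7bdat export, a CSV, or a hand-maintained file), which is the main appeal when the catalog itself can't be parsed.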

gorcha commented 1 year ago

Hi @joshuaborn, thanks for the extra example file. There have been a few recent updates to catalog file reading in the dev version of ReadStat that might resolve these issues; I'll check it out.

I suspect this is a little different from the initial problem in this issue (which was specifically a problem with Unix 64-bit file formats), but some other bugs that might affect this one have also been fixed.

gorcha commented 1 year ago

Hi @joshuaborn, I can confirm that the recent ReadStat changes have fixed the issue with this file. They've just released an update over there, so these fixes should be in haven shortly!

joshuaborn commented 1 year ago

Hi, @gorcha. Thanks for confirming that! And my apologies for resurrecting the wrong issue thread.

gorcha commented 1 year ago

No worries at all!