tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

Some value labels not being loaded from catalog file #716

Open joshuaborn opened 1 year ago

joshuaborn commented 1 year ago

Congratulations on the latest release! I tried out haven 2.5.2 on several SAS data sets with catalog files and no longer get any parse errors.

I have noticed that after read_sas() calls, some tibbles get their labelled()-style value labels and some don't. Even within a single tibble, some columns are labelled while others are not.

I looked a little through the haven source code. I'm not familiar with C++, and it's been a while since I've done anything with C, but it does look like there are error messages sprinkled throughout the code that loads and applies the SAS formats. Would it be possible to expose these messages in a verbose or debug mode, perhaps in a log file, so haven users can see why the labels are not being applied?

gorcha commented 1 year ago

Hey @joshuaborn,

Thanks, and great to hear! 🙂

Do you mean the error code assignments using READSTAT_ERROR_* that are peppered throughout the code? If so, these are fatal errors that stop the parser from running, and they are currently reported to the user as the failure message from read_sas(). This is where a message like "Error: Failed to parse catalog.sas7bcat: Invalid file, or file has unsupported features." comes from.

We've seen at least one previous issue with labels selectively being missed (see #529, which was fixed in the recent release). If you have a similar problem can you please chuck in an issue with an example catalog file, and a matching data file if possible?

joshuaborn commented 1 year ago

Oh, I see, I did mean the READSTAT_ERROR_*s. I'll change the title of this issue.

Attached is a zip archive with two data sets, each a pair of a sas7bdat data file and a sas7bcat catalog file: value-labels-from-catalog-files.zip

Both data sets should have value labels for variables OUTCOME and PREGORDR. The "labelled" data set successfully loads the value labels for PREGORDR for me, but does not load the value labels for OUTCOME. The "no-labels" data set does not seem to have any value labels.
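
A quick way to see which columns came back with value labels is to inspect each column's "labels" attribute, which is where haven's labelled vectors keep them. The sketch below is illustrative only: label_status is a hypothetical helper, the file names in the commented read_sas() call are placeholders rather than the names inside the archive, and the demo data frame is made up so the snippet runs without any SAS files.

```r
# Report, for each column of a data frame, whether it carries value labels.
# This only looks at the "labels" attribute, so the helper needs base R alone.
label_status <- function(df) {
  vapply(
    df,
    function(col) !is.null(attr(col, "labels", exact = TRUE)),
    logical(1)
  )
}

# Hypothetical usage against real files (placeholder paths, not the
# actual file names from the attached archive):
# dat <- haven::read_sas("data.sas7bdat", catalog_file = "formats.sas7bcat")
# label_status(dat)[c("OUTCOME", "PREGORDR")]

# Made-up stand-in: PREGORDR has value labels, OUTCOME does not.
demo <- data.frame(OUTCOME = c(1, 2, 1), PREGORDR = c(1, 1, 2))
attr(demo$PREGORDR, "labels") <- c(first = 1, second = 2)

status <- label_status(demo)
print(status)
```

Running the helper on the result of each read_sas() call gives a named logical vector, which makes it easy to compare the two data sets column by column.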

gorcha commented 1 year ago

Perfect, thanks!

dusadrian commented 1 year ago

This also happened to me; it is definitely a bug. In case it helps, I am attaching a .zip file of my own with both the .sas7bdat file and the associated .sas7bcat catalog: sascat.zip

Thanks in advance for fixing this.