tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

bug while reading sas7bdat file #728

Open vpprasanth opened 1 year ago

vpprasanth commented 1 year ago

There is a bug when importing sas7bdat files into R. If the sas7bdat file contains a date variable holding the value 07/07/7777, haven reads it as 7777-07-06. (This is valid data in clinical studies, where it denotes "not applicable"; similarly, we use 09/09/9999 to denote a missing value.) I can understand the change in format, but I am baffled by the change in value.

It would be nice to have an option to read all variables as character, or "as is", rather than changing the class by default.

gorcha commented 1 year ago

Hi @vpprasanth, thanks for the bug report.

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

Date variables in SAS are stored as numeric values, so unfortunately we can't preserve them "as is": there is a necessary conversion step from the numeric value (the number of days or seconds since the origin date) to the date representation. It looks to me like the discrepancy is due to SAS handling a leap day differently somewhere, but this will be easier to track down with a reprex.
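To make the conversion step concrete, here is a minimal sketch in Python (chosen only for illustration; it is not haven's implementation). It relies on SAS date values counting days from 1 January 1960. The function names are hypothetical, and the "one day lower" stored count is an assumption used to mimic the reported mismatch, not a value read from the attached file.

```python
from datetime import date, timedelta

# SAS date values count days from 1 January 1960 (day 0).
SAS_EPOCH = date(1960, 1, 1)


def sas_days_to_date(days: int) -> date:
    """Convert a SAS-style day count to a proleptic Gregorian date."""
    return SAS_EPOCH + timedelta(days=days)


def date_to_sas_days(d: date) -> int:
    """Convert a proleptic Gregorian date to a day count from the SAS origin."""
    return (d - SAS_EPOCH).days


# Day count for 7777-07-07 under the proleptic Gregorian calendar.
gregorian_days = date_to_sas_days(date(7777, 7, 7))

# Round trip: the same count converts back to the same date.
print(sas_days_to_date(gregorian_days))      # 7777-07-07

# Assumption: if the stored count were one day lower than the Gregorian
# count, a Gregorian conversion would land one day earlier.
print(sas_days_to_date(gregorian_days - 1))  # 7777-07-06
```

If the two sides count the days since the origin differently by even a single leap day, the same stored number maps to different calendar dates, which matches the kind of one-day shift described in this report.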

Thanks!

vpprasanth commented 1 year ago

bug.zip

Please find attached the zip file that contains the following: a. txt_data b. sas_data

Here is the SAS code used for generating the sas7bdat file (sas_data):

/* Read the text file into a temporary dataset */
data sas_data;
    informat Sub_ID 5. Date ddmmyy10. BMI 5.;
    format Date ddmmyy10.;  /* display dates as DD/MM/YYYY */
    infile "/home/u63128400/txt_data.txt" missover;
    input Sub_ID Date ddmmyy10. BMI ;
run;

/* Save a permanent copy as sas_data.sas7bdat */
libname out "/home/u63128400/";
data out.sas_data;
    set sas_data;
run;

Now, if you open sas_data (the sas7bdat file) in SAS, you will see the date value as 07/07/7777. However, if you open the same file in R using haven, you will see 7777-07-06. This is a mismatch.

gorcha commented 1 year ago

Thanks!

vpprasanth commented 1 year ago

By the way, it looks like SAS itself goes wrong with a leap year, and this seems to be a known, existing problem: https://blogs.sas.com/content/sasdummy/2010/04/05/in-the-year-9999/
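As a rough illustration of how a single leap-year disagreement produces a one-day offset, the Python sketch below counts leap days under two rules. The first is the standard Gregorian rule. The second, which skips years divisible by 4000, is a hypothetical variant used purely for illustration; whether it matches what SAS actually does is an assumption to be checked against the linked post.

```python
def is_leap_gregorian(year: int) -> bool:
    """Standard Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_skip_4000(year: int) -> bool:
    """Hypothetical variant: Gregorian rule, but years divisible by 4000 are not leap."""
    return is_leap_gregorian(year) and year % 4000 != 0


def leap_days_between(first_year: int, last_year: int, rule) -> int:
    """Count the leap years in [first_year, last_year] according to `rule`."""
    return sum(1 for y in range(first_year, last_year + 1) if rule(y))


# Leap days from the 1960 origin up to (and including) the year 7777.
gregorian = leap_days_between(1960, 7777, is_leap_gregorian)
variant = leap_days_between(1960, 7777, is_leap_skip_4000)

print(is_leap_gregorian(4000))  # True
print(is_leap_skip_4000(4000))  # False
print(gregorian - variant)      # 1
```

Under this assumption, a day count written with the variant rule is one lower than the Gregorian count for any date after February 4000, so a reader applying the Gregorian rule would show the previous day.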