tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

read_sas variable labels mismatch #601

Status: open. Opened by ajdamico 3 years ago.

ajdamico commented 3 years ago

When I load this dataset in SAS, this variable's label contains the correct text, but loading the same dataset with haven gives a misaligned label. The red arrows in the screenshot show the difference. Thanks!

[screenshot]
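# Download and unzip the 2018 SIPP public-use SAS file (about 400 MB)
tf <- tempfile()
download.file( "https://www2.census.gov/programs-surveys/sipp/data/datasets/2018/pu2018_sasdata.zip" , tf , mode = 'wb' )
z <- unzip( tf , exdir = tempdir() )
# Read the extracted .sas7bdat file with haven and inspect the label on TMED_AMT
w <- haven::read_sas( z )
str(w[,'TMED_AMT'])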
hadley commented 3 years ago

That doesn't download for me. Could you provide a smaller example?

ajdamico commented 3 years ago

Thanks for looking at this!

It downloads fine for me from the US Census Bureau:

> tf <- tempfile()
> download.file( "https://www2.census.gov/programs-surveys/sipp/data/datasets/2018/pu2018_sasdata.zip" , tf , mode = 'wb' )
trying URL 'https://www2.census.gov/programs-surveys/sipp/data/datasets/2018/pu2018_sasdata.zip'
Content type 'application/zip' length 418345769 bytes (399.0 MB)
downloaded 399.0 MB

> z <- unzip( tf , exdir = tempdir() )
> w <- haven::read_sas( z )
> str(w[,'TMED_AMT'])
tibble [763,186 x 1] (S3: tbl_df/tbl/data.frame)
 $ TMED_AMT: num [1:763186] NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "label")= chr " income was receivedWhether overtime income was receivedWhether overtime income was receiv"

But when I try to make a smaller example by copying the dataset in SAS,

/* SAS code to simply copy the source file into a new dataset */
data x.out;
   set x.pu2018;
run;

the problem unfortunately goes away:

# R code reading in the copied dataset which no longer has the incorrect label
> w <- haven::read_sas( "out.sas7bdat" )
> str( w[ , 'TMED_AMT' ] )
tibble [763,186 x 1] (S3: tbl_df/tbl/data.frame)
 $ TMED_AMT: num [1:763186] NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "label")= chr "Whether any money was owed for medical bills not paid in full during the reference period."
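To check whether TMED_AMT is the only affected variable, one option is to compare every column's label between the original file and the SAS copy. Below is a minimal base-R sketch; `label_mismatches()` is a hypothetical helper written for this thread (it is not part of haven), and it assumes both tables use the same column names and store each label as a single character string in the `"label"` attribute, as the `str()` output above shows.

```r
# Hypothetical helper (not part of haven): return the shared columns whose
# "label" attribute differs between two data frames, with both labels.
label_mismatches <- function(a, b) {
  # Pull each column's label, using NA when no label is set
  get_label <- function(df) {
    vapply(df, function(col) {
      lab <- attr(col, "label", exact = TRUE)
      if (is.null(lab)) NA_character_ else lab
    }, character(1))
  }
  shared <- intersect(names(a), names(b))
  la <- get_label(a[shared])
  lb <- get_label(b[shared])
  # identical() per column, so two missing labels count as a match
  differs <- !vapply(seq_along(shared),
                     function(i) identical(la[[i]], lb[[i]]),
                     logical(1))
  data.frame(
    variable = shared[differs],
    label_a  = unname(la[differs]),
    label_b  = unname(lb[differs]),
    stringsAsFactors = FALSE
  )
}

# Toy example: column x has a mismatched label, column y does not
a <- data.frame(x = 1:2, y = 3:4)
b <- a
attr(a$x, "label") <- "right"
attr(b$x, "label") <- "wrong"
attr(a$y, "label") <- "same"
attr(b$y, "label") <- "same"
label_mismatches(a, b)  # one row, for column x
```

With the files from this thread, usage might look like `label_mismatches(haven::read_sas(z), haven::read_sas("out.sas7bdat"))`. Passing `n_max = 1` to `read_sas()` should cut memory use considerably, assuming the labels are taken from the file header rather than from the data rows.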

Not sure whether this issue is worth debugging further or whether we should just close it :-/

Thanks very much for your time!