tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

`read_xpt` fails on file from US government agency #709

Closed ajdamico closed 1 year ago

ajdamico commented 1 year ago

Hi, I'm not exactly sure why this file won't parse (`foreign::read.xport` also fails), but since it comes from the US government I assume this might be a somewhat common issue. Appreciate it!


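    # download the zipped file to a temporary location
    tf <- tempfile()
    this_url <- "https://www.cms.gov/files/zip/cspuf2019.zip"
    download.file( this_url , tf , mode = 'wb' )

    # unzip the archive and locate the .xpt file inside it
    unzipped_files <- unzip( tf , exdir = tempdir() )
    my_xpt <- grep( 'xpt$' , unzipped_files , value = TRUE )

    # attempt to read the file
    my_tbl <- haven::read_xpt( my_xpt )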
    # Error: Failed to parse C:/Users/ANTHONYD/AppData/Local/Temp/Rtmp6Xxp4L/cspuf2019.xpt: Invalid file, or file has unsupported features.

gorcha commented 1 year ago

Hi @ajdamico,

It looks like this file is actually a SAS CPORT file. We've had previous reports of xpt files that are actually in the CPORT format, e.g. #453. CPORT is a totally different format from XPORT: unlike regular transport files, it is closed and undocumented, and its intended use is transferring data between compatible versions of SAS (see the Library of Congress notes for a bit more detail). CPORT files are not currently supported by ReadStat.

There's a feature request open at WizardMac/ReadStat#187 for CPORT support, but since it is a closed format, adding support would be quite a lot of work, so this is unlikely to happen any time soon.
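
For anyone who wants to check which kind of file they have before calling `read_xpt()`, a rough sketch is to peek at the first bytes of the file. Documented XPORT files begin with the text `HEADER RECORD*******`. The `**COMPRESSED**` marker used below for CPORT is an assumption based on files seen in the wild, since the format is undocumented, and CPORT files written without compression may start differently. `guess_transport_format()` and `starts_with_raw()` are hypothetical helper names, not part of haven.

```r
# Hypothetical helper: TRUE if the raw vector `bytes` begins with the ASCII text `prefix`.
starts_with_raw <- function(bytes, prefix) {
  sig <- charToRaw(prefix)
  length(bytes) >= length(sig) && identical(bytes[seq_along(sig)], sig)
}

# Hypothetical helper: guess the SAS transport flavour from the file signature.
guess_transport_format <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 20L)

  if (starts_with_raw(header, "HEADER RECORD*******")) {
    # Documented XPORT signature (first 20 bytes of the library header record)
    "xport"
  } else if (starts_with_raw(header, "**COMPRESSED**")) {
    # Assumption: marker commonly seen at the start of CPORT files
    "cport"
  } else {
    "unknown"
  }
}
```

With the reprex above, `guess_transport_format( my_xpt )` could be run before `haven::read_xpt( my_xpt )`; anything other than `"xport"` suggests the file is not a regular transport file.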