tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org
Other
423 stars 115 forks source link

The exported file cannot be opened by SAS #224

Closed Jiang-Li-backup closed 1 year ago

Jiang-Li-backup commented 7 years ago

I exported data to a SAS file. This file can be read by haven, but not the SAS. I am using SAS EG 7.1.

hadley commented 7 years ago

Please provide a reproducible example.

Jiang-Li-backup commented 7 years ago

Thx. I also compared with the same data set created by SAS. The file generated by SAS EG is twice as big as that generated by R Haven. May I send the data files and code to you by email?

hadley commented 7 years ago

Sure - both haven and SAS versions of the same file would be super helpful.

Jiang-Li-backup commented 7 years ago

I emailed the data and code to you a few days ago. Please let me know if you have not received it. Thx.

rogerjdeangelis commented 7 years ago

r_write_sas7bdat.zip

rogerjdeangelis commented 7 years ago

Observation

Byte 33 has value 33 which indicates that the R SAS datasets use the layout of unix 64 bit.

I will try to read the R SAS dataset on Unix 64.

This should not matter, Win SAS should be able to read the dataset. I will look at the rest of the layout. Unfortunately, I don't have WPS on Unix so I will create a unix SAS SAS dataset(not WPS) so more of the R layout can be checked. SAS datasets seemed to have more filler which complicates the comparison.

At least the magic string matches

0-32 bytes Magic string 000000000000000000000000C2EA8160B31411CFBD92080009C7318C181F1011 WPS 000000000000000000000000C2EA8160B31411CFBD92080009C7318C181F1011 R

33-64 bytes 2222003333010232040000000000000000000301181F10112222003333010232 WPS 3300002200010031000000000000000000000000000000000000000000000000 R

evanmiller commented 7 years ago

A quick note about the "filler". SAS files are organized into a "header" followed by 1 or more "pages". By default haven will write a header of size 1024 bytes and pages of size 4096 bytes. This layout was common with older versions of SAS, but newer versions seem to use much larger header and page sizes. The sizes are defined as 32-bit integers at (or near) byte offsets 200 and 204. In any event, extra-large page sizes are the source of filler in the SAS files. If you skip to the beginning of each page it should be easier to analyze the file.

For binary debugging I use hecate.

rogerjdeangelis commented 7 years ago

Thanks for the info and the excellent work.

None of Matt's test datasets have byte 32=33 and byte 35=22. I have checked many other datasets created with different hardware and none of them have the 33/22 combination that write_sas creates

Test datasets for Matt SAS7BDAT

                      32 33 34 35 36 37

compress_no.sas7bdat 22 22 00 32 22 01 02 32 9.0101M3NET_ASRV........ compress_yes.sas7bdat 22 22 00 32 22 01 02 32 9.0101M3NET_ASRV........ lowbwt_i386.sas7bdat 22 22 00 32 22 01 02 32 9.0202M0W32_VSPRO....... missing_values.sas7bdat 22 22 00 32 22 01 02 32 9.0202M0W32_VSPRO....... obs_all_perf_1.sas7bdat 22 22 00 32 22 01 02 32 9.0101M3XP_PRO.......... adsl.sas7bdat 22 22 00 33 33 01 02 32 ....9.0202M3X64_ESRV.... eyecarex.sas7bdat 22 22 00 33 22 00 02 31 ....9.0000M0WIN......... lowbwt_x64.sas7bdat 22 22 00 33 33 01 02 32 ....9.0202M2X64_VSPRO... natlterr1994.sas7bdat 33 22 00 33 33 00 02 31 ........9.0101M3SunOS... natlterr2006.sas7bdat 33 22 00 33 33 00 02 31 ........9.0101M3SunOS... txzips.sas7bdat 33 22 33 22 00 33 33 01 02 31 ........9.0201M0Linux...

evanmiller commented 7 years ago

@rogerjdeangelis Nice detective work!

realitix commented 7 years ago

Hello @hadley, do you have some news about this issue ?

mnarasimhan02 commented 7 years ago

I have the same issue. I can create a data frame from csv and write it to sas7bdat, however I can't open them in SAS viewer and results in error. I can see the file size has increased(i.e data is written within the file)

TEST.zip

Steps to reproduce:

Convert attached test.CSV to data frame

test <- read.csv("test.csv")

test.csv contain 5 obs

library("haven"); write_sas(test, "one.sas7bdat");

Kenkleinman commented 7 years ago

Is there a solution to this issue yet?

rambler commented 6 years ago

It's really misleading that the readme for this project indicates that haven is capable of writing SAS data files when this and several related bugs indicate that it can't.

normark commented 6 years ago

@evanmiller Sorry for kicking up yet another old issue, but we are running into this problem as well. Data written using write_sas cannot be loaded by SAS (9.4), but works fine when read using read_sas from inside R.

The following is enough to reproduce it on Windows Server 2012R2 using the latest (master) version of haven and SAS 9.4. The exported file dataset.sas7bdat fails to load with an error that the dataset is not a SAS dataset.

library(haven)
dat <- as.data.frame(matrix(1, 10, 10))
write_sas(dat, 'dataset.sas7bdat')
adam-garcia commented 6 years ago

Though not a direct solution, if an R > SAS workflow is necessary for your work it looks like haven::write_sav() could be a temporary workaround. SAS can then read in the .sav file and variable labels, formats, etc persist (whereas they wouldn't with readr::write_csv(), etc).

So, in R:

library(tidyverse)
library(haven)
# View first 10 obs
diamonds(head, 10)

#> # A tibble: 10 x 10
#>    carat       cut color clarity depth table price     x     y     z
#>    <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#>  1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
#>  2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
#>  3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
#>  4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
#>  5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
#>  6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48
#>  7  0.24 Very Good     I    VVS1  62.3    57   336  3.95  3.98  2.47
#>  8  0.26 Very Good     H     SI1  61.9    55   337  4.07  4.11  2.53
#>  9  0.22      Fair     E     VS2  65.1    61   337  3.87  3.78  2.49
#> 10  0.23 Very Good     H     VS1  59.4    61   338  4.00  4.05  2.39

#  Export to sav
diamonds %>%
  head(10) %>%
  write_sav( "<path-to-file>")

Then, in SAS:

/* Import from sav */
proc import out = diamonds
            datafile = "<path-to-file>"
            dbms = SAV replace;
run;
***
NOTE: The import data set has 10 observations and 10 variables.
NOTE: WORK.DIAMONDS data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
**;

/* View the data */
ods listing;
proc print data = diamonds;
run;

***
                                       The SAS System

 Obs    carat cut       color clarity    depth    table    price        x        y        z

   1     0.23 Ideal       E    SI2       61.50    55.00      326     3.95     3.98     2.43
   2     0.21 Premium     E    SI1       59.80    61.00      326     3.89     3.84     2.31
   3     0.23 Good        E    VS1       56.90    65.00      327     4.05     4.07     2.31
   4     0.29 Premium     I    VS2       62.40    58.00      334     4.20     4.23     2.63
   5     0.31 Good        J    SI2       63.30    58.00      335     4.34     4.35     2.75
   6     0.24 Very Good   J    VVS2      62.80    57.00      336     3.94     3.96     2.48
   7     0.24 Very Good   I    VVS1      62.30    57.00      336     3.95     3.98     2.47
   8     0.26 Very Good   H    SI1       61.90    55.00      337     4.07     4.11     2.53
   9     0.22 Fair        E    VS2       65.10    61.00      337     3.87     3.78     2.49
  10     0.23 Very Good   H    VS1       59.40    61.00      338     4.00     4.05     2.39

**;
hadley commented 6 years ago

Can you please try again with the development version of haven?

rogerjdeangelis commented 6 years ago

The free version of WPS, WPS Express, can create a SAS dataset of unlimited size with 'Proc R. WPS interfaces with R in the same way SAS/IML interfaces with R.

It looks like there are no 'non-commercial' limitations with express.

download from https://www.worldprogramming.com/us/try-or-buy/wps/editions/express

You can use haven to import large SAS datasets and WPS/Proc R to create a unlimited output SAS dataset. Not sure but the same may be true for 'proc python(WPS van create a SAS dataset from Python.

Otherwise I like to use mySQL or MS Access to move data from/to SAS.

On Thu, Feb 15, 2018 at 12:18 PM, Hadley Wickham notifications@github.com wrote:

Can you please try again with the development version of haven?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224#issuecomment-365998294, or mute the thread https://github.com/notifications/unsubscribe-auth/AMc0hW-Eb6API9VZ2QvfXm-w1awg3kh4ks5tVGbrgaJpZM4KRl8G .

rogerjdeangelis commented 6 years ago

There is some information about interfaces between WPS/SAS and pyhon/R on my github

https://github.com/rogerjdeangelis

On Thu, Feb 15, 2018 at 12:34 PM, Roger DeAngelis <rogerjdeangelis@gmail.com

wrote:

The free version of WPS, WPS Express, can create a SAS dataset of unlimited size with 'Proc R. WPS interfaces with R in the same way SAS/IML interfaces with R.

It looks like there are no 'non-commercial' limitations with express.

download from https://www.worldprogramming.com/us/try-or-buy/wps/editions/express

You can use haven to import large SAS datasets and WPS/Proc R to create a unlimited output SAS dataset. Not sure but the same may be true for 'proc python(WPS van create a SAS dataset from Python.

Otherwise I like to use mySQL or MS Access to move data from/to SAS.

On Thu, Feb 15, 2018 at 12:18 PM, Hadley Wickham <notifications@github.com

wrote:

Can you please try again with the development version of haven?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224#issuecomment-365998294, or mute the thread https://github.com/notifications/unsubscribe-auth/AMc0hW-Eb6API9VZ2QvfXm-w1awg3kh4ks5tVGbrgaJpZM4KRl8G .

Jiang-Li-backup commented 6 years ago

Just installed the dev version. The exported sas file still can not be opened by SAS Enterprise Guide 7.1. image

bellafeng commented 6 years ago

same issue as @Jiang-Li An observation: library(haven) class<-read_sas("class.sas7bdat")
write_sas(class, "class_fromR.sas7bdat") class_fromr <-read_sas("class_fromR.sas7bdat")
The first read_sas gives back a tibble "class", but the second one class_fromr is a data frame.

identical(class, class_fromr) [1] FALSE

xiaodaigh commented 6 years ago

same issue here

hadley commented 6 years ago

Ok, we'll keep trying. Thanks for all the feedback!

ningjingzhiyuan507 commented 6 years ago

@hadley What is the latest development on this issue? I have the same problem!

al-obrien commented 5 years ago

I can confirm that I am also having issues, as described here, on the latest version of the CRAN package and R/Rstudio.

JoviaNierenberg commented 5 years ago

Same issue.

caimiao0714 commented 5 years ago

Same issue.

I still have the same problem here for haven package.

carloswpla commented 5 years ago

Same issue. Here's all the debug information I could think to collect.

SAS / Linux version: NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA. NOTE: SAS (r) Proprietary Software 9.4 (TS1M2) NOTE: This session is executing on the Linux 3.0.101-108.21-default (LIN X64) platform. NOTE: Updated analytical products:

      SAS/STAT 13.2
      SAS/ETS 13.2
      SAS/OR 13.2
      SAS/IML 13.2
      SAS/QC 13.2

R/Linux Versions:

R.Version() $platform [1] "x86_64-conda_cos6-linux-gnu"

    $arch
    [1] "x86_64"

    $os
    [1] "linux-gnu"

    $system
    [1] "x86_64, linux-gnu"

    $status
    [1] ""

    $major
    [1] "3"

    $minor
    [1] "4.3"

    $year
    [1] "2017"

    $month
    [1] "11"

    $day
    [1] "30"

    $`svn rev`
    [1] "73796"

    $language
    [1] "R"

    $version.string
    [1] "R version 3.4.3 (2017-11-30)"

    $nickname
    [1] "Kite-Eating Tree"

Package Versions: haven version 1.1.0

file size comparison mtcars exported form haven vs. mtcars created natively in SAS (from csv exported from R): -rw-r--r-- 1 9216 Nov 9 11:35 haven_mtcars.sas7bdat -rw-r--r-- 1 1236 Nov 9 11:21 mtcars.txt -rw-r--r-- 1 131072 Nov 9 11:34 sas_mtcars.sas7bdat

example.tar.gz

oljees commented 5 years ago

Hi, Is there any update on this anywhere - I'm having the same issue. write_sas() files cannot be opened in SAS (9.4)

mlaviolet commented 5 years ago

Still an issue with haven 2.0.0. Perhaps the issue could be noted in the documentation until a fix is found. By the way, SAS has never published the technical specs for .sas7bdat files, which are Windows-specific. The SAS transport format (.xpt extension) is platform independent and openly documented. These files can be read and written with the foreign and SASxport packages.

anuj2054 commented 5 years ago

Same issue. Easiest way to replicate it

step 1

library(haven) write_sas(mtcars,"mtcars.sas7bdat")

step 2

open mtcars.sas7bdat in SAS and it says not compatible etc...

reikoch commented 5 years ago

For all what it is worth - in https://github.com/reikoch/testfiles there is now a small dummy SAS v9 dataset with genuine v5 and v8 transport file formats next to it.

datafj commented 4 years ago

Here is my way to export r data to SAS. It uses PROC IML in SAS to run the R code. So yes, it requires SAS and IML. But it should be able to import any R data.frame without problem.

Here is the SAS code:

/*
Import R data to SAS
rfile: r file name
dsn: SAS dataset name
*/
libname test "sas_data_folder";
%LET rfile = my_folder/my_filename.fst;
%LET dsn = test.sas_data_name;
PROC IML;
    rfile = "&rfile";
    SUBMIT rfile / R;
        df = fst::read.fst("&rfile") 
    ENDSUBMIT;
    RUN ImportDataSetFromR("&dsn","df");
QUIT;

I save r data into .fst, but you can use any format you want, just change the R code df = fst::read.fst("&rfile") accordingly. Keep the double quotes.

You don't have to leave R to run the above code. You can generate the SAS code file in R, and then use system to invoke SAS command line to run the SAS program.

You will also need to make sure your SAS can run R code by adding the following to "C:\Program Files\SASHome2\SASFoundation\9.4\nls\en\sasv9.cfg"

-RLANG
-SET R_HOME "C:\Program Files\R\R-3.6.1"

Here is my R code to implement the whole process :

# fst_file: full path of the fst file name
# sas_folder: path of the sas data folder
# sas_dsn: sas data set name
# sas_code_file: full path of the sas code file
ExportToSAS = function(fst_file, sas_folder, sas_dsn, sas_code_file)
{
    # create sas code file
    sas_code = 'LIBNAME mylib "%SAS_FOLDER%";
                PROC IML;
                    SUBMIT fst_file / R;
                        df = fst::read.fst("%FST_FILE%") 
                    ENDSUBMIT;
                    RUN ImportDataSetFromR("mylib.%SAS_DSN%","df");
                QUIT;'
    sas_code = gsub("%FST_FILE%", fst_file, sas_code, fixed = TRUE)
    sas_code = gsub("%SAS_FOLDER%", sas_folder, sas_code, fixed = TRUE)
    sas_code = gsub("%SAS_DSN%", sas_dsn, sas_code, fixed = TRUE)
    writeLines(sas_code, sas_code_file)

    # run the sas code file
    sas_log_file = gsub(".sas", ".log", sas_code_file, fixed = TRUE)
    sas_lst_file = gsub(".sas", ".lst", sas_code_file, fixed = TRUE)
    sas_command = '"c:/Program Files/SASHome2/SASFoundation/9.4/sas.exe" -CONFIG "c:/Program Files/SASHome2/SASFoundation/9.4/sasv9.cfg" -NOSPLASH -SYSIN "%SAS_CODE_FILE%" -LOG "%SAS_LOG_FILE%" -PRINT "%SAS_LST_FILE%"'
    sas_command = gsub("%SAS_CODE_FILE%", sas_code_file, sas_command, fixed = TRUE)
    sas_command = gsub("%SAS_LOG_FILE%", sas_log_file, sas_command, fixed = TRUE)
    sas_command = gsub("%SAS_LST_FILE%", sas_lst_file, sas_command, fixed = TRUE)
    system(sas_command, invisible = FALSE)
}
mlaviolet commented 4 years ago

As Hadley pointed out earlier, SAS is quite litigious and I don't see much hope for a solution. Third party applications like Stat/Transfer can handle the SAS proprietary format, but they've probably paid a good deal of money for the privilege. The best solution I see is to save in SAS transport format with haven::write_xpt() and then import into SAS. Keep in mind variable names are limited to 32 characters, Ability to import SAS datasets is much more important anyway.

rogerjdeangelis commented 4 years ago

The free full version of WPS/CE(6 month license) is available for download.

WPS can read and write sas7bdats.

Also WPS can convert a SAS dataset to an R or python dataframe and convert a T amd Python dataframes to SAS.

My github site has examples

The problem is NOT reading SAS datasets, the problem is creating SAS datasets

SAS Forumhttps://tinyurl.com/y82q6vdlhttps://communities.sas.com/t5/SAS-Procedures/Importing-Excel-files-in-syntax-without-SAS-ACCESS/m-p/492763

StackOverflowhttps://stackoverflow.com/questions/49235845/export-pandas-dataframe-to-sas-sas7bdat-format

github WPS Proc Pythonhttps://github.com/rogerjdeangelis/utl_wps_python_read_write_sas_tables

seehttps://goo.gl/uqrg8ahttps://communities.sas.com/t5/SAS-in-Health-Care-Related/Read-SAS-file-without-any-SAS-installation-in-net-4-6/m-p/389054

Read SAS7BDAT

  1. R       (open source)
  2. Python  (open source)
  3. WPS (free express edition - has nor limit on the size od SAS

datasets created by R)

  1. StatTransfer (SAS owns)

  2. DBMS Copy

  3. Carolina (Dulles software)

  4. SPSS

  5. DSREAD (can creates CSV files)

  6. CozyRoc

    Write SAS7BDAT

  7. R (open source - under development -fails )

  8. StatTransfer ($350 Business Perpetual)

  9. WPS (free full early release edition - no limit on the size od SAS datasets created by R or Python? )

  10. DBMS Copy (SAS bought it out and shut it down?, Several universites are granfathered in and studenst can get it for $25?) The grandfater issue is interesting. Can you transfer the software to someone else after you graduate. However I believe some of the licenses stipulate that you can only use it ON CAMPUS?(weird)

  11. Carolina (uses JDBC?) $495

  12. SPSS ($100 per month for desktop)

  13. CozyRoc ($4,000 server license perpetual)

Work stopped in R to create SAS datasets (for now?). https://github.com/tidyverse/haven

R

  1. read_sas() reads .sas7bdat + .sas7bcat files
  2. read_xpt() reads SAS transport files (version 5 and version 8).
  3. write_sas() writes .sas7bdat(fails) files.

Importing and Exporting V5 and V8 transport files

  1. Allows long variable names but char vars are limited to 200 bytes. Does not handle formats correctly. Easy fix for longer variables.

  2. With my changes allows long variable names but char vars are limited to 200 bytes.

On Tue, Nov 12, 2019 at 3:36 PM Michael Laviolette notifications@github.com wrote:

As Hadley pointed out earlier, SAS is quite litigious and I don't see much hope for a solution. Third party applications like Stat/Transfer can handle the SAS proprietary format, but they've probably paid a good deal of money for the privilege. The best solution I see is to save in SAS transport format with haven::write_xpt() and then import into SAS. Keep in mind variable names are limited to 32 characters, Ability to import SAS datasets is much more important anyway.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224?email_source=notifications&email_token=ADDTJBLXXNYPHDX4MXDWIBDQTMHTDA5CNFSM4CSGL4DKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOED323HA#issuecomment-553102748, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADDTJBKOHIDYNO3FKQY523LQTMHTDANCNFSM4CSGL4DA .

al-obrien commented 4 years ago

As stated in several places in this thread there are some alternatives to haven::write_sas that work quite well. I have outlined a method I tend to use on StackOverflow, as I believe it may be a bit off topic to share here: https://stackoverflow.com/a/58845669/4481570

szimmer commented 2 years ago

Should this function just be removed from the package since it doesn't work?

Kenkleinman commented 2 years ago

That would mean admitting that a tidyverse package is not as good as the independent package it superseded.


Sent from a small screen. Please excuse brevity and typos.

On Tue, Nov 30, 2021, 8:20 AM Stephanie Zimmer @.***> wrote:

Should this function just be removed from the package since it doesn't work?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224#issuecomment-982629849, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5NMUJNSVL7OSNK2N3MODUOTFR5ANCNFSM4CSGL4DA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

hadley commented 2 years ago

I had hoped that we'd be able to resolve this problem in haven, but several years later we haven't made any progress, so I agree that we should remove this function.

Kenkleinman commented 2 years ago

Has anyone spent any time at all on it?


Ken Kleinman, ScD (he/him) Professor, Associate Chair Department of Biostatistics and Epidemiology School of Public Health and Health Sciences University of Massachusetts, Amherst

On Tue, Nov 30, 2021 at 12:58 PM Hadley Wickham @.***> wrote:

I had hoped that we'd be able to resolve this problem in haven, but several years later we haven't made any progress, so I agree that we should remove this function.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224#issuecomment-982881102, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5NMSGZIFA7V2AEFOXNGLUOUGFTANCNFSM4CSGL4DA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

BioStatMatt commented 2 years ago

Reading is easier because we don't need to understand everthing about the binary format to do it. Some of the parts we don't understand may be used directly or as a side-effect by SAS to authenticate the file as genuinely written by SAS software. And, SAS could change how this is done at any time without impacting backward compatibility. Getting data OUT of sas7bdat has always been the primary motivation; it solves the bidirectional compatibility problem. For these reasons, I've always felt that attempting to write sas7bdat is not worth the effort.

Edit: Just wanted to add that Evan and team have made huge advances in this effort over the years. Awesome work!

aminadibi commented 1 year ago

ahem