Closed Jiang-Li-backup closed 1 year ago
Please provide a reproducible example.
Thx. I also compared with the same data set created by SAS. The file generated by SAS EG is twice as big as that generated by R Haven. May I send the data files and code to you by email?
Sure - both haven and SAS versions of the same file would be super helpful.
I emailed the data and code to you a few days ago. Please let me know if you have not received it. Thx.
Observation
Byte 33 has value 33 which indicates that the R SAS datasets use the layout of unix 64 bit.
I will try to read the R SAS dataset on Unix 64.
This should not matter, Win SAS should be able to read the dataset. I will look at the rest of the layout. Unfortunately, I don't have WPS on Unix so I will create a unix SAS SAS dataset(not WPS) so more of the R layout can be checked. SAS datasets seemed to have more filler which complicates the comparison.
At least the magic string matches
0-32 bytes Magic string 000000000000000000000000C2EA8160B31411CFBD92080009C7318C181F1011 WPS 000000000000000000000000C2EA8160B31411CFBD92080009C7318C181F1011 R
33-64 bytes 2222003333010232040000000000000000000301181F10112222003333010232 WPS 3300002200010031000000000000000000000000000000000000000000000000 R
A quick note about the "filler". SAS files are organized into a "header" followed by 1 or more "pages". By default haven will write a header of size 1024 bytes and pages of size 4096 bytes. This layout was common with older versions of SAS, but newer versions seem to use much larger header and page sizes. The sizes are defined as 32-bit integers at (or near) byte offsets 200 and 204. In any event, extra-large page sizes are the source of filler in the SAS files. If you skip to the beginning of each page it should be easier to analyze the file.
For binary debugging I use hecate.
Thanks for the info and the excellent work.
None of Matt's test datasets have byte 32=33 and byte 35=22. I have checked many other datasets created with different hardware and none of them have the 33/22 combination that write_sas creates
Test datasets for Matt SAS7BDAT
32 33 34 35 36 37
compress_no.sas7bdat 22 22 00 32 22 01 02 32 9.0101M3NET_ASRV........ compress_yes.sas7bdat 22 22 00 32 22 01 02 32 9.0101M3NET_ASRV........ lowbwt_i386.sas7bdat 22 22 00 32 22 01 02 32 9.0202M0W32_VSPRO....... missing_values.sas7bdat 22 22 00 32 22 01 02 32 9.0202M0W32_VSPRO....... obs_all_perf_1.sas7bdat 22 22 00 32 22 01 02 32 9.0101M3XP_PRO.......... adsl.sas7bdat 22 22 00 33 33 01 02 32 ....9.0202M3X64_ESRV.... eyecarex.sas7bdat 22 22 00 33 22 00 02 31 ....9.0000M0WIN......... lowbwt_x64.sas7bdat 22 22 00 33 33 01 02 32 ....9.0202M2X64_VSPRO... natlterr1994.sas7bdat 33 22 00 33 33 00 02 31 ........9.0101M3SunOS... natlterr2006.sas7bdat 33 22 00 33 33 00 02 31 ........9.0101M3SunOS... txzips.sas7bdat 33 22 33 22 00 33 33 01 02 31 ........9.0201M0Linux...
@rogerjdeangelis Nice detective work!
Hello @hadley, do you have some news about this issue ?
I have the same issue. I can create a data frame from csv and write it to sas7bdat, however I can't open them in SAS viewer and results in error. I can see the file size has increased(i.e data is written within the file)
Steps to reproduce:
test <- read.csv("test.csv")
library("haven"); write_sas(test, "one.sas7bdat");
Is there a solution to this issue yet?
It's really misleading that the readme for this project indicates that haven is capable of writing SAS data files when this and several related bugs indicate that it can't.
@evanmiller Sorry for kicking up yet another old issue, but we are running into this problem as well. Data written using write_sas
cannot be loaded by SAS (9.4), but works fine when read using read_sas
from inside R.
The following is enough to reproduce it on Windows Server 2012R2 using the latest (master) version of haven and SAS 9.4. The exported file dataset.sas7bdat
fails to load with an error that the dataset is not a SAS dataset.
library(haven)
dat <- as.data.frame(matrix(1, 10, 10))
write_sas(dat, 'dataset.sas7bdat')
Though not a direct solution, if an R > SAS workflow is necessary for your work it looks like haven::write_sav()
could be a temporary workaround. SAS can then read in the .sav file and variable labels, formats, etc persist (whereas they wouldn't with readr::write_csv()
, etc).
So, in R:
library(tidyverse)
library(haven)
# View first 10 obs
diamonds(head, 10)
#> # A tibble: 10 x 10
#> carat cut color clarity depth table price x y z
#> <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
#> 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
#> 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
#> 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
#> 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
#> 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
#> 7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
#> 8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
#> 9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
#> 10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39
# Export to sav
diamonds %>%
head(10) %>%
write_sav( "<path-to-file>")
Then, in SAS:
/* Import from sav */
proc import out = diamonds
datafile = "<path-to-file>"
dbms = SAV replace;
run;
***
NOTE: The import data set has 10 observations and 10 variables.
NOTE: WORK.DIAMONDS data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds
**;
/* View the data */
ods listing;
proc print data = diamonds;
run;
***
The SAS System
Obs carat cut color clarity depth table price x y z
1 0.23 Ideal E SI2 61.50 55.00 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.80 61.00 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.90 65.00 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.40 58.00 334 4.20 4.23 2.63
5 0.31 Good J SI2 63.30 58.00 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.80 57.00 336 3.94 3.96 2.48
7 0.24 Very Good I VVS1 62.30 57.00 336 3.95 3.98 2.47
8 0.26 Very Good H SI1 61.90 55.00 337 4.07 4.11 2.53
9 0.22 Fair E VS2 65.10 61.00 337 3.87 3.78 2.49
10 0.23 Very Good H VS1 59.40 61.00 338 4.00 4.05 2.39
**;
Can you please try again with the development version of haven?
The free version of WPS, WPS Express, can create a SAS dataset of unlimited size with 'Proc R. WPS interfaces with R in the same way SAS/IML interfaces with R.
It looks like there are no 'non-commercial' limitations with express.
download from https://www.worldprogramming.com/us/try-or-buy/wps/editions/express
You can use haven to import large SAS datasets and WPS/Proc R to create a unlimited output SAS dataset. Not sure but the same may be true for 'proc python(WPS van create a SAS dataset from Python.
Otherwise I like to use mySQL or MS Access to move data from/to SAS.
On Thu, Feb 15, 2018 at 12:18 PM, Hadley Wickham notifications@github.com wrote:
Can you please try again with the development version of haven?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224#issuecomment-365998294, or mute the thread https://github.com/notifications/unsubscribe-auth/AMc0hW-Eb6API9VZ2QvfXm-w1awg3kh4ks5tVGbrgaJpZM4KRl8G .
There is some information about interfaces between WPS/SAS and pyhon/R on my github
https://github.com/rogerjdeangelis
On Thu, Feb 15, 2018 at 12:34 PM, Roger DeAngelis <rogerjdeangelis@gmail.com
wrote:
The free version of WPS, WPS Express, can create a SAS dataset of unlimited size with 'Proc R. WPS interfaces with R in the same way SAS/IML interfaces with R.
It looks like there are no 'non-commercial' limitations with express.
download from https://www.worldprogramming.com/us/try-or-buy/wps/editions/express
You can use haven to import large SAS datasets and WPS/Proc R to create a unlimited output SAS dataset. Not sure but the same may be true for 'proc python(WPS van create a SAS dataset from Python.
Otherwise I like to use mySQL or MS Access to move data from/to SAS.
On Thu, Feb 15, 2018 at 12:18 PM, Hadley Wickham <notifications@github.com
wrote:
Can you please try again with the development version of haven?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224#issuecomment-365998294, or mute the thread https://github.com/notifications/unsubscribe-auth/AMc0hW-Eb6API9VZ2QvfXm-w1awg3kh4ks5tVGbrgaJpZM4KRl8G .
Just installed the dev version. The exported sas file still can not be opened by SAS Enterprise Guide 7.1.
same issue as @Jiang-Li
An observation:
library(haven)
class<-read_sas("class.sas7bdat")
write_sas(class, "class_fromR.sas7bdat")
class_fromr <-read_sas("class_fromR.sas7bdat")
The first read_sas gives back a tibble "class", but the second one class_fromr is a data frame.
identical(class, class_fromr) [1] FALSE
same issue here
Ok, we'll keep trying. Thanks for all the feedback!
@hadley What is the latest development on this issue? I have the same problem!
I can confirm that I am also having issues, as described here, on the latest version of the CRAN package and R/Rstudio.
Same issue.
Same issue.
I still have the same problem here for haven package.
Same issue. Here's all the debug information I could think to collect.
SAS / Linux version: NOTE: Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA. NOTE: SAS (r) Proprietary Software 9.4 (TS1M2) NOTE: This session is executing on the Linux 3.0.101-108.21-default (LIN X64) platform. NOTE: Updated analytical products:
SAS/STAT 13.2
SAS/ETS 13.2
SAS/OR 13.2
SAS/IML 13.2
SAS/QC 13.2
R/Linux Versions:
R.Version() $platform [1] "x86_64-conda_cos6-linux-gnu"
$arch
[1] "x86_64"
$os
[1] "linux-gnu"
$system
[1] "x86_64, linux-gnu"
$status
[1] ""
$major
[1] "3"
$minor
[1] "4.3"
$year
[1] "2017"
$month
[1] "11"
$day
[1] "30"
$`svn rev`
[1] "73796"
$language
[1] "R"
$version.string
[1] "R version 3.4.3 (2017-11-30)"
$nickname
[1] "Kite-Eating Tree"
Package Versions: haven version 1.1.0
file size comparison mtcars exported form haven vs. mtcars created natively in SAS (from csv exported from R): -rw-r--r-- 1 9216 Nov 9 11:35 haven_mtcars.sas7bdat -rw-r--r-- 1 1236 Nov 9 11:21 mtcars.txt -rw-r--r-- 1 131072 Nov 9 11:34 sas_mtcars.sas7bdat
Hi, Is there any update on this anywhere - I'm having the same issue. write_sas() files cannot be opened in SAS (9.4)
Still an issue with haven 2.0.0. Perhaps the issue could be noted in the documentation until a fix is found. By the way, SAS has never published the technical specs for .sas7bdat files, which are Windows-specific. The SAS transport format (.xpt extension) is platform independent and openly documented. These files can be read and written with the foreign
and SASxport
packages.
Same issue. Easiest way to replicate it
library(haven) write_sas(mtcars,"mtcars.sas7bdat")
open mtcars.sas7bdat in SAS and it says not compatible etc...
For all what it is worth - in https://github.com/reikoch/testfiles there is now a small dummy SAS v9 dataset with genuine v5 and v8 transport file formats next to it.
Here is my way to export r data to SAS. It uses PROC IML in SAS to run the R code. So yes, it requires SAS and IML. But it should be able to import any R data.frame without problem.
Here is the SAS code:
/*
Import R data to SAS
rfile: r file name
dsn: SAS dataset name
*/
libname test "sas_data_folder";
%LET rfile = my_folder/my_filename.fst;
%LET dsn = test.sas_data_name;
PROC IML;
rfile = "&rfile";
SUBMIT rfile / R;
df = fst::read.fst("&rfile")
ENDSUBMIT;
RUN ImportDataSetFromR("&dsn","df");
QUIT;
I save r data into .fst, but you can use any format you want, just change the R code df = fst::read.fst("&rfile")
accordingly. Keep the double quotes.
You don't have to leave R to run the above code. You can generate the SAS code file in R, and then use system
to invoke SAS command line to run the SAS program.
You will also need to make sure your SAS can run R code by adding the following to "C:\Program Files\SASHome2\SASFoundation\9.4\nls\en\sasv9.cfg"
-RLANG
-SET R_HOME "C:\Program Files\R\R-3.6.1"
Here is my R code to implement the whole process :
# fst_file: full path of the fst file name
# sas_folder: path of the sas data folder
# sas_dsn: sas data set name
# sas_code_file: full path of the sas code file
ExportToSAS = function(fst_file, sas_folder, sas_dsn, sas_code_file)
{
# create sas code file
sas_code = 'LIBNAME mylib "%SAS_FOLDER%";
PROC IML;
SUBMIT fst_file / R;
df = fst::read.fst("%FST_FILE%")
ENDSUBMIT;
RUN ImportDataSetFromR("mylib.%SAS_DSN%","df");
QUIT;'
sas_code = gsub("%FST_FILE%", fst_file, sas_code, fixed = TRUE)
sas_code = gsub("%SAS_FOLDER%", sas_folder, sas_code, fixed = TRUE)
sas_code = gsub("%SAS_DSN%", sas_dsn, sas_code, fixed = TRUE)
writeLines(sas_code, sas_code_file)
# run the sas code file
sas_log_file = gsub(".sas", ".log", sas_code_file, fixed = TRUE)
sas_lst_file = gsub(".sas", ".lst", sas_code_file, fixed = TRUE)
sas_command = '"c:/Program Files/SASHome2/SASFoundation/9.4/sas.exe" -CONFIG "c:/Program Files/SASHome2/SASFoundation/9.4/sasv9.cfg" -NOSPLASH -SYSIN "%SAS_CODE_FILE%" -LOG "%SAS_LOG_FILE%" -PRINT "%SAS_LST_FILE%"'
sas_command = gsub("%SAS_CODE_FILE%", sas_code_file, sas_command, fixed = TRUE)
sas_command = gsub("%SAS_LOG_FILE%", sas_log_file, sas_command, fixed = TRUE)
sas_command = gsub("%SAS_LST_FILE%", sas_lst_file, sas_command, fixed = TRUE)
system(sas_command, invisible = FALSE)
}
As Hadley pointed out earlier, SAS is quite litigious and I don't see much hope for a solution. Third party applications like Stat/Transfer can handle the SAS proprietary format, but they've probably paid a good deal of money for the privilege. The best solution I see is to save in SAS transport format with haven::write_xpt()
and then import into SAS. Keep in mind variable names are limited to 32 characters, Ability to import SAS datasets is much more important anyway.
The free full version of WPS/CE(6 month license) is available for download.
WPS can read and write sas7bdats.
Also WPS can convert a SAS dataset to an R or python dataframe and convert a T amd Python dataframes to SAS.
My github site has examples
The problem is NOT reading SAS datasets, the problem is creating SAS datasets
SAS Forumhttps://tinyurl.com/y82q6vdlhttps://communities.sas.com/t5/SAS-Procedures/Importing-Excel-files-in-syntax-without-SAS-ACCESS/m-p/492763
StackOverflowhttps://stackoverflow.com/questions/49235845/export-pandas-dataframe-to-sas-sas7bdat-format
github WPS Proc Pythonhttps://github.com/rogerjdeangelis/utl_wps_python_read_write_sas_tables
seehttps://goo.gl/uqrg8ahttps://communities.sas.com/t5/SAS-in-Health-Care-Related/Read-SAS-file-without-any-SAS-installation-in-net-4-6/m-p/389054
Read SAS7BDAT
1. R (open source)
2. Python (open source)
3. WPS (free express edition - has nor limit on the size od SAS
datasets created by R)
StatTransfer (SAS owns)
DBMS Copy
Carolina (Dulles software)
SPSS
DSREAD (can creates CSV files)
CozyRoc
Write SAS7BDAT
R (open source - under development -fails )
StatTransfer ($350 Business Perpetual)
WPS (free full early release edition - no limit on the size od SAS datasets created by R or Python? )
DBMS Copy (SAS bought it out and shut it down?, Several universites are granfathered in and studenst can get it for $25?) The grandfater issue is interesting. Can you transfer the software to someone else after you graduate. However I believe some of the licenses stipulate that you can only use it ON CAMPUS?(weird)
Carolina (uses JDBC?) $495
SPSS ($100 per month for desktop)
CozyRoc ($4,000 server license perpetual)
Work stopped in R to create SAS datasets (for now?). https://github.com/tidyverse/haven
R
Importing and Exporting V5 and V8 transport files
Allows long variable names but char vars are limited to 200 bytes. Does not handle formats correctly. Easy fix for longer variables.
With my changes allows long variable names but char vars are limited to 200 bytes.
On Tue, Nov 12, 2019 at 3:36 PM Michael Laviolette notifications@github.com wrote:
As Hadley pointed out earlier, SAS is quite litigious and I don't see much hope for a solution. Third party applications like Stat/Transfer can handle the SAS proprietary format, but they've probably paid a good deal of money for the privilege. The best solution I see is to save in SAS transport format with haven::write_xpt() and then import into SAS. Keep in mind variable names are limited to 32 characters, Ability to import SAS datasets is much more important anyway.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224?email_source=notifications&email_token=ADDTJBLXXNYPHDX4MXDWIBDQTMHTDA5CNFSM4CSGL4DKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOED323HA#issuecomment-553102748, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADDTJBKOHIDYNO3FKQY523LQTMHTDANCNFSM4CSGL4DA .
As stated in several places in this thread there are some alternatives to haven::write_sas
that work quite well. I have outlined a method I tend to use on StackOverflow, as I believe it may be a bit off topic to share here: https://stackoverflow.com/a/58845669/4481570
Should this function just be removed from the package since it doesn't work?
That would mean admitting that a tidyverse package is not as good as the independent package it superseded.
Sent from a small screen. Please excuse brevity and typos.
On Tue, Nov 30, 2021, 8:20 AM Stephanie Zimmer @.***> wrote:
Should this function just be removed from the package since it doesn't work?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224#issuecomment-982629849, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5NMUJNSVL7OSNK2N3MODUOTFR5ANCNFSM4CSGL4DA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
I had hoped that we'd be able to resolve this problem in haven, but several years later we haven't made any progress, so I agree that we should remove this function.
Has anyone spent any time at all on it?
Ken Kleinman, ScD (he/him) Professor, Associate Chair Department of Biostatistics and Epidemiology School of Public Health and Health Sciences University of Massachusetts, Amherst
On Tue, Nov 30, 2021 at 12:58 PM Hadley Wickham @.***> wrote:
I had hoped that we'd be able to resolve this problem in haven, but several years later we haven't made any progress, so I agree that we should remove this function.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tidyverse/haven/issues/224#issuecomment-982881102, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5NMSGZIFA7V2AEFOXNGLUOUGFTANCNFSM4CSGL4DA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
Reading is easier because we don't need to understand everthing about the binary format to do it. Some of the parts we don't understand may be used directly or as a side-effect by SAS to authenticate the file as genuinely written by SAS software. And, SAS could change how this is done at any time without impacting backward compatibility. Getting data OUT of sas7bdat has always been the primary motivation; it solves the bidirectional compatibility problem. For these reasons, I've always felt that attempting to write sas7bdat is not worth the effort.
Edit: Just wanted to add that Evan and team have made huge advances in this effort over the years. Awesome work!
ahem
I exported data to a SAS file. This file can be read by haven, but not the SAS. I am using SAS EG 7.1.