tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

Dataset label truncated after `write_xpt` #746

Open kaz462 opened 10 months ago

kaz462 commented 10 months ago

From write_xpt documentation:

Note that although SAS itself supports dataset labels up to 256 characters long, dataset labels in SAS transport files must be <= 40 characters.

The following Chinese dataset label has 40 characters (88 bytes) and was truncated after a write_xpt() / read_xpt() round trip.

(thanks @siye6 for the original example in https://github.com/atorus-research/xportr/pull/194)

label <- "这是一段文字，用来测试在XPTversion5中作为数据集label是否会被截断"
nchar(label, type = "chars")
#> [1] 40
nchar(label, type = "bytes")
#> [1] 88

tmp <- tempfile(fileext = ".xpt")
haven::write_xpt(mtcars, tmp, label = label)
test <- haven::read_xpt(tmp)
attributes(test)$label
#> [1] "这是一段文字，用来测试在XPTv"

nchar(attributes(test)$label, type = "chars")
#> [1] 16
nchar(attributes(test)$label, type = "bytes")
#> [1] 40

Created on 2023-12-13 with reprex v2.0.2
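The output above is consistent with the label being cut at 40 bytes rather than 40 characters. A minimal base R sketch of that behaviour follows; `truncate_utf8_bytes()` is a hypothetical helper written for illustration and is not part of haven. It keeps as many whole characters as fit in a byte budget, so a multibyte character is never split.

```r
# Hypothetical helper (not part of haven): keep as many whole characters
# of `x` as fit within `max_bytes` bytes when encoded as UTF-8.
truncate_utf8_bytes <- function(x, max_bytes = 40L) {
  cp <- utf8ToInt(x)  # Unicode code points of the string
  # Number of bytes each code point occupies in UTF-8
  width <- ifelse(cp < 0x80, 1L,
           ifelse(cp < 0x800, 2L,
           ifelse(cp < 0x10000, 3L, 4L)))
  keep <- cumsum(width) <= max_bytes
  intToUtf8(cp[keep])
}

label <- "这是一段文字，用来测试在XPTversion5中作为数据集label是否会被截断"
short <- truncate_utf8_bytes(label, 40L)

# Compare character and byte counts before and after truncation
nchar(label, type = "chars")
nchar(label, type = "bytes")
nchar(short, type = "chars")
nchar(short, type = "bytes")
```

Truncating a label this way before calling write_xpt() is only a workaround; whether silently shortening the label is acceptable depends on the use case.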

ynsec37 commented 10 months ago

Dear developer,

I found that the 40-character limit on label length applies only to XPT version 5; with version = 8, the label can be up to 256 characters.

sas xpt version 5

(screenshot of documentation, not preserved)

sas xpt version 8

(screenshot of documentation, not preserved)

botsp commented 10 months ago

Hi both, may I ask a question about data conversion? I am trying to convert an .rda file to .sas7bdat, and "write_xpt" doesn't seem to work as expected. The resulting sas7bdat file cannot be opened; SAS always reports "file ... is not a SAS data set".

I saw some discussion about this issue but didn't find a good solution.

  1. What is the recommended method for converting the .rda file to a .sas7bdat file?

  2. It seems that "write_xpt" works well when converting to an .xpt file. Should I first convert the file to an xpt format and then change it to a .sas7bdat file using SAS? Are there any potential risks associated with this approach?

Looking forward to learning from your experience. Many thanks!

ynsec37 commented 10 months ago

Hi @botsp, it seems that write_xpt() only supports creating xpt files.

write_sas() creates sas7bdat files. Unfortunately the SAS file format is complex and undocumented, so write_sas() is unreliable and in most cases SAS will not read files that it produces. write_xpt() writes files in the open SAS transport format, which has limitations but will be reliably read by SAS.

For sas7bdat, I use the approach you mentioned: create the xpt file in R, then convert it to sas7bdat in SAS. After converting, I compared the results from write_xpt() with datasets created directly in SAS, and there was no difference except the variable length.
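The R half of that workflow could look roughly like the sketch below. Both `load_data_frames()` and `rda_to_xpt()` are hypothetical helpers, not haven functions; the sketch assumes haven is installed and that the object names stored in the .rda file are valid SAS dataset names. The conversion from xpt to sas7bdat still happens in SAS and is not shown.

```r
# Hypothetical helper: load an .rda file into a private environment and
# return only the objects that are data frames, as a named list.
load_data_frames <- function(rda_path) {
  env <- new.env()
  load(rda_path, envir = env)
  Filter(is.data.frame, mget(ls(env), envir = env))
}

# Hypothetical helper: write each data frame to <out_dir>/<name>.xpt.
# Assumes the haven package is installed and that the object names are
# acceptable as SAS dataset names.
rda_to_xpt <- function(rda_path, out_dir = tempdir()) {
  dfs <- load_data_frames(rda_path)
  paths <- file.path(out_dir, paste0(names(dfs), ".xpt"))
  for (i in seq_along(dfs)) {
    haven::write_xpt(dfs[[i]], paths[i])
  }
  invisible(paths)  # paths of the files written
}
```

Reading each file back with haven::read_xpt() and comparing it to the original data frame is a cheap check before handing the files to SAS.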

botsp commented 10 months ago

Thanks for your explanation; it gives me a clear idea of how to approach the SAS data conversion. Thank you!

gorcha commented 9 months ago

Hi @kaz462 and @ynsec37,

Thanks for the feedback! This is an issue with our dataset label validation code, and the documentation could be clearer - the dataset label for XPT files is a maximum of 40 bytes rather than characters. Our validation code is currently checking with the default type = "chars" and should be updated to type = "bytes".
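To illustrate the difference, here is a minimal sketch of a byte-based check. `check_dataset_label()` is a hypothetical stand-in and not haven's actual validation code; the 40-byte limit is the one stated above.

```r
# Hypothetical validator, not haven's internal code: reject dataset
# labels that exceed the XPT limit when measured in bytes.
check_dataset_label <- function(label, max_bytes = 40L) {
  if (is.null(label)) {
    return(invisible(label))
  }
  n_bytes <- nchar(label, type = "bytes")  # bytes, not characters
  if (n_bytes > max_bytes) {
    stop(
      sprintf("Dataset label is %d bytes; XPT allows at most %d.",
              n_bytes, max_bytes),
      call. = FALSE
    )
  }
  invisible(label)
}

# An ASCII label has the same length in characters and bytes
check_dataset_label("Motor Trend Car Road Tests")

# A 40-character label with multibyte characters can exceed 40 bytes;
# the check is expected to signal an error here
result <- tryCatch(
  check_dataset_label(strrep("\u4e2d", 40)),
  error = function(e) conditionMessage(e)
)
print(result)
```

With type = "chars" the second label would pass the check, which matches the silent truncation reported in this issue.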

@ynsec37 note that the XPT documentation shared above is referring to the variable label length. Although variable labels can be longer in version 8, the maximum dataset label length is still 40 bytes.