tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

write_sav is acting inconsistently for long string variables. #752

Status: Open. esul1121 opened this issue 6 months ago.

esul1121 commented 6 months ago

Hello. I don't think this is a new issue, since I found multiple threads here about long string variables being split into several variables.

Here is my example, although I'm not sure how to reproduce it.

I converted a data frame to .sav format with write_sav(), but when I opened the file in SPSS, some string variables with lengths between 500 and 8000 characters kept their full width, while others were split into several string variables of 255 characters each.

Why does it behave inconsistently? Some strings kept their full original length, while others were split.

oriloc commented 5 months ago

I had the same issue recently, and I think I can narrow it down a bit.

The issue appears when exporting a data frame with multiple string variables that have similar (long) names. For some reason, a string variable gets split into multiple sub-variables of 255 characters each if its name starts with the same 8 characters as the name of a previous variable. In my example, I created 4 different variables and filled each one with a string of 1200 characters:

Q2_96_TEXT_text -> works fine
Q2_96_T -> works fine
Q2_96_TE -> split into multiple substrings of 255 characters
Q3_96_TE -> works fine

Q2_96_TE is identical to the first 8 characters of Q2_96_TEXT_text, whereas Q2_96_T has only 7 characters and Q3_96_TE differs in its second character.

EXAMPLE CODE:

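library(haven)

# Four columns, each holding a single 1200-character string
n <- 1200
df <- data.frame(Q2_96_TEXT_text = paste(rep("a", n), collapse = ""),
                 Q2_96_T = paste(rep("b", n), collapse = ""),
                 Q2_96_TE = paste(rep("c", n), collapse = ""),
                 Q3_96_TE = paste(rep("d", n), collapse = ""),
                 stringsAsFactors = FALSE)

# As reported, Q2_96_TE opens in SPSS split into 255-character pieces
write_sav(df, path = "test.sav")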
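If the 8-character prefix pattern described above holds, the affected columns can be flagged before calling write_sav(). The sketch below uses a hypothetical helper, find_prefix_clashes(), which is not part of haven. It only encodes the pattern observed in this thread (an exact, case-sensitive match on the first 8 characters of an earlier column name) and is not a confirmed root cause.

```r
# Hypothetical diagnostic, not a haven function: list the columns whose first
# `width` characters repeat those of an earlier column name. This mirrors the
# pattern reported in this thread; it is not a confirmed root cause.
find_prefix_clashes <- function(df, width = 8) {
  prefixes <- substr(names(df), 1, width)
  # duplicated() is TRUE for every prefix already seen in an earlier column
  names(df)[duplicated(prefixes)]
}

# Same column names as the example above (strrep() builds the long strings)
n <- 1200
df <- data.frame(Q2_96_TEXT_text = strrep("a", n),
                 Q2_96_T = strrep("b", n),
                 Q2_96_TE = strrep("c", n),
                 Q3_96_TE = strrep("d", n),
                 stringsAsFactors = FALSE)

find_prefix_clashes(df)
#> [1] "Q2_96_TE"
```

Only Q2_96_TE is flagged, matching the variable that was split in the example. Renaming flagged columns so that their first 8 characters are unique might avoid the split, but that has not been confirmed in this thread.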