tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

dta output as version 15 is not readable by Stata #721

Open · opened by chamb244 1 year ago

chamb244 commented 1 year ago

The issue described at https://github.com/tidyverse/haven/issues/461 still seems to be unresolved. The error still seems to depend on the version argument (files written with versions below 15 open fine), as the example below shows.

junk <- mtcars
haven::write_dta(junk, path="junk14.dta", version=14)
# reads just fine into Stata/SE 17.0
haven::write_dta(junk, path="junk15.dta", version=15)
# cannot read this output into Stata/SE 17.0 - error message: "dataset too large -- This .dta file format was created by Stata/MP and has more variables than your Stata can handle."

Created on 2023-05-10 with reprex v2.0.2

gorcha commented 1 year ago

Hi @chamb244, this is a documentation issue (thanks for the reminder!).

Stata itself uses a different version numbering scheme from haven's. Current Stata versions write format 118 for most files, and format 119 when a dataset has more than 32,767 variables.

These are mapped to version = 14 and version = 15 respectively; both are current formats, but only Stata/MP supports more than 32,767 variables. Stata enforces the variable limit through the file format rather than the actual number of variables, so files written with version = 15 can only be opened by Stata/MP, regardless of how many variables they actually contain.

The default for write_dta() is version = 14, which is the correct current format for files with 32,767 or fewer variables, but the documentation needs an explanation of the difference between the two versions.
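The rule above can be sketched as a small wrapper that only asks for version = 15 when a data frame actually exceeds the 32,767-variable limit. This is a hypothetical helper, not part of haven; `dta_version_for()`, `write_dta_auto()` and the `max_se_vars` parameter are illustrative names, and the only haven call used is `write_dta()` with its `path` and `version` arguments.

```r
# Hypothetical helpers, not part of haven: choose the haven `version`
# from the number of columns, following the mapping described above.
dta_version_for <- function(data, max_se_vars = 32767L) {
  # haven version 14 -> .dta format 118 (opens in current non-MP Stata too)
  # haven version 15 -> .dta format 119 (Stata/MP only, whatever the width)
  if (ncol(data) > max_se_vars) 15L else 14L
}

write_dta_auto <- function(data, path) {
  # Only fall back to format 119 when the data really needs it.
  haven::write_dta(data, path = path, version = dta_version_for(data))
}
```

For example, `write_dta_auto(mtcars, "junk.dta")` would use version = 14, because `mtcars` has only 11 columns.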

From the Stata .dta format specification, for reference:

The format of .dta files has changed over time. Stata 17 writes what are known as .dta format-118 files and can read all formats of files that have ever been released. The recent history of .dta formats is

   Format    Current as of
   ---------------------------------------
     119     Stata 15 - 17 (when dataset has more than 32,767 variables)
     118     Stata 14 - 17
     117     Stata 13 
     116     internal; never released
     115     Stata 12
     114     Stata 10
     113     Stata  8
   ---------------------------------------
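To check which format a file was actually written in, one option is to read its header. This is a minimal sketch, assuming the layout from the specification: formats 117 and later begin with an XML-like header such as `<stata_dta><header><release>118</release>`, while older formats (113 to 115) store the format number in the first byte. `dta_release()` is a hypothetical helper, not a haven function.

```r
# Hypothetical helper, not part of haven: report the .dta format number
# (e.g. 118 or 119) stored in a file's header.
dta_release <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 64L)
  tag <- charToRaw("<stata_dta>")
  if (length(header) >= length(tag) && identical(header[seq_along(tag)], tag)) {
    # Formats 117+: keep printable ASCII bytes and pull out <release>NNN</release>
    codes <- as.integer(header)
    txt <- rawToChar(header[codes >= 32L & codes <= 126L])
    m <- regmatches(txt, regexpr("(?<=<release>)[0-9]+", txt, perl = TRUE))
    if (length(m) == 0) return(NA_integer_)
    return(as.integer(m))
  }
  # Formats 113-115: the first byte holds the format number.
  as.integer(header[1])
}
```

Given the mapping described in the thread, `dta_release("junk15.dta")` on the file from the report above would be expected to return 119, which is why Stata/SE refuses to open it.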