selik / xport

Python reader and writer for SAS XPORT data transport files.
MIT License

Support long format and informat descriptions #85

Open selik opened 2 years ago

selik commented 2 years ago

SAS's publicly available technical paper, "Record Layout for a SAS® Version 8 or 9 Data Set in SAS® Transport Format", describes how to handle format and informat names longer than 8 characters:

If you have any format or informat names that exceed 8 characters, regardless of the label length, a different form of label record header is used:

HEADER RECORD*******LABELV9 HEADER RECORD!!!!!!!nnnnn

where nnnnn is the number of variables for which long format names and any labels will be defined. Each label is defined using the following:

aabbccddeef.....g.....h.....i.....

where:

Note: The FORMAT and INFORMAT descriptions are in the form used in a FORMAT or INFORMAT statement. For example, my_long_fmt., my_long_fmt8., my_long_fmt8.2. The text values are streamed together and no characters appear for attributes with a length of 0 bytes. For example, variable number 1 is named X and has a label of 'ABC,' no attached format, and an 11-character informat named my_long_fmt with informat length=8 and informat decimal=0. The data would be: (hex) 010103000d (characters) XABCmy_long_fmt8.

https://support.sas.com/content/dam/SAS/support/en/technical-papers/record-layout-of-a-sas-version-8-or-9-data-set-in-sas-transport-format.pdf
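
As a rough illustration of the layout quoted above, here is a minimal Python sketch that packs and unpacks one LABELV9 entry. The names `LabelV9`, `pack_labelv9` and `unpack_labelv9` are hypothetical and are not part of this module's API. It assumes that each of the five fields `aa` through `ee` (variable number, then the lengths of the name, label, format description and informat description) is a two-byte big-endian integer, which is read off the two-letter placeholders; the hex in the quoted example shows fewer bytes than that, so check against the paper before relying on it. ASCII text is also an assumption.

```python
import struct
from typing import NamedTuple, Tuple


class LabelV9(NamedTuple):
    """One entry of a LABELV9 section (hypothetical helper type)."""
    number: int
    name: str
    label: str
    format: str
    informat: str


# Assumed layout: five unsigned 16-bit big-endian integers (aa bb cc dd ee).
_HEADER = struct.Struct(">5H")


def pack_labelv9(number, name, label, format="", informat="") -> bytes:
    """Serialize one entry: fixed header, then the texts streamed together."""
    texts = [s.encode("ascii") for s in (name, label, format, informat)]
    lengths = [len(t) for t in texts]
    # Zero-length attributes contribute no characters at all.
    return _HEADER.pack(number, *lengths) + b"".join(texts)


def unpack_labelv9(data: bytes, offset: int = 0) -> Tuple[LabelV9, int]:
    """Parse one entry at `offset`; return it and the offset of the next one."""
    number, *lengths = _HEADER.unpack_from(data, offset)
    pos = offset + _HEADER.size
    texts = []
    for n in lengths:
        texts.append(data[pos:pos + n].decode("ascii"))
        pos += n
    return LabelV9(number, *texts), pos


# The example from the paper: variable 1 named X, label ABC, no format,
# informat description my_long_fmt8.
raw = pack_labelv9(1, "X", "ABC", informat="my_long_fmt8.")
entry, next_offset = unpack_labelv9(raw)
```

Note that the text stored here is the full description (`my_long_fmt8.`), not just the 11-character name.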

selik commented 2 years ago

The FORMAT and INFORMAT descriptions are in the form used in a FORMAT or INFORMAT statement.

I think Stata misinterpreted this instruction and wrote the description instead of the name into the namestr struct. I'm removing the bug label, because I think there's a bug in Stata's implementation that I compared against, and not a bug in this module.
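
To make the name/description distinction concrete, here is a small sketch that splits a description such as `my_long_fmt8.2` into its name, length and decimal parts. The function `split_description` is hypothetical and not part of this module. It assumes the description follows the `name[length].[decimals]` shape of the paper's examples, and that a format name never ends in a digit, so any trailing digits before the dot are the length.

```python
import re
from typing import Tuple

# name (optionally starting with $), optional length, a dot, optional decimals.
# The name is matched lazily so trailing digits are read as the length.
_DESCRIPTION = re.compile(r"(\$?[A-Za-z_]\w*?)(\d*)\.(\d*)", re.ASCII)


def split_description(description: str) -> Tuple[str, int, int]:
    """Split e.g. 'my_long_fmt8.2' into ('my_long_fmt', 8, 2)."""
    match = _DESCRIPTION.fullmatch(description)
    if match is None:
        raise ValueError(f"not a format description: {description!r}")
    name, length, decimals = match.groups()
    # A missing length or decimal count is treated as 0.
    return name, int(length or 0), int(decimals or 0)


parts = [split_description(d)
         for d in ("my_long_fmt.", "my_long_fmt8.", "my_long_fmt8.2")]
```

Under this reading, only the name part would belong in a name field, while the label record carries the whole description.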