selik / xport

Python reader and writer for SAS XPORT data transport files.
MIT License

Read label of variables #18

Closed. FilipSmith closed this issue 4 years ago.

FilipSmith commented 6 years ago

Hi,

I tried, without success, to read metadata such as the label and format of the XPT variables. Such an option does not seem to be available.

Many thanks!

selik commented 6 years ago

I'm in the middle of a refactor and haven't yet updated the documentation to match the code. You can read the variable metadata. Each variable is represented by a namedtuple.

https://github.com/selik/xport/blob/fafd15a24ccd102fc92d0c0123b9877a0c752182/xport/v56.py#L366:L367
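
For readers unfamiliar with namedtuples, here is a minimal sketch of how per-variable metadata stored that way can be read. The `Variable` type and its field names are hypothetical, chosen for illustration; the library's actual definition is in the source file linked above. The sample values for USUBJID are taken from the output shown later in this thread.

```python
from collections import namedtuple

# Hypothetical field names, for illustration only; not the library's definition.
Variable = namedtuple("Variable", ["name", "label", "type", "length", "format"])

usubjid = Variable(
    name="USUBJID",
    label="Unique Subject Identifier",
    type="Character",
    length=11,
    format="",
)

# Fields are available by attribute name or by position.
print(usubjid.label)
print(usubjid[0])

# _asdict() turns the whole record into a mapping, handy for tabulating metadata.
print(usubjid._asdict())
```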

selik commented 6 years ago

@FilipSmith If you'd like to email me an example XPT document and tell me what you're trying to get out of it, I'll send you back a specific code example.

<mike [at] selik [dot] org>

FilipSmith commented 6 years ago

Thanks for the fast answer.

adsl.zip

Enclosed is an XPT file. For example, the label of the variable USUBJID is "Unique subject identifier".

When I open the file with "with open(xport_file, 'rb') as f:", I cannot extract the label "Unique subject identifier" from f.

If it is also possible to get the format, type, length, etc., I'm interested. See the picture below.

[Image: xport]

Many thanks!

sbjain commented 5 years ago

Hi, following up on this comment: how can we retrieve a certain variable in our XPT file, that is, read it, edit its values, and then save the file?

Thanks, Sagar

selik commented 5 years ago

I'm in the middle of an API revision (for a very relaxed concept of "in the middle of"), so it looks like the current code raises a NotImplementedError if you try to write/dump to XPT. I think the previous API had a working dump function. Following the pattern of the json module, this xpt module loads XPT-format files as Python objects and dumps Python objects to XPT-format. So, the answer to your question is that you load, modify, then dump.

If this task is urgent for your business, I'm happy to accept a consulting contract ;-) Otherwise, I'll get around to fixing the documentation sometime in 2019. It's on my to-do list. I'm also happy to accept help.
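
The load, modify, dump pattern mentioned above can be sketched with the standard-library json module itself, since that is the model the comment refers to. This is an analogy only: it assumes nothing about the xport functions beyond what the thread says, and per the comment above the XPT dump may raise NotImplementedError in the code as it stood at the time. The record values are made up for illustration.

```python
import io
import json

# Stand-in for a file on disk; an XPT file would be opened in binary mode instead.
source = io.StringIO('{"USUBJID": "01-701-1023", "VISITS": 5}')

# Load: file contents become ordinary Python objects.
record = json.load(source)

# Modify: plain Python, no file format involved.
record["VISITS"] += 1

# Dump: write the modified objects back out in the same format.
target = io.StringIO()
json.dump(record, target)

print(target.getvalue())
```

With an XPT file, the same three steps would presumably apply, with the module's own load and dump functions in place of json.load and json.dump.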

selik commented 4 years ago

Solved #34

selik commented 4 years ago

Double-checked that this works.

In [1]: import xport.v56

In [3]: with open('data/adsl.xpt', 'rb') as f:
   ...:     library = xport.v56.load(f)
   ...:

In [4]: library
Out[4]: <Library members=['ADSL']>

In [5]: library['ADSL']
Out[5]:
Member ADSL
    Variable       Type  Length  Format Informat                                     Label  Position
#
1    STUDYID  Character      12                                           Study Identifier         0
2    USUBJID  Character      11                                  Unique Subject Identifier        23
3     SUBJID  Character       4                           Subject Identifier for the Study        27
4     SITEID  Character       3                                      Study Site Identifier        30
5    SITEGR1  Character       3                                        Pooled Site Group 1        33
6        ARM  Character      20                                 Description of Planned Arm        53
7     TRT01P  Character      20                            Planned Treatment for Period 01        73
8    TRT01PN    Numeric       8                        Planned Treatment for Period 01 (N)        81
9     TRT01A  Character      20                             Actual Treatment for Period 01       101
10   TRT01AN    Numeric       8                         Actual Treatment for Period 01 (N)       109
11    TRTSDT    Numeric       8  DATE9.                Date of First Exposure to Treatment       117
12    TRTEDT    Numeric       8  DATE9.                 Date of Last Exposure to Treatment       125
13    TRTDUR    Numeric       8                               Duration of Treatment (days)       133
14     AVGDD    Numeric       8                                Avg Daily Dose (as planned)       141
15   CUMDOSE    Numeric       8                               Cumulative Dose (as planned)       149
16       AGE    Numeric       8                                                        Age       157
17    AGEGR1  Character       5                                         Pooled Age Group 1       162
18   AGEGR1N    Numeric       8                                     Pooled Age Group 1 (N)       170
19      AGEU  Character       5                                                  Age Units       175
20      RACE  Character      32                                                       Race       207
21     RACEN    Numeric       8                                                   Race (N)       215
22       SEX  Character       1                                                        Sex       216
23    ETHNIC  Character      22                                                  Ethnicity       238
24     SAFFL  Character       1                                     Safety Population Flag       239
25     ITTFL  Character       1                            Intent-To-Treat Population Flag       240
26     EFFFL  Character       1                                   Efficacy Population Flag       241
27   COMP8FL  Character       1                       Completers of Week 8 Population Flag       242
28  COMP16FL  Character       1                      Completers of Week 16 Population Flag       243
29  COMP24FL  Character       1                      Completers of Week 24 Population Flag       244
30  DISCONFL  Character       1                     Did the Subject Discontinue the Study?       245
31   DSRAEFL  Character       1                                    Discontinued due to AE?       246
32     DTHFL  Character       1                                              Subject Died?       247
33     BMIBL    Numeric       8                                      Baseline BMI (kg/m^2)       255
34  BMIBLGR1  Character       6                                Pooled Baseline BMI Group 1       261
35  HEIGHTBL    Numeric       8                                       Baseline Height (cm)       269
36  WEIGHTBL    Numeric       8                                       Baseline Weight (kg)       277
37   EDUCLVL    Numeric       8                                         Years of Education       285
38  DISONSDT    Numeric       8  DATE9.                           Date of Onset of Disease       293
39    DURDIS    Numeric       8                               Duration of Disease (Months)       301
40  DURDSGR1  Character       4                            Pooled Disease Duration Group 1       305
41  VISIT1DT    Numeric       8  DATE9.                                    Date of Visit 1       313
42   RFSTDTC  Character      20                          Subject Reference Start Date/Time       333
43   RFENDTC  Character      20                            Subject Reference End Date/Time       353
44  VISNUMEN    Numeric       8                   End of Trt Visit (Vis 12 or Early Term.)       361
45    RFENDT    Numeric       8  DATE9.                 Date of Discontinuation/Completion       369
46   DCDECOD  Character      27                              Standardized Disposition Term       396
47  DCREASCD  Character      18                                 Reason for Discontinuation       414
48   MMSETOT    Numeric       8                                                 MMSE Total       422

        STUDYID      USUBJID SUBJID SITEID SITEGR1  ... VISNUMEN   RFENDT        DCDECOD       DCREASCD  MMSETOT
0  CDISCPILOT01  01-701-1023   1023    701     701  ...      5.0  19238.0  ADVERSE EVENT  Adverse Event     23.0
1  CDISCPILOT01  01-701-1028   1028    701     701  ...     12.0  19737.0      COMPLETED      Completed     23.0
2  CDISCPILOT01  01-701-1294   1294    701     701  ...      9.0  19523.0  ADVERSE EVENT  Adverse Event     23.0

[3 rows x 48 columns]
name: ADSL, created: 2013-02-07 14:35:51, modified: 2013-02-07 14:35:51, sas_os: Linux, sas_version: 9.2
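
A note on reading the output above: variables with the DATE9. format, such as RFENDT, come back as plain numbers (19238.0 in the first row). SAS date values count days from 1 January 1960, so they can be converted with the standard library. The helper below is a minimal sketch, not part of the xport API as far as this thread shows; the name `sas_date_to_date` is made up for illustration, and it does not handle missing values.

```python
from datetime import date, timedelta

# SAS date values are day counts relative to this day.
SAS_EPOCH = date(1960, 1, 1)


def sas_date_to_date(value):
    """Convert a SAS date value (days since 1960-01-01) to a datetime.date.

    Hypothetical helper; raises ValueError for NaN (missing) values.
    """
    return SAS_EPOCH + timedelta(days=int(value))


# RFENDT values from the first two rows shown above.
first = sas_date_to_date(19238.0)
second = sas_date_to_date(19737.0)

print(first.isoformat())   # 2012-09-02
print(second.isoformat())  # 2014-01-14
```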