sassoftware / clinical-standards-toolkit

The open source release of SAS Clinical Standards Toolkit is a direct port of the last production release with minor modifications to adapt to a new deployment architecture.
Apache License 2.0

Trying to recreate ADaM 2.1 data sets: which SDTM and Define XML metadata versions should be used? #11

Open hollandnumerics opened 1 year ago

hollandnumerics commented 1 year ago

In CSTSAMPLELIB, I'm trying to recreate the data sets in ADaM 2.1, but there are significant differences when using SDTM 3.1.3/3.2, and Define XML 2.0 does not include all of the columns in ADaM 2.1, including:

I want to use the recreation of the sample ADaM data sets as a training exercise for new clinical SAS programmers, but the lack of consistency is proving to be a major obstacle. Is this information documented anywhere?

............Phil Holland

lexjansen commented 1 year ago

Hi Phil,

Nice to see that you are trying out openCST! The various modules in openCST demonstrate different processes. The examples in the Define-XML 2.0 module demonstrate processes related to Define-XML v2.0: creating a full Define-XML v2.0 file from metadata, creating an initial Define-XML v2.0 file from datasets, importing Define-XML v2.0, and comparing Define-XML v2.0 metadata against dataset metadata.

The complete Define-XML examples in cdisc-definexml-2.0.0-1.7\sourcexml are based on the data in cdisc-definexml-2.0.0-1.7\transport, so you may have more success with the XPT files in cdisc-definexml-2.0.0-1.7\transport.

The Define-XML v2.0 module was created in 2017, so the documentation is up to date as of that point in time. There are links to the documentation at the bottom of the main page of the repo: https://github.com/sassoftware/clinical-standards-toolkit
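To make the idea of comparing Define-XML v2.0 metadata against dataset metadata concrete, here is a minimal Python sketch. It is not how openCST does it; it only illustrates the general idea with the standard library. The XML fragment, the function names `define_variables` and `compare_columns`, and the column list are all invented for illustration and are not taken from the openCST sample files.

```python
import xml.etree.ElementTree as ET

# Namespace used by ODM 1.3 elements in a Define-XML v2.0 document
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Tiny hand-written fragment (hypothetical, not from the openCST samples)
DEFINE_XML = """\
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S1">
    <MetaDataVersion OID="MDV1" Name="Example">
      <ItemGroupDef OID="IG.ADSL" Name="ADSL" Repeating="No">
        <ItemRef ItemOID="IT.ADSL.STUDYID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.ADSL.USUBJID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.ADSL.AGE" Mandatory="No"/>
      </ItemGroupDef>
      <ItemDef OID="IT.ADSL.STUDYID" Name="STUDYID" DataType="text"/>
      <ItemDef OID="IT.ADSL.USUBJID" Name="USUBJID" DataType="text"/>
      <ItemDef OID="IT.ADSL.AGE" Name="AGE" DataType="integer"/>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def define_variables(define_xml, dataset_name):
    """Return the variable names that the Define-XML lists for one dataset."""
    root = ET.fromstring(define_xml)
    # Map each ItemDef OID to its variable name
    names = {
        item.get("OID"): item.get("Name")
        for item in root.iterfind(".//odm:ItemDef", ODM_NS)
    }
    for group in root.iterfind(".//odm:ItemGroupDef", ODM_NS):
        if group.get("Name") == dataset_name:
            return [
                names[ref.get("ItemOID")]
                for ref in group.iterfind("odm:ItemRef", ODM_NS)
            ]
    return []


def compare_columns(define_xml, dataset_name, dataset_columns):
    """Report variables present on only one side of the comparison."""
    defined = set(define_variables(define_xml, dataset_name))
    actual = set(dataset_columns)
    return {
        "missing_from_dataset": sorted(defined - actual),
        "missing_from_define": sorted(actual - defined),
    }


# Hypothetical column list standing in for the columns of a real data set
result = compare_columns(DEFINE_XML, "ADSL", ["STUDYID", "USUBJID", "TRT01P"])
print(result)
```

In practice the column list would come from the actual data set (for example, the XPT file) rather than a hard-coded list.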

Also, be aware that openCST does not demonstrate an end-to-end scenario for deriving ADaM from SDTM, or SDTM from CDASH. openCST does not support mapping metadata.

Hope this helps, Lex

hollandnumerics commented 1 year ago

Hi Lex,

I've had a look into cdisc-definexml-2.0.0-1.7\transport, and this is even more inconsistent. The metadata and ADaM variables do appear to match, but ADAM.ADSL has 254 subjects in STUDYID=CDISCPILOT01, and SDTM.DM has 5 subjects in STUDYID=CDISC01, so they are not related. Also, neither of these ADaM and SDTM data sets is related to the cdisc-adam-2.1-1.7/sascstdemodata and cdisc-sdtm_3.1.2-1.7/sascstdemodata data sets, which have 70 subjects in STUDYID=SASCSTDEMODATA.
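A consistency check of this kind (subject counts per STUDYID, and whether ADSL and DM share any subjects) can be sketched in a few lines of Python. This is only an illustration: the function names and the toy rows are made up, and in real use the rows would be read from the XPT files (for example with `pandas.read_sas(path, format="xport")`) instead of being typed in.

```python
from collections import defaultdict


def subjects_by_study(rows):
    """Map each STUDYID to the set of USUBJID values seen in the rows."""
    subjects = defaultdict(set)
    for row in rows:
        subjects[row["STUDYID"]].add(row["USUBJID"])
    return dict(subjects)


def compare_subjects(adsl_rows, dm_rows):
    """Summarise, per study, how ADSL subjects line up with DM subjects."""
    adsl = subjects_by_study(adsl_rows)
    dm = subjects_by_study(dm_rows)
    report = {}
    for study in sorted(set(adsl) | set(dm)):
        a = adsl.get(study, set())
        d = dm.get(study, set())
        report[study] = {
            "adsl_subjects": len(a),
            "dm_subjects": len(d),
            "in_both": len(a & d),
        }
    return report


# Toy rows invented for illustration; they are NOT the actual sample data
adsl_rows = [
    {"STUDYID": "CDISCPILOT01", "USUBJID": "SUBJ-A"},
    {"STUDYID": "CDISCPILOT01", "USUBJID": "SUBJ-B"},
]
dm_rows = [
    {"STUDYID": "CDISC01", "USUBJID": "SUBJ-X"},
]

report = compare_subjects(adsl_rows, dm_rows)
for study, counts in report.items():
    print(study, counts)
```

An `in_both` count of zero for every study is the signal that the ADaM and SDTM libraries do not describe the same subjects.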

Having an end-to-end scenario to derive ADaM from SDTM, with or without metadata, is not a requirement. I'm just looking for open-source data: preferably ADaM data sets that have been derived from available SDTM data sets using metadata algorithms that can be read somewhere, either manually or programmatically.

As a last resort, I may have to use the metadata and ADaM data sets in cdisc-definexml-2.0.0-1.7, and then reverse-engineer the SDTM data sets to give a consistent data source. Alternatively, I could use the SDTM data sets in cdisc-sdtm_3.1.2-1.7/sascstdemodata with the metadata from cdisc-definexml-2.0.0-1.7 to regenerate a new ADaM library.

............Phil

lexjansen commented 1 year ago

Hi Phil,

Correct, the SDTM and ADaM data sets in cdisc-definexml-2.0.0-1.7 are not related, since their sole purpose is to demonstrate the Define-XML capabilities. Another source of data may be https://github.com/cdisc-org/sdtm-adam-pilot-project, although it only has Define-XML v1.0. openCST allows importing Define-XML v1 and migrating it to Define-XML v2.0 metadata, but that is only a start, since Define-XML 2.0 has more requirements than Define-XML 1.0.

Lex