pharmaverse / sdtm.oak

An EDC and Data Standard agnostic SDTM data transformation engine that automates the transformation of raw clinical data in ODM format to SDTM based on standard mapping algorithms
https://pharmaverse.github.io/sdtm.oak/
Apache License 2.0

WIP: General Issue: glossary #37

Open kamilsi opened 8 months ago

kamilsi commented 8 months ago

Background Information

Based on discussions on Slack and in #30, we want to start a glossary of key terms in the project.

Definition of Done

No response

ramiromagno commented 7 months ago

Hi @kamilsi, @rammprasad, @edgar-manukyan:

May I make a request? Can we start by trying to clarify these terms in the glossary?

I am aware of https://www.cdisc.org/kb/articles/domain-vs-dataset-whats-difference. However, I think it still warrants clarification.

Take the case of the domain definition.

Domain: A collection of logically related observations with a common, specific topic that are normally collected for all subjects in a clinical investigation.

If we were to apply the concept of domain to R's iris data set, then I guess we could call it a domain, in the sense that the iris data frame is a collection of related observations with a common topic, i.e. flower measurements (sepal and petal dimensions), not plant species, right? So even if we split the iris data into two data frames, the set of the two data frames would still be that one domain whose topic is flower measurements, right? So one domain is typically materialized as one dataset, but it need not be. Real-life examples would help here.
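
To make the question concrete, here is a minimal Python sketch of the split described above. The rows are hypothetical stand-ins with iris-like column names (not the real iris values), and the names `DOMAIN`, `part_a` and `part_b` are made up for illustration. It only shows that two datasets can together carry the same observations as the single original one; whether that is the right reading of "domain" is exactly what is being asked.

```python
# Hypothetical rows with iris-like columns; the values are illustrative only.
rows = [
    {"Species": "setosa", "Sepal.Length": 5.1, "Petal.Length": 1.4},
    {"Species": "setosa", "Sepal.Length": 4.9, "Petal.Length": 1.4},
    {"Species": "versicolor", "Sepal.Length": 7.0, "Petal.Length": 4.7},
    {"Species": "virginica", "Sepal.Length": 6.3, "Petal.Length": 6.0},
]

DOMAIN = "FLOWER_MEASUREMENTS"  # hypothetical label for the single topic

# Split the one collection into two datasets (e.g. two files or two data frames).
part_a = [r for r in rows if r["Species"] == "setosa"]
part_b = [r for r in rows if r["Species"] != "setosa"]

# Every dataset keeps the same columns, i.e. the same kind of observation.
columns_a = {key for r in part_a for key in r}
columns_b = {key for r in part_b for key in r}

# Stacking the two datasets recovers the original observations.
recombined = part_a + part_b

print(DOMAIN, "is held in", 2, "datasets with", len(recombined), "observations")
```

Under this reading, the number of datasets is an implementation detail, while the domain is identified by the shared topic and structure of the observations.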

Then, the CDISC definition of dataset:

A collection of structured data in a single file.

It is, perhaps, not so obvious either, to start with because of the reference to a file. I am guessing that the original intention was to refer to the implementation on a computer, be it a file, an object in memory, a database, etc. Right? It feels like the idea is that the domain definition corresponds to the conceptual idea of a data set, and that the dataset is the actual instantiation of that concept on a computer.

Regarding topic, can we say that it corresponds to the concept of observational unit in tidy data?