neurobagel / bagel-cli

Command line tool for Neurobagel data parsing and annotation
https://neurobagel.org/cli/
MIT License

Handle annotations where multiple columns are `IsAbout` the same attribute #224

Closed alyssadai closed 8 months ago

alyssadai commented 11 months ago

Context

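Currently, when multiple columns in the TSV are annotated as being about the same attribute (e.g., nb:Diagnosis), we store only the value from the first such column that appears in the TSV.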

One example where this is bad: The QPN TSV contains two columns about diagnosis. One has the values PD and PSP (progressive supranuclear palsy), and the other classifies individuals as PD or healthy control. Both were annotated in the data dictionary, but only the PD/PSP diagnosis column is in the graph right now because it appears first in the raw data.

In short: a subject who has a missing value in the first column but has data in the second column (i.e., is a healthy control) will not show up as a healthy control in the graph, which is not what we want.

Desired treatment of multi-column annotation

According to our current data model, a subject can have a list of multiple diagnoses or assessments, but only one participant ID, age, or sex (?).

We should update the phenotypic column handling logic to return a list of values for a set of columns annotated as being IsAbout the same attribute (currently, this function is called for sex, diagnosis, and age, but not assessment):

https://github.com/neurobagel/bagel-cli/blob/4da00b6db4cce30d40f101c0c4e17be25db3828f/bagel/pheno_utils.py#L213-L238
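A minimal sketch of the proposed behaviour, not the actual `pheno_utils.py` code: the helper name `collect_values`, the `MISSING_VALUES` tuple, and the plain-dict row are all hypothetical and only illustrate returning one value per annotated column instead of stopping at the first.

```python
from typing import Any

# Hypothetical set of markers treated as "no data" in this sketch.
MISSING_VALUES = ("", "n/a", None)


def collect_values(row: dict[str, Any], columns: list[str]) -> list[Any]:
    """Return the non-missing values of `row` for every column in `columns`.

    `columns` stands for all columns annotated as IsAbout the same attribute.
    Values are returned in column order; missing values are skipped.
    """
    values = []
    for column in columns:
        value = row.get(column)
        if value in MISSING_VALUES:
            continue  # no data in this column, but keep checking the others
        values.append(value)
    return values


# A subject with no data in the first diagnosis column is still picked up
# through the second one.
row = {"participant_id": "sub-01", "diagnosis_a": "", "diagnosis_b": "HC"}
diagnoses = collect_values(row, ["diagnosis_a", "diagnosis_b"])
```

With this shape, the QPN case described above would yield a non-empty list for the healthy control instead of being dropped.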

Storing only the first ('transformed') value for variables that do not support multiple values (age, sex) should be done via conditionals outside of this utility function.
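One way the caller-side conditional could look, as a sketch only: `SINGLE_VALUED`, `assign_attribute`, and the dict-based subject are hypothetical names, and the attribute keys are placeholders rather than the real data model fields.

```python
# Hypothetical keys for attributes that accept a single value per subject.
SINGLE_VALUED = {"age", "sex"}


def assign_attribute(subject: dict, attribute: str, values: list) -> None:
    """Store `values` on `subject`, keeping only the first one where required."""
    if not values:
        return  # nothing to store for this attribute
    if attribute in SINGLE_VALUED:
        subject[attribute] = values[0]
    else:
        subject[attribute] = list(values)


subject = {}
assign_attribute(subject, "age", [25.0, 26.0])
assign_attribute(subject, "diagnosis", ["PD", "PSP"])
```

Keeping this decision in the caller means the utility function always returns a list and does not need to know which attributes are single-valued.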

isSubjectGroup, for now, should remain mutually exclusive with any diagnoses. So, if the list of values for columns about diagnosis contains at least one instance of healthy control, we say the subject isSubjectGroup and do not assign any diagnoses.
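The mutual-exclusivity rule could be sketched as follows. The `HEALTHY_CONTROL` placeholder, the function name `resolve_diagnoses`, and the boolean stand-in for isSubjectGroup are assumptions made for illustration; the real data model may represent these differently.

```python
# Hypothetical placeholder for whatever term marks a healthy control.
HEALTHY_CONTROL = "HC"


def resolve_diagnoses(values: list) -> tuple[bool, list]:
    """Return (is_subject_group, diagnoses) for one subject.

    If any diagnosis column marks the subject as a healthy control,
    the subject is flagged and no diagnoses are assigned.
    """
    if HEALTHY_CONTROL in values:
        return True, []
    return False, list(values)


is_control, diagnoses = resolve_diagnoses(["PD", "HC"])
```

Note that under this rule a subject with conflicting values across columns (for example PD in one and healthy control in the other) is treated as a healthy control.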

Steps to implement

alyssadai commented 9 months ago

Blocked by #225 to avoid bagel pheno command logic conflicts