neurobagel / bulk_annotations

Retroactively annotate a large number of BIDS datasets at once
MIT License

OpenNeuro bulk processing script ignores missing values #23

Closed alyssadai closed 1 year ago

alyssadai commented 1 year ago


Expected Behavior

In the big OpenNeuro TSV we denote missing values by mapping them to the made-up controlled term nb:MissingValue. But we never use this term when building the data dictionaries.

What's worse: we load the TSV in replace-NA mode, so many of these values (e.g. "n/a") get turned into pandas NaNs internally.
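A minimal sketch of the loading problem, assuming the TSV is read with `pandas.read_csv`. The column names and the two rows are hypothetical stand-ins, not the real layout of the OpenNeuro TSV. With default settings pandas parses strings such as "n/a" as NaN; passing `keep_default_na=False` together with `dtype=str` keeps the raw strings.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for the real annotation TSV;
# the column names are illustrative assumptions.
tsv_text = "value\tcontrolled_term\nn/a\tnb:MissingValue\nF\tsnomed:248152002\n"

# Default behaviour: strings such as "n/a" are converted to NaN on load.
default_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Preserve the raw strings: disable the default NA list and read as str.
raw_df = pd.read_csv(
    io.StringIO(tsv_text), sep="\t", dtype=str, keep_default_na=False
)

n_missing_default = int(default_df["value"].isna().sum())
raw_values = raw_df["value"].tolist()
```

In the second frame the original "n/a" string survives, so it can still be matched against whatever value the dataset's own TSV uses.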

As a result, some datasets fail the CLI because we don't allow unannotated values.
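One possible way to stop dropping these values is to collect, per column, every raw value that was mapped to nb:MissingValue, so it can be written into the data dictionary instead of being left unannotated. The sketch below is not the script's actual code: the column names (`column`, `value`, `controlled_term`), the sample rows, and the helper `collect_missing_values` are all hypothetical.

```python
import csv
import io
from collections import defaultdict

MISSING_TERM = "nb:MissingValue"

# Hypothetical excerpt; the real TSV's column names may differ.
tsv_text = (
    "column\tvalue\tcontrolled_term\n"
    "sex\tF\tsnomed:248152002\n"
    "sex\tn/a\tnb:MissingValue\n"
    "age\t\tnb:MissingValue\n"
)


def collect_missing_values(text):
    """Return {column name: [raw values annotated as missing]}."""
    missing = defaultdict(list)
    # csv.DictReader keeps every field as a plain string, so "n/a" and ""
    # are not silently converted to NaN.
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if row["controlled_term"] == MISSING_TERM:
            missing[row["column"]].append(row["value"])
    return dict(missing)


missing_by_column = collect_missing_values(tsv_text)
```

The resulting mapping could then be attached to each column's annotation when the data dictionary is generated. Which key it belongs under depends on the data dictionary schema the CLI expects.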

Here is an example: https://github.com/neurobagel/openneuro-annotations/blob/763c46e782c792b946eb701e5379922b9ccad15a/ds000017.json#L40-L44

We need to

Steps:

Use: https://docs.google.com/spreadsheets/d/1_6dnAjl2B2xse3uEB9UgKQziZeduZ9MDAVtUPW5IwIY/edit?usp=sharing