Is there an existing issue for this?
Expected Behavior
In the big OpenNeuro TSV, we denote missing values by mapping them to the made-up controlled term nb:MissingValue, but we never use this term when generating the data dictionaries.
What's worse, we load the TSV in replace-na mode, so many of these values (e.g. "n/a") get turned into pandas NaNs internally.
As a result, some datasets fail the CLI because we don't allow unannotated values.
Here is an example: https://github.com/neurobagel/openneuro-annotations/blob/763c46e782c792b946eb701e5379922b9ccad15a/ds000017.json#L40-L44
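To make the NaN problem concrete, here is a minimal sketch of how pandas treats "n/a" under its default settings versus with default NA parsing turned off. The TSV content and column names are made up for illustration and are not taken from the real OpenNeuro TSV; this is also not necessarily how our loader is configured, it only shows the underlying pandas behaviour.

```python
import io

import pandas as pd

# Hypothetical two-row TSV; column names and values are invented for illustration.
TSV = "participant_id\tsex\nsub-01\tM\nsub-02\tn/a\n"

# Default parsing: "n/a" is one of pandas' default NA strings, so it becomes NaN
# and the original string is no longer available for annotation.
with_na = pd.read_csv(io.StringIO(TSV), sep="\t")

# With keep_default_na=False (and dtype=str), the literal "n/a" string is kept,
# so it could still be matched against a missing-value annotation.
as_text = pd.read_csv(io.StringIO(TSV), sep="\t", keep_default_na=False, dtype=str)

print(with_na["sex"].isna().tolist())  # [False, True]
print(as_text["sex"].tolist())         # ['M', 'n/a']
```

If we keep the raw strings like this, the values mapped to nb:MissingValue stay visible when the data dictionary is built, instead of silently disappearing as NaN.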
We need to
Steps:
process_annotation_to_dict
Use: https://docs.google.com/spreadsheets/d/1_6dnAjl2B2xse3uEB9UgKQziZeduZ9MDAVtUPW5IwIY/edit?usp=sharing