OpenNeuro bulk processing script ignores missing values

Is there an existing issue for this?

[X] I have searched the existing issues

Expected Behavior

In the big OpenNeuro TSV, we denote missing values by mapping them to the made-up controlled term nb:MissingValue. However, we never use this term when creating the data dictionaries.

What’s worse, we load the TSV in replace-NA mode, so many of these values (e.g. “n/a”) get turned into pandas NaNs internally.

As a result, some datasets fail the CLI because we don’t allow unannotated values.

Here is an example: https://github.com/neurobagel/openneuro-annotations/blob/763c46e782c792b946eb701e5379922b9ccad15a/ds000017.json#L40-L44
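The NaN conversion described above can be reproduced in a few lines. This is only a sketch: the column names and the ex:Female term are made up for illustration and are not taken from the actual OpenNeuro TSV.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt; column names and ex:Female are illustrative only.
tsv = "value\tcontrolled_term\nn/a\tnb:MissingValue\nF\tex:Female\n"

# Default parsing: "n/a" is on pandas' default NA list, so it becomes NaN.
default = pd.read_csv(io.StringIO(tsv), sep="\t")

# Reading everything as string AND switching off the default NA list
# keeps the literal "n/a" text.
as_text = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str, keep_default_na=False)

print(default["value"].isna().tolist())
print(as_text["value"].tolist())
```

Note that dtype=str on its own is not enough here, since pandas still applies its default NA list unless keep_default_na=False (or na_filter=False) is passed.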

We need to:

Read the values portion of the TSV as strings (or just read everything as strings)
Treat the special controlled term as a signal to put the value in the MissingValues section rather than the Levels section
Test this
Rerun data dictionary creation
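A minimal sketch of the first two steps, assuming a simplified layout with one row per annotated value. The helper split_levels_and_missing, the column names, and the ex: terms are hypothetical; this is not the real process_annotation_to_dict, and the output is flattened compared to an actual data dictionary.

```python
import io

import pandas as pd

MISSING_TERM = "nb:MissingValue"


def split_levels_and_missing(rows: pd.DataFrame) -> dict:
    """Sort annotated values into Levels or MissingValues (hypothetical helper)."""
    levels = {}
    missing = []
    for value, term in zip(rows["value"], rows["controlled_term"]):
        if term == MISSING_TERM:
            # The special term marks the raw value as missing, not as a level.
            missing.append(value)
        else:
            levels[value] = term
    return {"Levels": levels, "MissingValues": missing}


# Hypothetical excerpt; the real TSV has more columns and different names.
tsv = "value\tcontrolled_term\nn/a\tnb:MissingValue\nF\tex:Female\nM\tex:Male\n"

# Read everything as string so "n/a" survives as text instead of becoming NaN.
rows = pd.read_csv(io.StringIO(tsv), sep="\t", dtype=str, keep_default_na=False)

annotations = split_levels_and_missing(rows)
print(annotations)
```

A test for the real function could follow the same pattern: feed in a small TSV that contains an nb:MissingValue row and assert that the raw value lands in MissingValues and not in Levels.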

Steps:

[x] Find an example dataset in https://github.com/neurobagel/openneuro-annotations that currently has this problem and link it here
[x] Change process_annotation_to_dict

Use: https://docs.google.com/spreadsheets/d/1_6dnAjl2B2xse3uEB9UgKQziZeduZ9MDAVtUPW5IwIY/edit?usp=sharing

neurobagel / bulk_annotations

OpenNeuro bulk processing script ignores missing values #23
