neurobagel / planning

MIT License

Double-check `open_neuro` datasets missing imaging sessions/data #75

Status: Open. alyssadai opened this issue 10 months ago.

alyssadai commented 10 months ago

The missing datasets (present in the `open_neuro` graph but absent from the query tool results) are:

ds002912
ds002939
ds002982
ds003967
ds003082
ds003754
ds002718
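A list like the one above is a set difference: the dataset IDs stored in the graph minus the dataset IDs that come back from the query tool. The following is a minimal sketch of that check. The function name `find_missing_datasets` and the example inputs are hypothetical. In practice, the two lists would come from the graph store and the API response.

```python
def find_missing_datasets(graph_datasets, query_results):
    """Return dataset IDs present in the graph but absent from the query results, sorted."""
    return sorted(set(graph_datasets) - set(query_results))


# Hypothetical inputs, for illustration only.
graph = ["ds002912", "ds000001", "ds002939"]
results = ["ds000001"]
print(find_missing_datasets(graph, results))  # ['ds002912', 'ds002939']
```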

What's the problem?

Based on inspecting these datasets in https://github.com/OpenNeuroDatasets, the problem is that the CLI cannot model their imaging data. As a result, their subjects have no (imaging) session info in the resulting JSONLD. The query template used in the API assumes that every subject has at least one session (see https://github.com/neurobagel/api/blob/18d5d95ecf8ae6c2ee4b56cbe7a279ef684a8498/app/api/utility.py#L185-L188). For that reason, the datasets above are never matched by any query sent through the API or query tool.
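The effect of that assumption can be illustrated with a simplified, hypothetical model. This is not the actual API query, which lives in the linked `utility.py`. Each subject maps to a list of sessions, and a dataset matches if any subject yields at least one result row. With `require_session=True`, the session pattern behaves like a required graph pattern (roughly an inner join): subjects with no sessions produce no rows, so a dataset with no sessions at all can never match. Making the session pattern optional keeps such subjects. All dataset and subject names below are made up.

```python
# Hypothetical, simplified model: each subject maps to a list of imaging sessions.
dataset_subjects = {
    "ds_with_sessions": {"sub-01": ["ses-01"], "sub-02": ["ses-01", "ses-02"]},
    "ds_without_sessions": {"sub-01": [], "sub-02": []},
}


def matching_datasets(datasets, require_session=True):
    """Return the IDs of datasets with at least one matching subject.

    require_session=True: every result row must bind a session (like a required
    pattern / inner join), so subjects without sessions yield no rows.
    require_session=False: the session pattern is treated as optional, so a
    subject without sessions still yields one row with an unbound session.
    """
    matches = []
    for dataset_id, subjects in datasets.items():
        for sub_id, sessions in subjects.items():
            # Expand each subject into (subject, session) rows, as a join would.
            rows = [(sub_id, ses) for ses in sessions]
            if not rows and not require_session:
                rows = [(sub_id, None)]  # optional-style: keep the subject
            if rows:
                matches.append(dataset_id)
                break  # one matching subject is enough for the dataset
    return matches


print(matching_datasets(dataset_subjects))
# ['ds_with_sessions']
print(matching_datasets(dataset_subjects, require_session=False))
# ['ds_with_sessions', 'ds_without_sessions']
```

Whether the right fix is to relax the query or to model these datasets differently in the CLI is an open question for this issue. The sketch only shows why they currently disappear.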

More details on the datasets

Here's a gist table (https://gist.github.com/alyssadai/40c170e7f79117a276dc1586a1ebf344) listing the missing dataset names, their URLs, and specific observations about their BIDS data.

Of these, ds003082 is the only dataset whose BIDS data we would not immediately expect the CLI to be unable to model. Besides containing some imaging files we don't support yet, it seems to have problems with how its session directories are named.
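The BIDS specification names session directories `ses-<label>`, where `<label>` is alphanumeric. A quick way to flag suspicious session directory names is to check them against that pattern. The following is a minimal sketch. The helper `invalid_session_dirs` and the example directory names are hypothetical, and this issue does not say exactly how ds003082's session directories deviate, so treat this as a starting point for inspection rather than a diagnosis.

```python
import re

# BIDS session directories are named ses-<label>, with an alphanumeric label.
SESSION_DIR_PATTERN = re.compile(r"ses-[a-zA-Z0-9]+")


def invalid_session_dirs(dir_names):
    """Return names that look like session directories but don't follow ses-<label>."""
    return [
        name
        for name in dir_names
        if name.startswith("ses") and not SESSION_DIR_PATTERN.fullmatch(name)
    ]


# Hypothetical directory names, for illustration only.
print(invalid_session_dirs(["ses-01", "ses_02", "ses-pre-op", "anat"]))
# ['ses_02', 'ses-pre-op']
```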

More info

Next steps

We should:

Originally posted by @alyssadai in https://github.com/neurobagel/planning/issues/54#issuecomment-1813750886

github-actions[bot] commented 7 months ago

We want to keep our issues up to date and active. This issue hasn't seen any activity in the last 75 days. We have applied the _flag:stale label to indicate that this issue should be reviewed again. When you review, please reread the spec and then apply one of these three options: