Open jcohenadad opened 1 year ago
Thanks for opening the issue! It seems there's some misunderstanding about what the conversion scripts are doing.
it duplicates the data (more space on HD)
No, it does NOT duplicate the data. The output of the MSD conversion script is just a pointer to the original, version-tracked BIDS dataset. This line shows that the output is just a .json file containing the paths to the images and labels of the original BIDS dataset.
the dataset used for training is not synced anymore
Based on what I wrote above, since the output is a JSON file pointing to the BIDS dataset, the script only takes the latest paths to the BIDS dataset. There is NO duplication of the datasets anywhere.
What do I mean by "pointing to the original BIDS dataset"? Here's a screenshot of how the JSON file looks:
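As a rough text illustration of a JSON file that only points to a BIDS dataset, here is a minimal Python sketch. It is not the lab's actual conversion script. The function name `build_datalist`, the `T2w` and `seg` suffixes, the `derivatives/labels` folder, and the `numTraining`/`training` keys are all assumptions chosen for the example, and the glob pattern assumes no session (`ses-*`) folders.

```python
import json
import tempfile
from pathlib import Path


def build_datalist(bids_root, image_suffix="T2w", label_suffix="seg"):
    """Pair each image with its label and return an MSD-style datalist of paths."""
    bids_root = Path(bids_root)
    entries = []
    # Assumed layout (hypothetical): images in <root>/sub-XX/anat/ and labels in
    # <root>/derivatives/labels/sub-XX/anat/, named <image stem>_<label_suffix>.nii.gz
    for image in sorted(bids_root.glob(f"sub-*/anat/*_{image_suffix}.nii.gz")):
        subject = image.parent.parent.name
        label_name = image.name.replace(".nii.gz", f"_{label_suffix}.nii.gz")
        label = bids_root / "derivatives" / "labels" / subject / "anat" / label_name
        if label.exists():  # images without a label are skipped
            # Only the paths are stored; no image data is copied.
            entries.append({"image": str(image), "label": str(label)})
    return {"numTraining": len(entries), "training": entries}


if __name__ == "__main__":
    # Build a tiny fake dataset with empty files, just to show the JSON structure.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my-bids-dataset"
        image = root / "sub-01" / "anat" / "sub-01_T2w.nii.gz"
        label = root / "derivatives" / "labels" / "sub-01" / "anat" / "sub-01_T2w_seg.nii.gz"
        for f in (image, label):
            f.parent.mkdir(parents=True, exist_ok=True)
            f.touch()
        print(json.dumps(build_datalist(root), indent=2))
```

Because the JSON holds only paths, regenerating it after updating the BIDS dataset picks up the current files. A file in this shape should be readable with MONAI's `load_decathlon_datalist`, though that is worth checking against the MONAI version in use.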
Hope this clarifies some things a bit!
It does! Thanks a lot @naga-karthik! Your solution is exactly what we need. I would just like to make it more visible to the lab, e.g. create a template script in this repo maybe?
I created something like that here (and the students in the lab do know that the conversion scripts exist).
create a template script in this repo maybe?
It's pretty hard to create a template script that just works in a plug-and-play manner. The suffixes, contrasts, sessions, etc. differ too much across the datasets we have, so the script I linked above is just meant to be a starting point. The students would have to look at the code and make small modifications depending on how their data looks (I also make it a bit easier by adding TODOs for where to add stuff).
The fact that @louisfb01 started off with @naga-karthik's script (instead of starting from scratch) is evidence that having at least a script to start from is better than no script at all, and therefore is a justification to put something in this repo and redirect students to it (and, importantly, improve that script over time).
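Talking with @louisfb01, I realized that the lab does not have a procedure for training MONAI models from a BIDS dataset, and instead converts the data physically, which is problematic because it duplicates the data (more space on HD) and the dataset used for training is not synced anymore.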
I know the MONAI folks have been working on BIDS compatibility. Can people please link in this GH discussion thread all the existing resources, and also discuss strategies for the lab to come up with a unified protocol/script for preparing a JSON file for MONAI training?
The solution should accommodate the aggregation of multiple BIDS datasets.
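One possible way to aggregate several BIDS datasets into a single JSON file is sketched below. This is only an illustration under stated assumptions, not an agreed lab protocol. The function name `aggregate_datalists`, the per-dataset suffix mapping, the `derivatives/labels` folder, the `_seg` label suffix, and the extra `dataset` key are all hypothetical, and session folders are not handled.

```python
import json
from pathlib import Path


def aggregate_datalists(datasets, out_json):
    """Merge image/label path pairs from several BIDS datasets into one JSON file.

    `datasets` maps each dataset root to the image suffix used in that dataset,
    since suffixes and contrasts differ from one dataset to another.
    """
    merged = []
    for root, suffix in datasets.items():
        root = Path(root)
        # Assumed layout (hypothetical): <root>/sub-XX/anat/ for images and
        # <root>/derivatives/labels/sub-XX/anat/ for labels ending in _seg.nii.gz
        for image in sorted(root.glob(f"sub-*/anat/*_{suffix}.nii.gz")):
            subject = image.parent.parent.name
            label = (root / "derivatives" / "labels" / subject / "anat"
                     / image.name.replace(".nii.gz", "_seg.nii.gz"))
            if label.exists():  # images without a label are skipped
                merged.append({
                    "dataset": root.name,  # records which dataset the pair came from
                    "image": str(image),
                    "label": str(label),
                })
    datalist = {"numTraining": len(merged), "training": merged}
    # Only paths are written; the image files stay in their original datasets.
    Path(out_json).write_text(json.dumps(datalist, indent=2))
    return datalist
```

A call might look like `aggregate_datalists({"/path/to/dataset-a": "T2w", "/path/to/dataset-b": "T1w"}, "dataset.json")`, where both paths are placeholders. Keeping the dataset name in each entry makes it possible to trace a training sample back to its source dataset.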
Some resources: