Upload dataset on OpenNeuro

jcohenadad commented 11 months ago

Use OpenNeuro to version track datasets used for specific training rounds and revisions. This doesn't need to be public, version tracking can be private and then we will make it public after all the iterations are done.

Related to #1

MerveKaptan commented 9 months ago

Hi @jcohenadad ,

I hope you had a great start to the new year!

Could you please have a look at this data organization? I can only have 2 participants due to space limitations - but should be enough to show you the data organization. This organization is BIDS compatible, as we discussed in #1 . Should you approve this, I will reorganize the whole dataset into this format and upload it to OpenNeuro. The authors, funding etc etc etc will be added later. spinalfmrisegmentation.zip

jcohenadad commented 9 months ago

It should say somewhere what type of preprocessing was applied to the raw data (eg moco, etc.).

also, make sure it passes the BIDSvalidator:

MerveKaptan commented 9 months ago

Hi @jcohenadad,

Thank you very much for the quick reply.

Sure, I can add it as a run name I guess! I will see what the best option is!

yes, it does, it just gives a few warnings which are fine!

jcohenadad commented 9 months ago

since we ended up cropping the data to only 20 volumes, i find that naming them eg. "motor" is a bit misleading, as:

in some cases there was likely no task (for the first 20 vols)
the dataset should not be used for BOLD analysis (as it currently suggests).

my suggestion would be to rename them as 'rest' as suggested in WARNING 1.

jcohenadad commented 9 months ago

WARNING 2 is also worrisome: number of files should be the same across subjects

jcohenadad commented 9 months ago

WARNING 3 can be addressed right away. Why wait?

MerveKaptan commented 9 months ago

Thank you very much!

Wrt 1, is up to you, yes I agree that first 20 volumes may not contain task- but what if it does, rest may be misleading as well? Ideally, it would have been nice to get data during task blocks, now that I think about it.

Wrt number 2, i do not think that is worrisome because Leipzig data has pain task and Ken s data has motor task. because it will be multisite dataset, we will get that error should we decide to keep different task names. I believe that warning refers to that, otherwise, each folder should have 1 dataset.

Wrt number 3, I just added your name to test. I will organize the whole dataset. some parts of the organization will have to be manual (also to make sure the authors are correct etc.) So, I just wanted to get an exemplary dataset first and will organize the whole dataset later.

jcohenadad commented 9 months ago

Wrt 1, is up to you, yes I agree that first 20 volumes may not contain task- but what if it does, rest may be misleading as well? Ideally, it would have been nice to get data during task blocks, now that I think about it.

hum... i really don't think we should encourage people to do anything useful from a functional standpoint with only 20 vol. What was the motivation for adding the individual scans again?

Alternatively, if we really want the data to be more useful, should we then reconsider adding also the source (ie: non-moco) data with all the volumes? One possibility would be to upload all volumes under sub-XX and the moco + moco-mean under derivatives.

Wrt number 2, i do not think that is worrisome because Leipzig data has pain task and Ken s data has motor task. because it will be multisite dataset, we will get that error should we decide to keep different task names. I believe that warning refers to that, otherwise, each folder should have 1 dataset.

then that relates to Wrt 1. If we fix Wrt 1 by naming all the files 'rest', that will do

Wrt number 3, I just added your name to test. I will organize the whole dataset. some parts of the organization will have to be manual (also to make sure the authors are correct etc.) So, I just wanted to get an exemplary dataset first and will organize the whole dataset later.

to do things properly, i suggest adding the authors for the dataset you currently have (ie: leipzig and nw). Then you upload on OPENNEURO. Then you add more datasets with more authors, etc.

Additional comments:

sub-leipzigP01_task-pain_desc-spinalcord_mask.nii.gz needs a JSON sidecar
i find calling derivatives/moco is a bit misleading, given that the actual moco data are in the source, and all derivatives/moco has is the average volume (relates to my comment about warning1)
sub-leipzigP01_task-pain_desc-mocomean_bold.nii.gz needs a JSON sidecar

MerveKaptan commented 9 months ago

Thank you so much for the reply @jcohenadad !

hum... i really don't think we should encourage people to do anything useful from a functional standpoint with only 20 vol. What was the motivation for adding the individual scans again?

Individual volumes? So that we can test and develop our segmentation method on individual volumes which hopefully can be used for developing a moco algorithm!

Alternatively, if we really want the data to be more useful, should we then reconsider adding also the source (ie: non-moco) data with all the volumes? One possibility would be to upload all volumes under sub-XX and the moco + moco-mean under derivatives.

Useful for us (for development of other things), for others, or both? I can ask the authors of the datasets about this in parallel. I do not think we should wait for this though, it may delay our output significantly, would you agree?

then that relates to Wrt 1. If we fix Wrt 1 by naming all the files 'rest', that will do

Yes, will change that!

to do things properly, i suggest adding the authors for the dataset you currently have (ie: leipzig and nw). Then you upload on OPENNEURO. Then you add more datasets with more authors, etc.

Sure, once you approve the format, I will upload to OpenNeuro.

Additional comments:

sub-leipzigP01_task-pain_desc-spinalcord_mask.nii.gz needs a JSON sidecar

i find calling derivatives/moco is a bit misleading, given that the actual moco data are in the source, and all > derivatives/moco has is the average volume (relates to my comment about warning1)

sub-leipzigP01_task-pain_desc-mocomean_bold.nii.gz needs a JSON sidecar

My follow-up questions:

For 1 and 3: What kind of json sidecar would you like to have for derivatives? Could you please be so kind as to show me an example? I have not prepared such sidecar files for my derivatives before - did not know it was necessary!
For 2: What would you prefer to call it? I kept the original naming as we initially discussed. Happy not to separate derivatives as moco and label and put all of them under individual derivatives/sub-XX folders.

jcohenadad commented 9 months ago

hum... i really don't think we should encourage people to do anything useful from a functional standpoint with only 20 vol. What was the motivation for adding the individual scans again?

Individual volumes? So that we can test and develop our segmentation method on individual volumes which hopefully can be used for developing a moco algorithm!

i'm having after thoughts about this 'mid point' solution. I'd say let's share the whole time series, or only the mean moco, but not a sample of it.

Alternatively, if we really want the data to be more useful, should we then reconsider adding also the source (ie: non-moco) data with all the volumes? One possibility would be to upload all volumes under sub-XX and the moco + moco-mean under derivatives.

Useful for us (for development of other things), for others, or both? I can ask the authors of the datasets about this in parallel. I do not think we should wait for this though, it may delay our output significantly, would you agree?

Yes, I would agree. Alongside that argument, we do not need the moco time series right now, so I suggest we only put the mean moco for now then.

For 1 and 3: What kind of json sidecar would you like to have for derivatives? Could you please be so kind as to show me an example? I have not prepared such sidecar files for my derivatives before - did not know it was necessary!

https://intranet.neuro.polymtl.ca/data/dataset-curation.html#json-sidecars https://intranet.neuro.polymtl.ca/data/dataset-curation.html#id2

For 2: What would you prefer to call it? I kept the original naming as we initially discussed. Happy not to separate derivatives as moco and label and put all of them under individual derivatives/sub-XX folders.

Let's first clarify the previous points, to decide what makes most sense. If we end up not uploading the time series (for now), then maybe we should re-consider putting the moco-mean under the source data, and call it: sub-leipzigR01_task-rest_desc-mocomean_bold.nii.gz.

The derivatives would be called: sub-leipzigR01_task-rest_desc-mocomean_bold_label-SC_seg.nii.gz to follow our conventions

Tagging @nathanmolinier

MerveKaptan commented 9 months ago

Thank you!

i'm having after-thoughts about this 'midpoint' solution. I'd say let's share the whole time series, or only the mean moco, but not a sample of it.

Yes, I would agree. Alongside that argument, we do not need the moco time series right now, so I suggest we only put the mean moco for now then.

Let's first clarify the previous points, to decide what makes most sense. If we end up not uploading the time series (for now), then maybe we should re-consider putting the moco-mean under the source data, and call it: sub-leipzigR01_task-rest_desc-mocomean_bold.nii.gz.

I think not everyone may want to share the whole time series and we do not have them yet, so as agreed, let me put the mean moco image as the source image if you think that is better. It will be more neat and specific to this segmentation project as well. Should we decide to add single-volume segmentations to the manuscript, we can add that data as well! That being said... sub-leipzigR01_task-rest_desc-mocomean_bold.nii.gz --> Although I like this naming, I do not think this is BIDS compatible naming for the source data folder. There is no 'desc' field if I am not mistaken.

So, I do not know what the alternative would be. We could use the -acq field to describe mocomean.

jcohenadad commented 9 months ago

rec would be the ideal field:

sub-leipzigR01_task-rest_rec-MoCoMean_bold.nii.gz

MerveKaptan commented 9 months ago

Hmm, I do encounter an error that I have encountered before and I am not sure how I solved it - maybe I did not...

jcohenadad commented 9 months ago

ah... that's annoying. Can you please dig a little in the BIDS specs to see if there is a workaround, and also post this issue on neurostars and ask what they suggest? thanks

MerveKaptan commented 9 months ago

Yes, I will do that and keep you posted!

MerveKaptan commented 9 months ago

Hi @jcohenadad,

I was able to figure this out thanks to help from openneuro team. The trick is to edit the header and make sure it is 4D which can be easily done like this (just adding this -may be helpful for future reference). I will be organizing data. I also need to edit the json files so that it will take a bit of time, but I will try to do it asap :)

rohanbanerjee commented 6 months ago

Tracking requirements:

[ ] json sidecars for BIDS compatibility from NIFTI headers Related discussion: https://neurostars.org/t/json-creation-from-nifti-file/23399 Sample required fields below:
```
{
"TaskName": "N Back",
"TR": ""
}
```
cc: @MerveKaptan

jcohenadad commented 6 months ago

@MerveKaptan any update on this? the earlier the data are on OpenNeuro, the better it will be for reproducing the training that @rohanbanerjee is currently doing-- currently it makes it very hard, non-transparent, prone-to-error to reproduce the model training pipeline-- so this is quite urgent--

MerveKaptan commented 6 months ago

@MerveKaptan any update on this? the earlier the data are on OpenNeuro, the better it will be for reproducing the training that @rohanbanerjee is currently doing-- currently it makes it very hard, non-transparent, prone-to-error to reproduce the model training pipeline-- so this is quite urgent--

Hi Julien,

Thanks a lot! Yes, completely understand! Working on it :) Most of the data is organized! It is just small things which need to be done for BIDS compatibility. We will have a meeting with Rohan today, and will keep you updated!

Merve

MerveKaptan commented 5 months ago

Dear @jcohenadad and @rohanbanerjee ,

Finally, the initial version of our dataset is on openneuro (except Barry's data)! You both have admin access!

jcohenadad commented 5 months ago

@MerveKaptan great! pls add a link to openneuro

rohanbanerjee commented 5 months ago

This is the link to the dataset: https://openneuro.org/datasets/ds005143

jcohenadad commented 5 months ago

You shared it with the wrong email address-- here is my account info:

my ORCID is: 0000-0003-3662-9532

MerveKaptan commented 5 months ago

Hello!

@jcohenadad I need your email to be able to share it with you and when I use the associated email, I get the following error:

Let me write a ticket to OpenNeuro team!

jcohenadad commented 5 months ago

Let me write a ticket to OpenNeuro team!

yup, that's exactly the thing to do.

in parallel, also try with my orcid 0000-0003-3662-9532

MerveKaptan commented 5 months ago

in parallel, also try with my orcid 0000-0003-3662-9532

Thank you! Unfortunately, they are asking for an email but I will keep you posted as soon as I hear from the openneuro team!

rohanbanerjee commented 4 months ago

The data has been successfully uploaded to Openneuro and also been made public. Closing the issue therefore.

jcohenadad commented 4 months ago

For cross-referencing, here is the dataset URL pointing to the published version: https://openneuro.org/datasets/ds005143/versions/1.2.0

sct-pipeline / fmri-segmentation

Upload dataset on OpenNeuro #30