sct-pipeline / fmri-segmentation

Repository for the project on automatic spinal cord segmentation based on fMRI EPI data
MIT License

Make the dataset BIDS compatible #1

Closed by jcohenadad 2 months ago

jcohenadad commented 1 year ago

Currently, the naming with "_mean" suffix is not BIDS compatible. I think there are ways to describe processing applied to a time series.

MerveKaptan commented 1 year ago

What about this?

data_leipzig_rest
   ├── dataset_description.json
   ├── derivatives
   │   ├── labels
   │   │   └── sub-leipzigR01
   │   │       └── func
   │   │           └── sub-leipzigR01_task-rest_desc-mocomeanseg.nii.gz  <---- Manual spinal cord segmentation
   │   └── moco
   │       └── sub-leipzigR01
   │           └── func
   │               ├── sub-leipzigR01_task-rest_desc-moco_bold.nii.gz     <---- 20 motion-corrected EPI volumes
   │               └── sub-leipzigR01_task-rest_desc-mocomean_bold.nii.gz <---- Mean of motion-corrected volumes 
   ├── participants.json
   ├── participants.tsv
   └── task-rest_bold.json

Source: https://hackmd.io/@effigies/bids-derivatives-readme
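The layout above replaces the non-compliant "_mean" suffix with the BIDS `desc-` entity. A minimal sketch of that naming rule (the `bids_name` helper is hypothetical, written for illustration; entity values are taken from the tree above):

```python
# Sketch: encoding processing in the BIDS desc- entity instead of an
# ad-hoc "_mean" suffix. Filenames follow the pattern
# sub-<label>_task-<label>_desc-<label>_<suffix>.<ext>.

def bids_name(sub, task, desc, suffix, ext="nii.gz"):
    """Assemble a BIDS-style filename from entity key-value pairs."""
    return f"sub-{sub}_task-{task}_desc-{desc}_{suffix}.{ext}"

# Non-compliant: processing encoded as a bare suffix (old name assumed)
bad = "sub-leipzigR01_task-rest_bold_mean.nii.gz"

# Compliant: processing encoded in the desc- entity
good = bids_name("leipzigR01", "rest", "mocomean", "bold")
print(good)  # sub-leipzigR01_task-rest_desc-mocomean_bold.nii.gz
```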

jcohenadad commented 1 year ago

Looks like a good start! To be closer to the official examples, I would do:

MerveKaptan commented 1 year ago

Okay, perfect, thank you! I will change it and re-upload the data!

MerveKaptan commented 1 year ago

Dear all, as you know, our dataset consists of derivatives only. Therefore, OpenNeuro in its current form would not accept it, but they are updating their platform. This is the reply I got from them:

Thanks for reaching out. This is quite timely, as we've been working on a validator for derivatives that would allow us to host derivatives-only datasets. The current plan is to roll it out as an option that can be enabled on a per-dataset basis by an admin, and we hope to do that within the next month or so. If you'd like, I can get back in touch when we're ready and we can get it set up for your dataset; this would help us test out our processes for non-admin users. What I would recommend for the data organization would be to treat it as one coherent dataset, and not two parallel datasets:

data_leipzig_rest
├── dataset_description.json
├── sub-leipzigR01
│   └── func
│       ├── sub-leipzigR01_task-rest_desc-moco_bold.json       <---- sidecar json file containing imaging parameters
│       ├── sub-leipzigR01_task-rest_desc-moco_bold.nii.gz     <---- 20 motion-corrected EPI volumes                 
│       ├── sub-leipzigR01_task-rest_desc-mocomean_bold.nii.gz <---- Mean of motion-corrected volumes
│       └── sub-leipzigR01_task-rest_desc-spinalcord_mask.nii.gz  <---- Manual spinal cord segmentation
├── participants.json
├── participants.tsv
└── task-rest_bold.json

In the meantime, you can test out the new validator by using the version published to https://deno.land/x/bids_validator:

deno run --allow-read --allow-env https://deno.land/x/bids_validator/bids-validator.ts path/to/dataset

We'd be happy to hear any issues you run into. You can open them here: https://github.com/bids-standard/bids-validator/issues

Best, Chris

What do you think?

jcohenadad commented 1 year ago

> this is the reply I got from them:

could you please cross-ref the GH conversation

jcohenadad commented 1 year ago

> As you know, our dataset consists of derivatives only.

maybe we should consider having the images NOT in the derivatives, but in the common location?

MerveKaptan commented 1 year ago

>> As you know, our dataset consists of derivatives only.
>
> maybe we should consider having the images NOT in the derivatives, but in the common location?

>> this is the reply I got from them:
>
> could you please cross-ref the GH conversation

Hi Julien,

I am so sorry, I do not understand what I should cross-reference.

jcohenadad commented 1 year ago

@rohanbanerjee pls help with https://github.com/sct-pipeline/fmri-segmentation/issues/1#issuecomment-1670116219 -- always put hyperlinks so we can go through the various GH repos -- GH is a public comm platform

MerveKaptan commented 1 year ago

>> As you know, our dataset consists of derivatives only.
>
> maybe we should consider having the images NOT in the derivatives, but in the common location?

I think we can use the 20 volumes as source data and the mask and mean as derivatives! We would need to change the naming of sub-leipzigR01_task-rest_desc-moco_bold.nii.gz a bit to make it compatible as source data!

alternatively, we can think about sharing data here: https://data.mendeley.com/

jcohenadad commented 1 year ago

> I think we can use the 20 volumes as source data and the mask and mean as derivatives!

can we consider using the mean in the source data (even though I know this is not really a source...)

> alternatively, we can think about sharing data here: https://data.mendeley.com/

no, let's stick with OpenNeuro

MerveKaptan commented 1 year ago

Good questions!

>> I think we can use the 20 volumes as source data and the mask and mean as derivatives!
>
> can we consider using the mean in the source data (even though I know this is not really a source...)

Technically, we can! I had shared one-volume EPI data as the source data (as we just acquired one volume for that specific acquisition). Would you want me to ask these questions to OpenNeuro developers and get back to you?

>> alternatively, we can think about sharing data here: https://data.mendeley.com/
>
> no, let's stick with OpenNeuro

sure, I also think that would be better!

jcohenadad commented 1 year ago

> Would you want me to ask these questions to OpenNeuro developers and get back to you?

yes, ask the question, but instead of getting back to me, please cross-ref the conversation here-- again, GH is a public comm platform and we should all be able to see and participate in the discussion thread with openneuro-- if this is unclear @rohanbanerjee please chat with @MerveKaptan to clarify

rohanbanerjee commented 1 year ago

Sure, we will open an issue on OpenNeuro GH and post the link here.

MerveKaptan commented 1 year ago

> Sure, we will open an issue on OpenNeuro GH and post the link here.

Thank you both for clarifying! I was not using GH previously, but their ticket system: https://openneuro.freshdesk.com/support/tickets/1525. Now we will move it to GitHub as Rohan suggested!

jcohenadad commented 1 year ago

> Thank you both for clarifying! I was not using GH previously, but their ticket system: https://openneuro.freshdesk.com/support/tickets/1525. Now we will move it to GitHub as Rohan suggested!

no-- if they have a ticket system, then you should use their ticket system, or whatever they ask users to use. I was suggesting GH bc i thought they were using GH issues as their user-facing ticket system. On the other hand, I am not able to see your ticket, which is not great... communication alternatives are neurostars.org (or maybe GH issue if they also have that).

Anyhow, at this point it is rather up to us to decide what to do with this dataset and how to convert it to BIDS, right?

MerveKaptan commented 1 year ago

Hi @jcohenadad,

I think both are fine. Per your suggestion, I have moved this to GitHub.

Actually, talking to Chris from the OpenNeuro team helped a lot and I finally have an idea how we can organize the data!

What we need to do is have a source dataset, which will be the 20-volume time series, and then we can organize the derivatives however we would like!

MerveKaptan commented 12 months ago

Hi @jcohenadad,

What do you think? Once we decide on an organization, I can re-organize the data as we want to.

I can move the 20 motion-corrected volumes to source data and keep the derivatives as they are. Alternatively, I can also move both derivatives into one folder (instead of separating them for moco mean and spinal segmentation).

Please do let me know!

jcohenadad commented 12 months ago

the original motivation for having the moco dataset in source is https://github.com/sct-pipeline/fmri-segmentation/issues/1#issuecomment-1670041420 -- now, if https://github.com/sct-pipeline/fmri-segmentation/issues/1#issuecomment-1670041420 is fixed, then we can put everything under derivatives/; if it is not, then you split between source (moco data) and derivatives (mean and seg)

MerveKaptan commented 11 months ago

Thank you, @jcohenadad ! I missed this reply for some reason. Okay, I will ask the OpenNeuro team again and act accordingly.

MerveKaptan commented 11 months ago

Dear @jcohenadad & @rohanbanerjee,

FYI, I have been testing out a few things, and I believe it will be easier and quicker to go with the following solution: "split between source (moco data) and derivatives (mean and seg)". I will reorganize the data and start uploading.

That being said, for some sites, we would still need the sidecar .json files as mentioned here.

Thank you! Merve
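The agreed split between source (moco data) and derivatives (mean and seg) amounts to a simple routing rule on the `desc-` label. A hypothetical sketch (the `route` helper and the top-level folder names are assumptions for illustration, not the actual upload script):

```python
# Hypothetical sketch of the agreed split: motion-corrected time series
# go to the source side, while the mean image and the spinal cord
# segmentation go to derivatives. Routing keys on the desc- label and
# the suffix, following the filenames discussed in this thread.

def route(filename: str) -> str:
    """Return the dataset sub-tree a file belongs in (names assumed)."""
    if "_desc-mocomean_" in filename or "_mask" in filename or "_seg" in filename:
        return "derivatives"
    return "rawdata"

files = [
    "sub-leipzigR01_task-rest_desc-moco_bold.nii.gz",
    "sub-leipzigR01_task-rest_desc-mocomean_bold.nii.gz",
    "sub-leipzigR01_task-rest_desc-spinalcord_mask.nii.gz",
]
for f in files:
    print(route(f), f)
```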

MerveKaptan commented 11 months ago

Hi @jcohenadad & @rohanbanerjee,

Another question about the BIDS organization. Currently, the latest BIDS version does not have an option for multisite data; please see here.

We can either treat each data set separately or combine all the subjects in one dataset and add a site column to participants.tsv file as suggested in the link above.

I believe the second option will be neater, as it is one project. Would you agree?

Thank you, Merve
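The second option (one combined dataset with a `site` column in participants.tsv) could be sketched as follows, using only the standard library. The second subject and both site labels are hypothetical, added for illustration:

```python
# Sketch: a combined multisite participants.tsv with a "site" column.
# Subject IDs beyond sub-leipzigR01 and the site labels are assumptions.
import csv
import io

rows = [
    {"participant_id": "sub-leipzigR01", "site": "leipzig"},
    {"participant_id": "sub-geneva01", "site": "geneva"},  # hypothetical
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=["participant_id", "site"],
    delimiter="\t",       # BIDS tabular files are tab-separated
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```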

jcohenadad commented 11 months ago

> Currently, the latest BIDS version does not have an option for multisite data; please see here.

it does, see here and an example for the spine-generic project.

> I believe the second option will be neater, as it is one project. Would you agree?

👍

rohanbanerjee commented 2 months ago

The dataset is BIDS compliant now and has been uploaded to OpenNeuro. Closing the issue.