Datasets with different patterns within themselves and the participant concept

yjmantilla commented 2 years ago

I'm copying this concern from @civier:

SO LET'S SAY THAT FOR A SPECIFIC STUDY, THE USER USED ONE PATTERN (e.g., RSEEG/sub-%entities.subject%.vhdr) FOR HALF OF THE PARTICIPANTS AND ANOTHER PATTERN (e.g., RSEEG/subject#%entities.subject%.vhdr) FOR THE REST OF THEM. HOW CAN SOVABIDS HANDLE IT? CAN I SPECIFY TWO DIFFERENT PATTERNS/REGEXP IN THE RULES FILE? OR, SHOULD I HAVE TWO RULES FILES? WHATEVER IS THE DESIRED APPROACH, PLEASE CLARIFY IT IN THE DOCS.

What follows is my discussion with Oren regarding this topic, so that it does not get lost in a particular email:

Regarding two rules files acting on the same target BIDS path: right now it won't work, because every rules file tries to match all SOURCE files, and if one SOURCE file does not match, it throws an error (correct me if I'm wrong; that is what I observed and reported in the GitHub issue). So enabling two rules files to work after all (within the incremental conversion framework, for example) might be a good interim solution worth pursuing. I would describe something like using two rules files in a separate section of the docs that gives tips on how to handle different situations, e.g. different naming patterns across a dataset, an error in the naming of a specific file, missing data files, etc. As Dave said, converting to BIDS is relatively manageable if the data is consistent (people often write short scripts that do it), but people often give up because of all these small issues. Hopefully they would be easier to DETECT and SOLVE using SOVABIDS.

I think acting on the same target BIDS path is not the problem, but rather acting on the same source path, which is scanned to get the files. I think this is what you clarified in your last message. This hints to me that it would be useful to have a way to get the files that match the pattern and act only on them (and, of course, inform the user if there are files that didn't match the pattern).
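To make the idea concrete, here is a minimal sketch of "match what you can, report the rest". It is not how sovabids is implemented: `pattern_to_regex` and `partition_files` are hypothetical helpers, the translation of `%entities.subject%` placeholders into named regex groups is an assumption, and paths are assumed to use forward slashes.

```python
import re

# Placeholders look like %entities.subject% in the pattern from the example above.
PLACEHOLDER = re.compile(r"%([\w.]+)%")


def pattern_to_regex(pattern):
    """Translate a placeholder pattern into a compiled regex (hypothetical helper)."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(pattern):
        # Literal text between placeholders must match exactly.
        parts.append(re.escape(pattern[pos:m.start()]))
        # Dots are not allowed in group names, so entities.subject -> entities_subject.
        name = m.group(1).replace(".", "_")
        parts.append(f"(?P<{name}>[^/]+?)")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + "$")


def partition_files(paths, pattern):
    """Split paths into those matching the pattern and those that do not."""
    regex = pattern_to_regex(pattern)
    matched, unmatched = {}, []
    for path in paths:
        m = regex.search(path)
        if m:
            matched[path] = m.groupdict()
        else:
            unmatched.append(path)
    return matched, unmatched


files = ["RSEEG/sub-01.vhdr", "RSEEG/subject#02.vhdr", "RSEEG/notes.txt"]
matched, unmatched = partition_files(files, "RSEEG/sub-%entities.subject%.vhdr")
for path in unmatched:
    # Warn instead of raising, so the files that did match can still be converted.
    print(f"WARNING: no pattern matched {path}")
```

With this shape, a second rules file (or a second run) could be pointed at the `unmatched` list only, instead of failing on the whole source path.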

Now, whether this is implemented through different rules files that apply to the same source path/dataset, or through a single rules file that handles multiple versions of the same rule, is still under discussion.
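For the single-rules-file option, one possible behaviour is "try each pattern in order, first match wins". The sketch below only illustrates that idea with plain regular expressions; the `PATTERNS` list and `extract_subject` function are hypothetical and do not reflect any existing sovabids rules syntax.

```python
import re

# Hypothetical alternatives for the same rule, tried in order.
PATTERNS = [
    r"RSEEG/sub-(?P<subject>[^/]+)\.vhdr$",
    r"RSEEG/subject#(?P<subject>[^/]+)\.vhdr$",
]


def extract_subject(path, patterns=PATTERNS):
    """Return (subject, index of the pattern that matched), or None if nothing matched."""
    for index, pattern in enumerate(patterns):
        m = re.search(pattern, path)
        if m:
            return m.group("subject"), index
    return None
```

Returning the index of the matching pattern makes it easy to report to the user which naming convention each file was converted under.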

How do you envisage SOVABIDS treating different naming patterns within the data from a SINGLE participant? Would it be identical to how it will treat different naming conventions in DIFFERENT participants?

I would have to think this over, and maybe the final conclusion will only be reached when we actually do the implementation. Nevertheless, my intuition is that it would be identical to how it treats different naming conventions in DIFFERENT participants, as you said.

That actually brings me to the bigger question: does SOVABIDS have any concept of a participant at all? Or is it only seeing a collection of data files? Whatever is the case, it is worth mentioning it in the docs to make clear.

Right now it is only seeing a collection of data files. I would prefer for it to be more general and not depend on the concept of participant, or maybe only slightly (like files having a similar sub-XXXX pattern).

I'm not familiar enough with the EEG BIDS specification, but if there is metadata that is participant-specific (e.g., bio of the participant) rather than data-file-specific, we will have to introduce the concept of participant into SOVABIDS at one point or another after all.

I think @DavidjWhite33 will have to help us with this. I don't think this use case is common. From what I have seen, EEG files are usually provided without any metadata files apart from what the acquisition software provides.

civier commented 2 years ago

Regarding:

"This hints to me that it would be useful to have a way to get the files that match the pattern and acting just on them (and of course informing the user that there are files that didnt match the pattern if that is the case).

Now, whether this is implemented from different rules files that apply to the same sourcepath/dataset or rather a single rules file that handles multiple versions of the same rule is in discussion."

It is worth taking the BIDScoin example. BIDScoin indeed only converts the files that match a pattern in the rules file, but from what I saw, it does not inform the user of "unused" files. I actually think that informing the user of data files that were not included in the BIDS is important, as it can prevent mistakes in the conversion process.

Also, BIDScoin uses a single rules file that contains multiple patterns. That might or might not be the best approach for us.

I just want to point out that the comparison to the DICOM conversion in BIDScoin has its limitations. In DICOM, each sequence (anat, func, dwi) is matched using a different pattern, so we naturally have many patterns for each study. That might not be the case in EEG/MEG.

Regarding:

"Right now it is only seeing a collection of data files. I would prefer for it to be more general and not depend on the concept of participant, or maybe only slightly (like files having a similar sub-XXXX pattern)."

I'm not sure whether BIDScoin has a notion of participant or not, but I do know that it takes participants into account in incremental conversion. If a participant already has a subdirectory under the target BIDS folder, and that subdirectory already contains data for at least one sequence type (for example, anat), BIDScoin will not try to convert anything for that participant unless the -f flag is used.
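As a rough illustration of that skip-unless-forced behaviour (this is not BIDScoin's code; `needs_conversion` is a hypothetical helper, and "already contains data" is simplified to "any file exists below the participant's folder"):

```python
from pathlib import Path


def needs_conversion(bids_root, subject, force=False):
    """Decide whether a participant should be (re)converted.

    Simplifying assumption: a participant counts as already converted if
    any file exists anywhere below <bids_root>/sub-<subject>.
    """
    if force:
        # Mirrors the idea of a force flag: always convert.
        return True
    subject_dir = Path(bids_root) / f"sub-{subject}"
    if not subject_dir.is_dir():
        return True
    has_data = any(p.is_file() for p in subject_dir.rglob("*"))
    return not has_data
```

A check like this relies on the folder layout of the target dataset only, so it does not require the converter itself to carry a richer notion of participant.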

yjmantilla commented 2 years ago

I actually think that informing the user of data files that were not included in the BIDS is important, as it can prevent mistakes in the conversion process.

Agreed

I'm not sure if BIDScoin has a notion of participant or not, but I do know that it takes participants into account in incremental conversion.

I think it does from what I remember of my last reading of the code. This can be seen here

mne-bids also takes the concept of participant into account, which is why I think we may not need it. I still need to study how incremental conversion works to be sure.

yjmantilla / sovabids

Datasets with different patterns within themselves and the participant concept #43