Open yjmantilla opened 3 years ago
Yes, that makes sense. I agree that this module is actually the user :p
Agreed.
Just an idea for an enhancement: would it be useful for a "Discover module" to apply a bunch of available BIDS rulesets to a dataset and provide a result table recommending the one with the closest match as the starting template?
Would it be useful for a "Discover module" to apply a bunch of available BIDS rulesets to a dataset and provide a result table recommending the one with the closest match as the starting template?
@aswinnarayanan That is actually what I had in mind, just that I wasn't sure how to actually develop such a thing.
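As a rough illustration of the ruleset comparison idea, here is a minimal sketch. It assumes each ruleset can be reduced to a single regular expression over relative file paths, which is a simplification; the names `rank_rulesets`, `flat` and `by_subject` are hypothetical and not part of any existing tool.

```python
import re


def rank_rulesets(file_paths, rulesets):
    """Score each ruleset by the fraction of files its path regex fully matches."""
    scores = []
    for name, pattern in rulesets.items():
        regex = re.compile(pattern)
        matched = sum(1 for path in file_paths if regex.fullmatch(path))
        # Guard against an empty dataset to avoid dividing by zero.
        fraction = matched / len(file_paths) if file_paths else 0.0
        scores.append((name, fraction))
    # Best match first, so the top row is the recommended starting template.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical dataset listing and rulesets, for illustration only.
files = ["sub01/rest.vhdr", "sub02/rest.vhdr", "notes.txt"]
rulesets = {
    "flat": r"[^/]+\.vhdr",
    "by_subject": r"sub\d+/[^/]+\.vhdr",
}
for name, fraction in rank_rulesets(files, rulesets):
    print(f"{name}: {fraction:.0%} of files matched")
```

A real comparison would probably also need to score metadata and channel rules, not only path matches.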
The most difficult and important thing one has to infer is the input structure. In bidscoin it is fixed to a set of acceptable structures, whereas we leave that to the user so as to be more flexible. In this case one would need to make a set of structure guesses, the problem being that path structure has too many possibilities. Anyhow, one could do it in a combinatorial way with itertools or something.
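To give a feeling for the combinatorial approach, here is a small sketch using `itertools.permutations`. It assumes that each folder or file name in an example path is either one whole entity or a literal, which is a strong simplification; the function name and the `{entity}` placeholder syntax are made up for illustration.

```python
from itertools import permutations


def candidate_patterns(example_path, entities=("subject", "session", "task")):
    """Enumerate guesses that label each path component as an entity or a literal."""
    parts = example_path.strip("/").split("/")
    # Pad with None so that any component may stay a literal folder name.
    pool = list(entities) + [None] * len(parts)
    guesses = set()
    for labels in permutations(pool, len(parts)):
        guess = "/".join(
            part if label is None else "{" + label + "}"
            for part, label in zip(parts, labels)
        )
        guesses.add(guess)  # the set removes duplicates caused by the None padding
    return sorted(guesses)


# Hypothetical example path, for illustration only.
for guess in candidate_patterns("S01/visit2/rest"):
    print(guess)
```

Even with three components and three entities the list is already long, and it grows quickly once file names have to be split into several entities, which is the problem described above.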
One heuristic I thought of, for example, is assuming that channels named "EOG", "HEOG", "HVEOG" and variations of these are EOG channels by default. This heuristic would be quite simple: it would translate to a "retype the matching channels to EOG" entry in the rules file.
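A minimal sketch of this EOG heuristic follows, working on a plain `{channel name: type}` dictionary rather than on the actual rules file format, which is not shown here. The regular expression and the function name `retype_eog` are assumptions chosen for illustration.

```python
import re

# Assumed pattern: names starting with EOG, HEOG, VEOG, HVEOG and similar (case-insensitive).
EOG_PATTERN = re.compile(r"[HV]{0,2}EOG", re.IGNORECASE)


def retype_eog(channel_types):
    """Return a copy of the mapping with EOG-like channel names retyped to 'eog'."""
    return {
        name: ("eog" if EOG_PATTERN.match(name) else ch_type)
        for name, ch_type in channel_types.items()
    }


# Hypothetical channel list, for illustration only.
channels = {"Fp1": "eeg", "HEOG": "eeg", "hveog": "eeg", "EOG1": "eeg", "Cz": "eeg"}
print(retype_eog(channels))
```

The resulting mapping could then be turned into whatever "retype" entry the rules file ends up using.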
I think that having heuristics like the EOG channel one I gave doesn't justify the "DISCOVER RULES" module, because the user still needs to input the path structure for us to be able to infer the BIDS paths. So in that sense it wouldn't be useful.
Now, if we manage to devise a way to do the path inference heuristic, I think the DISCOVER RULES module would be worthwhile. It would finally almost automate the conversion, with only minimal user intervention.
@aswinnarayanan Do you have any ideas or comments regarding these thoughts?
@yjmantilla that's a very fair point. The channel name recommendation wouldn't justify a complete module, and the path inference is the bigger issue. Do you see it as viable for the user community to submit their edited mappings to some kind of central repository?
Do you see it as viable for the user community to submit their edited mappings to some kind of central repository?
I think that is possible to do. The important part is that the context in which the submitted mapping was applied is made explicit, so that the user can say "oh, this looks like my case" and test the mapping to see if any further changes are required.
As far as I remember (correct me if wrong), the idea we ended up considering is having heuristics in the sense that we learn the rules from a single example of raw EEG files converted into BIDS. This has several advantages: 1) Saving time for the user, as some of the rules would be inferred from the converted example 2) Helping the user understand the format of the rules file. What is better than seeing some of the rules for your actual data? I envisage our tool to really be self-explanatory, without users needing to read a manual of how to write the rule file. 3) More practical. Most studies start with pilot data that needs to be converted rapidly for testing. Nobody has time to write a rule-file at this point, and conversion is usually done manually. Why not take advantage of this conversion that is done anyway?
Now, I do not expect we can learn EVERYTHING from a simple example, but learning where data files go to in the folder hierarchy or learning which channels are classified as EOG sounds to me quite feasible. I do not expect it to be super clever at this point (so python scripts like heudiconv use are acceptable), but I do hope we can demonstrate that learning from an example has potential for BIDS conversion.
@yjmantilla Do you think it's feasible?
@aswinnarayanan Regarding the central repository, I'm all for it, and it was actually one of my original proposals for the project. Having examples for each EEG instrument will be super useful, and even if we cannot base our heuristics on them, we can indeed try these mappings on the input dataset. Even just referring the user to such an online resource will be useful. The only question is whether we can fit it into the timeframe of the current project -- @yjmantilla, do you think Bryan can work on this, or will he be investing his time in a rudimentary GUI in the end?
@civier
the idea we ended up considering is having heuristics in the sense that we learn the rules from a single example of raw EEG files converted into BIDS.
It was one of the possible ideas for heuristics, along with developing heuristics for common metadata formats.
Saving time for the user, as some of the rules would be inferred from the converted example
I think this is debatable. Some BIDS files are harder to write than the rules file, while others aren't. For example, the sidecar EEG JSON and the channels.tsv file wouldn't be trivial to fill in, or at least they would be cumbersome. The user would need to write the JSONs with all the fields having the correct names, meaning they have to look at the BIDS documentation and type all of that. For the TSV they would need to know what a TSV is for starters, and then fill it in (I imagine with some spreadsheet software).
One thing I do see that the user can do easily is placing files in sub-folders with the correct BIDS name. From this we could infer the path pattern, and the user wouldn't need to write that rule (which is the most important one).
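A sketch of how the path pattern could be inferred from one such example is given below. It assumes the user keeps the original labels as entity values when naming the BIDS file (for instance `sub-S01` for a source folder `S01`); the function names and the `{key}` placeholder syntax are hypothetical and not the actual rules file syntax.

```python
import re


def parse_bids_entities(bids_name):
    """Extract key-value entities such as sub-S01 or task-resting from a BIDS-style name."""
    return dict(re.findall(r"([a-zA-Z]+)-([a-zA-Z0-9]+)", bids_name))


def infer_path_pattern(source_path, bids_name):
    """Replace every entity value found in the source path with a {key} placeholder."""
    by_value = {value: key for key, value in parse_bids_entities(bids_name).items()}
    if not by_value:
        return source_path
    # Longest values first so that a short value cannot match inside a longer one.
    alternatives = sorted(by_value, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(value) for value in alternatives))
    # A single pass avoids re-matching text inside placeholders already inserted.
    return regex.sub(lambda m: "{" + by_value[m.group(0)] + "}", source_path)


# Hypothetical source file and the BIDS name the user gave it, for illustration only.
print(infer_path_pattern("data/lab/S01/resting/S01_resting.vhdr",
                         "sub-S01_task-resting_eeg.vhdr"))
# prints: data/lab/{sub}/{task}/{sub}_{task}.vhdr
```

This breaks down when an entity value also appears in an unrelated folder name, so the inferred pattern would still need to be shown to the user for confirmation.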
What is better than seeing some of the rules for your actual data? I envisage our tool to really be self-explanatory, without users needing to read a manual of how to write the rule file.
I agree with this being the ideal. Nevertheless, almost every workflow will require an explanation, even the one of doing an initial conversion and then passing it on to infer the rules. If the rules file inferred from the example is not complete (which is the most probable case), then the user still needs to understand the rules file to complete it.
The main way I see to get rid of explanations is to use a form. Maybe something like ARTEMIS/COBIDAS would be the one to follow. Such a form would need to show a preview of the conversion, so that the user can see what is wrong and change it.
Most studies start with pilot data that needs to be converted rapidly for testing.
I'm not sure if this is true at a general level. In our lab, people doing the pilot study collect data following their own scheme and don't convert it to BIDS until it is required by someone else. Of course, this is because we don't have a mature BIDS culture. The main people thinking about data collection are the ones inclined to that field.
Maybe it is a matter of institutional guidelines and workflows. Here EEG data is recorded by nurses, medical students, psychology students, engineering students and so on, and they do that following a path pattern and that's it. They don't write sidecar files. At most they fill in some form with the current session information.
So here the workflow is to convert to BIDS once the data is fully collected. Tests are done on the original source data, not on BIDS data, which is a bad practice but is what we have managed to do given the constraints.
Now, I do not expect we can learn EVERYTHING from a simple example, but learning where data files go to in the folder hierarchy or learning which channels are classified as EOG sounds to me quite feasible. I do not expect it to be super clever at this point (so python scripts like heudiconv use are acceptable), but I do hope we can demonstrate that learning from an example has potential for BIDS conversion.
So what I take from this is that a prototype for the heuristics module should be able to infer the path pattern from where the user placed a source file in a BIDS directory. That is feasible.
The channel classification is feasible from our side. From the user side though I don't imagine anyone filling the channels.tsv by hand. But if they did, we could infer the classification.
One way I can see this kind of example-based workflow working, without the user putting too much effort into doing the BIDS conversion by hand, is that the user places the file where it belongs so that we infer the path pattern. Once that is done, we do the conversion with only that rule. The user inspects the conversion and changes what is needed accordingly. That way they wouldn't need to type everything by hand from scratch, but rather just correct what is wrong. Nevertheless, such a workflow would still need to be explained. Here is the workflow diagram:
So to wrap this up, I will do a prototype for these heuristics:
do you think Bryan can work on this, or will he be investing his time in a rudimentary GUI in the end?
He had advanced a bit with the rudimentary GUI; nevertheless, I can tell him to focus his efforts on this feature rather than the GUI if you think it is more important.
PS: I just split the comment before this one for easier reading.
So in the proposed design there was the "Discover Rules Module" whose purpose was to "obtain the Rules File" by applying different heuristics to the source-files available.
Main Point
The heuristics are actually the rules themselves; there isn't an easy automated way to obtain them.
Do we really need a "Discover Rules" module? In essence this module applies heuristics to infer heuristics, which seems redundant. If anything, we would have different "Rules File" templates that suit the needs of different labs/studies, with a human selecting which rules template to apply (that is, the human selects which set of heuristics is applied).
My proposal is to get rid of this module.
Advantages:
Explanation
The idea to have a "heuristics" module came from a shallow review of both bidscoin and heudiconv.
Now that I have gotten more into the project itself I'm noting the following:
Here is a comparison on how the heuristics work in these three converters:
So what I initially called "rules" is what those tools call "heuristics".
I would appreciate hearing what you guys think of this @stebo85 @aswinnarayanan @civier @DavidjWhite33 @TomEmotion
PS: Both bidscoin's and heudiconv's heuristics work from the point of view of mapping input files to output files, i.e. "this goes here, and that goes over there". For example, I have not found anything similar to the idea of "correcting bad channel types".