psychoinformatics-de / datalad-hirni

DataLad extension for (semi-)automated, reproducible processing of (medical/neuro)imaging data
http://datalad.org
Other
5 stars 8 forks source link

running spec2bids without saved changes in sourcedata should throw a warning #156

Open pvavra opened 4 years ago

pvavra commented 4 years ago

disclaimer: I'm not sure this is actually a bug or just a misunderstanding of the workflow by me.

my situation:

Calling datalad hirni-spec2bids sourcedata/sub-001/studyspec.json from within the bids-folder, the behavior conversion doesn't work and throws an error.

After some debugging, I manage to fix the error and re-run the above call. The conversion procedure also saves the newly created *_events.tsv files - and only those, by calling datalad save .. sub-001/*_events.tsv

And this last step is what I'm wondering about: on the one hand, using save with the PATH arguments makes sense, as this would enables me to call this conversion in parallel. However, from what I understand, the git-log now contains a reference to files (events.tsv) which are "out of sync" with the record of the sourcedata checksum, because in the above workflow I didn't save the sourcedata state yet (I forgot..).

Not sure what the "best" solution would be, but one that comes to mind is having "hirni-spec2bids" check whether the sourcedata dataset is in modified state and ask the user for confirmation to continue anyway. That is, give a warning in some form.

The goal, I think, should be to have the conversion to bids only work with a saved state of the used scripts/files, and not use things that are changed in the working tree.

@bpoldrack What are your thoughts on this?

bpoldrack commented 4 years ago

disclaimer: I'm not sure this is actually a bug or just a misunderstanding of the workflow by me.

Neither. Behaves as expected to me, so - no bug. And I can't see anything really wrong with your workflow. It simply didn't occur to me that this could be unintended. There's a slight difference in your workflow, that I do consider valid but didn't think of. That is: I expected people to work on their sourcedata repo at an independent location, while sourcedata within bids would be a clone thereof (that would be updated if you changed something and committed it). So, this situation wouldn't happen.

But I think you have a point there! Of course you should be able to work directly within that subdataset. While I wouldn't enforce it since you should have the freedom to do so (for whatever reason), a warning or confirmation makes sense to me. Need to think how to handle a non-interactive execution though.

bpoldrack commented 4 years ago

Just for clarification in case this is needed: As shown in the demo http://docs.datalad.org/projects/hirni/en/latest/demos/conversion.html I'd expect people to have a location for the raw dataset, say - studyXY. And then have the bids dataset at studyXY_BIDS wherein you'd datalad install -d . -s studyXY sourcedata, referencing the raw dataset. After conversion you could uninstall sourcedata. Therefore you can have several converted datasets based on different states of the raw dataset or targeting different layouts (like different BIDS version for example), while having the one place where you'd maintain your raw dataset.

However - your approach is valid.

pvavra commented 4 years ago

I understand the "different location" workflow. The advantage of having such a nested structure is less disk-space usage, I think (without having to add "drops" everywhere..).

concerning the non-interactive execution: I cannot think of a use-case (limited imagination?) where one would like to convert automatically to bids from an unclean state (or at least unsaved files -- the rest of the repo might be irrelevant). But maybe having a --no-check analogous to drop could work?

Thinking about relative frequencies of use-cases, I'm quite confident that "clean sourcedata" will be more common, and should be hence the default..

bpoldrack commented 4 years ago

cannot think of a use-case (limited imagination?) where one would like to convert automatically to bids from an unclean state

Kind of the point. It's a warning after all. I always edit specfiles in a scripted fashion. Then convert. I can make a mistake in such scripts leading to uncommitted changes. Thus running into the same issue.

But maybe having a --no-check analogous to drop could work

Yes. The concern is the other way around: If the check only leads to a warning or an interactive confirmation request, you can't easily react to it non-interactively. For the same "protection" level in scripted usage the command would need to actually fail. That's an option of course: Fail except --no-check is given. However, that might be a bit annoying in the interactive use case.