mne-tools / mne-bids-pipeline

Automatically process entire electrophysiological datasets using MNE-Python.
https://mne.tools/mne-bids-pipeline/
BSD 3-Clause "New" or "Revised" License

run-noise Output file hash mismatch for _task-noise_scores.json #876

Closed SophieHerbst closed 6 months ago

SophieHerbst commented 7 months ago

I just finished a complete pipeline run (1.6) and now wanted to improve ica_cleaning. The only parameter I changed is ica_ctps_ecg_threshold, so I did not expect any earlier steps to be rerun, but I get:

│11:31:44│ 🚫 sub-215 run-noise Output file hash mismatch for /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/sub-emptyroom/ses-19230318/meg/sub-emptyroom_ses-19230318_task-noise_scores.json, will recompute …

This takes a lot more time and happens for every participant.

I never observed this behavior before. In the new complete run of the pipeline, I started using find_flat_channels_meg = True and find_noisy_channels_meg = True. Do these modify the empty-room information in a later step, which triggers the re-run?

Happy about any insights on whether it is possible to avoid the re-run.

hoechenberger commented 7 months ago

In the new complete run of the pipeline, I started using find_flat_channels_meg = True and find_noisy_channels_meg = True. Do these modify the empty-room information in a later step, which triggers the re-run?

Without looking at the documentation or code, I would say yes, because this can change which channels are marked as bad before running the Maxwell filter… and we try to keep the bad channels in sync between experimental runs and empty-room recordings.

SophieHerbst commented 7 months ago

Hm ok. So no way to prevent the lengthy recomputation?

hoechenberger commented 7 months ago

Ah wait. You first finished a complete pipeline run, then adjusted the ECG threshold, and when you re-run now, some earlier step is being re-run? Which one is that, preprocessing/_01_data_quality? That should not happen, no. And it only appears for the empty-room recording??

SophieHerbst commented 7 months ago

yep, it happens in preprocessing/_01_data_quality

SophieHerbst commented 7 months ago

And only for the empty-room recording, yes. Also, it happens only the first time; when I re-run it, it does not happen anymore.

hoechenberger commented 7 months ago

this shouldn't happen… I don't have time to reproduce or look into this now, though, sorry

SophieHerbst commented 7 months ago

No problem, I'll just wait for it to finish once; I wouldn't want to use the development version for this project anyway. But it would be good to fix it in the future.

larsoner commented 7 months ago

Can you upload one subject's raw bids data plus your config.py? I can look

larsoner commented 6 months ago

@SophieHerbst given this is an issue with the empty-room data can you upload sub-emptyroom/ses-19230318 (not the derivatives one but the bids_root / original one)?

larsoner commented 6 months ago

Okay, I think I see how this can happen. If two subjects A and B match to the same empty-room recording, the bad-channel finding for that file runs twice, first for A and then for B (assuming n_jobs=1). Then, when you re-run the pipeline, a problem will be detected with the output file's modification time, because both A and B will have written the same files, e.g.:

$ ls -l ~/mne_data/derivatives/mne-bids-pipeline/ds000117/sub-emptyroom/ses-20090409/meg/
total 176
-rw-rw-r-- 1 larsoner larsoner     12 Mar 14 15:15 sub-emptyroom_ses-20090409_task-noise_bads.tsv
-rw-rw-r-- 1 larsoner larsoner 174558 Mar 14 15:15 sub-emptyroom_ses-20090409_task-noise_scores.json
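To make the collision concrete, here is a toy model of it in Python. It is only a sketch and not the pipeline's actual caching code: the `run_step` and `is_up_to_date` helpers, the in-memory `cache` dict, and the fixed timestamps are all assumptions made for illustration. Each subject records the modification time of the shared output file after writing it, and a later check compares that record against the file on disk.

```python
import os
import tempfile
from pathlib import Path


def run_step(subject, out_file, cache, mtime_ns):
    """Write the shared output file and record its modification time (toy model)."""
    out_file.write_text("{}")  # placeholder for the real scores
    # A fixed timestamp keeps the demonstration deterministic.
    os.utime(out_file, ns=(mtime_ns, mtime_ns))
    cache[subject] = out_file.stat().st_mtime_ns


def is_up_to_date(subject, out_file, cache):
    """Return True if the file still has the mtime this subject recorded."""
    return cache.get(subject) == out_file.stat().st_mtime_ns


with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "sub-emptyroom_task-noise_scores.json"
    cache = {}
    # First full run with n_jobs=1: A writes the file, then B overwrites it.
    run_step("A", shared, cache, mtime_ns=2_000_000_000)
    run_step("B", shared, cache, mtime_ns=4_000_000_000)
    # On the next run, A's record no longer matches the file on disk.
    status = {s: is_up_to_date(s, shared, cache) for s in ("A", "B")}
    print(status)
```

In this model, subject A's record is stale because B wrote the file last, so A's step would be flagged for recomputation even though nothing relevant changed.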

Although it will cause redundant calculations, the cleanest solution here is probably to save the _bads.tsv in subject A and B's derivatives folders separately. This is what ends up happening in the maxwell filter step anyway, since it can use different sets of bads for the two subjects.
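A rough sketch of what per-subject output locations could look like follows. The `noise_bads_path` helper and the exact file naming are hypothetical and only illustrate the idea; they are not taken from the pipeline's code. The point is that each subject gets its own copy of the empty-room bads file, so no two subjects ever write to the same path.

```python
from pathlib import Path


def noise_bads_path(deriv_root, subject, session=None):
    """Build a per-subject path for the empty-room bads file (hypothetical layout)."""
    sub_dir = Path(deriv_root) / f"sub-{subject}"
    name_parts = [f"sub-{subject}"]
    if session is not None:
        sub_dir = sub_dir / f"ses-{session}"
        name_parts.append(f"ses-{session}")
    name_parts += ["task-noise", "bads.tsv"]
    return sub_dir / "meg" / "_".join(name_parts)


# Two subjects sharing one empty-room recording now write to different files.
path_a = noise_bads_path("derivatives", "A")
path_b = noise_bads_path("derivatives", "B")
print(path_a.as_posix())
print(path_b.as_posix())
```

The trade-off is the one mentioned above: the bad-channel detection on the empty-room file is repeated per subject, in exchange for outputs that cannot overwrite each other.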

SophieHerbst commented 6 months ago

Sorry, I was completely offline for some days. Will try the fixes now!