mne-tools / mne-bids-pipeline

Automatically process entire electrophysiological datasets using MNE-Python.
https://mne.tools/mne-bids-pipeline/
BSD 3-Clause "New" or "Revised" License

sensor/_99_group_average not rerun after change in config parameters #901

Closed SophieHerbst closed 3 months ago

SophieHerbst commented 3 months ago

I expected that a change in these parameters would trigger the re-computation of the group statistics, but it did not. sensor/_05_decoding_csp is also not rerun (but I didn't expect it to be).

cluster_forming_t_threshold = 0.05
cluster_permutation_p_threshold = 0.05
cluster_n_permutations = 10000
┌────────┬ sensor/_99_group_average ────────────────────────────────────────────────────────────────────────────────────────────
│10:52:07│ ✅ sub-average Computation unnecessary (cached) …
│10:52:08│ ✅ sub-average Computation unnecessary (cached) …
│10:52:08│ ✅ sub-average Computation unnecessary (cached) …
│10:52:08│ ✅ sub-average Computation unnecessary (cached) …
│10:52:13│ ✅ sub-average Computation unnecessary (cached) …
│10:52:13│ ✅ sub-average Computation unnecessary (cached) …
│10:52:16│ ✅ sub-average Computation unnecessary (cached) …
│10:52:16│ ✅ sub-average Computation unnecessary (cached) …
hoechenberger commented 3 months ago

The group averaging should be re-run; CSP should not, yes.

So there's a bug somewhere.

SophieHerbst commented 3 months ago

@hoechenberger can you give me a hint about where to look for the config parameters that are checked for changes (just tell me which file)?

larsoner commented 3 months ago

@SophieHerbst there is some magic that happens with

https://github.com/mne-tools/mne-bids-pipeline/blob/be6850ad68579dc672b3ea9faf5cabc4b8a22668/mne_bids_pipeline/_run.py#L25

to make calls like this magically get checked:

https://github.com/mne-tools/mne-bids-pipeline/blob/be6850ad68579dc672b3ea9faf5cabc4b8a22668/mne_bids_pipeline/steps/sensor/_99_group_average.py#L297-L303

Part of the magic is that this failsafe_run checks the out_files returned by the decorated function, hashes them, and adds them to the set of things hashed. But if there are no output files, I suspect:

  1. Calling with parameter set A will cause a re-run
  2. Calling with parameter set B (!=A) will cause a re-run
  3. Calling with parameter set A a second time will not cause a re-run, because the output of the callable will be unchanged
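
To make that failure mode concrete, here's a minimal sketch of that kind of caching (toy code, not the actual _run.py implementation; the cache-file name and the plain cfg dict are made up for illustration):

import hashlib
import json
from pathlib import Path

CACHE_FILE = Path("cache.json")  # hypothetical cache location, illustration only


def _hash_file(path: Path) -> str:
    # Stand-in for the mtime-or-hash check on an output file.
    return hashlib.sha1(path.read_bytes()).hexdigest()


def maybe_run(step, cfg: dict) -> None:
    # The cache is keyed on the step's inputs (here just cfg); the stored value
    # is the hash of every file the step reported in out_files on its last run.
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    key = hashlib.sha1(json.dumps(cfg, sort_keys=True).encode()).hexdigest()
    recorded = cache.get(key)
    if recorded is not None and all(
        Path(f).exists() and _hash_file(Path(f)) == h for f, h in recorded.items()
    ):
        # With an empty out_files, ``recorded`` is {} and this branch is taken
        # on every later call with the same cfg -> "Computation unnecessary".
        print("Computation unnecessary (cached)")
        return
    out_files = step(cfg)  # the step returns {name: Path} of what it wrote
    cache[key] = {str(p): _hash_file(p) for p in out_files.values()}
    CACHE_FILE.write_text(json.dumps(cache))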

Assuming this is the problem, we could somehow try to invalidate the cache of call A or something, but that would be hard to get right. An easier approach would be to make sure that every step produces a non-empty out_files (keeping in mind that the report is never in that set of files). For something like average_evokeds this already happens, but perhaps for the decoding step it does not. If we simply write out an extra .csv -- or even a trivial hidden file ._99_group_average_run containing something that will change based on the cfg parameters (like the parameters used for clustering) -- then the mtime-or-hash check on this file will cause a cache miss for the failsafe_run the second time you run with parameter set A.
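
A hedged sketch of what that workaround could look like for this step (the function name, marker-file contents, and cfg fields below just mirror this comment; they are not the pipeline's actual API):

import json
from pathlib import Path


def run_group_average(cfg: dict) -> dict:
    out_files = {}
    # ... compute the group statistics / clustering here ...
    # Always write a tiny marker file whose contents depend on the clustering
    # parameters, so the mtime-or-hash check has something that changes.
    marker = Path("._99_group_average_run")
    marker.write_text(
        json.dumps(
            {
                "cluster_forming_t_threshold": cfg.get("cluster_forming_t_threshold"),
                "cluster_permutation_p_threshold": cfg.get("cluster_permutation_p_threshold"),
                "cluster_n_permutations": cfg.get("cluster_n_permutations"),
            },
            sort_keys=True,
        )
    )
    out_files["marker"] = marker
    return out_files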

@SophieHerbst do you want to try fixing this, or would you prefer I give it a shot? Not sure how involved it will be -- it might just be one step that needs fixing, or several. (And I might be wrong about this explanation!)

larsoner commented 3 months ago

... and for this explanation, note that the subject reports are (rightly) never considered in the set of out_files to hash, since the reports are updated in almost every step.

SophieHerbst commented 3 months ago

Hi @larsoner, thanks for the helpful explanation. I should be in your case 2 as I did change the parameters cluster_forming_t_threshold and cluster_permutation_p_threshold.

  1. Calling with parameter set A will cause a re-run
  2. Calling with parameter set B (!=A) will cause a re-run
  3. Calling with parameter set A a second time will not cause a re-run, because the output of the callable will be unchanged

If you can give it a try, I think it would be more efficient, as this whole magic business is still somewhat obscure to me.

SophieHerbst commented 3 months ago

@larsoner @hoechenberger the rerun magic is killing me (when trying to change things in the report). Is there a way to force a complete rerun of a certain step, e.g. sensor/_99_group_average?

larsoner commented 3 months ago

Yes, in principle --no-cache plus --steps specifying that one step should do it for you:

$ mne_bids_pipeline --help
usage: mne_bids_pipeline [-h] [--version] [--config FILE] [--create-config FILE] [--steps STEPS] [--root-dir ROOT_DIR] [--
...
  --steps STEPS         The processing steps to run. Can either be one of the processing groups 'preprocessing', 'sensor', 
...
  --no-cache            Disable caching of intermediate results.
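
For example, assuming your configuration lives in config.py, something like the following should force just that step to recompute (check the --steps help above for the exact step-name syntax it accepts; the form below is an assumption):

$ mne_bids_pipeline --config config.py --steps sensor/_99_group_average --no-cache
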
SophieHerbst commented 3 months ago

There really is a bug with the CSP code: even completely changing the parameters does not trigger a rerun. Only deleting the output .xlsx files does.

larsoner commented 3 months ago

Agreed, I think it's outlined in https://github.com/mne-tools/mne-bids-pipeline/issues/901#issuecomment-2015147155; I hope to fix it soon.