mne-tools / mne-bids-pipeline

Automatically process entire electrophysiological datasets using MNE-Python.
https://mne.tools/mne-bids-pipeline/
BSD 3-Clause "New" or "Revised" License

ENH: Add "run up to this step" option #858

Open larsoner opened 5 months ago

larsoner commented 5 months ago
> So what would have been a viable solution for @SophieHerbst now? Simply re-run the entire `preprocessing` pipeline and rely on caching to skip e.g. filtering etc?

We don't have a "please run everything up to this step" functionality right now, do we? Do you think we could implement this somehow?

Originally posted by @hoechenberger in https://github.com/mne-tools/mne-bids-pipeline/issues/857#issuecomment-1961329040

It might be a good idea to recommend that, in general, people prefer this to the --steps option, which is not safe against config changes.

hoechenberger commented 5 months ago

Since we do have caching now, I would even go so far as to say that we could replace the existing --steps behavior with the new one, if we can guarantee that caching actually works well, including on an NFS setup.

That way, we'd also avoid having to construct a dependency tree for each step (in order to ensure that all required input has been generated).

We could actually deprecate --steps in favor of something like --run-until, because supplying multiple step names wouldn't make sense anymore.
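
To make the idea concrete, here is a minimal sketch of how a "run until" selection could work on an ordered list of steps. The step names follow the naming the pipeline prints in its log output; `ALL_STEPS` and `steps_up_to` are hypothetical names for illustration, not existing pipeline API, and --run-until itself is only a proposal at this point.

```python
# Hypothetical sketch: ALL_STEPS and steps_up_to are illustrative names,
# not part of the pipeline's actual code.
ALL_STEPS = [
    "init/_01_init_derivatives_dir",
    "init/_02_find_empty_room",
    "preprocessing/_01_data_quality",
    "preprocessing/_02_head_pos",
    "preprocessing/_03_maxfilter",
]


def steps_up_to(target: str, all_steps: list[str] = ALL_STEPS) -> list[str]:
    """Return every step from the start of the pipeline through ``target``."""
    if target not in all_steps:
        raise ValueError(f"Unknown step: {target!r}")
    return all_steps[: all_steps.index(target) + 1]


selected = steps_up_to("preprocessing/_02_head_pos")
print(selected)
```

With this approach every earlier step is always visited, and it would be up to the cache to turn unchanged steps into cheap no-ops.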

SophieHerbst commented 5 months ago

I find --run-until clearer than --steps with respect to what is actually done. The docs should then make clear that only steps for which a config change occurred will be rerun. Is it too complex to mark, for each config option, which step it refers to?

hoechenberger commented 5 months ago

> Is it too complex to mark for each config option which step it refers to?

No, I think this could be generated automatically these days.
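
As a rough sketch of what such a generated mapping could look like: if each step declares which config options it reads, the declaration can be inverted into an option-to-steps table for the docs. The `STEP_OPTIONS` declaration, the `options_to_steps` helper, and the assignment of options to steps below are all assumptions made for illustration, not taken from the pipeline source.

```python
from collections import defaultdict

# Illustrative only: which options each step reads is an assumption here,
# not something taken from the pipeline source.
STEP_OPTIONS = {
    "preprocessing/_01_data_quality": ["ch_types", "find_flat_channels_meg"],
    "preprocessing/_03_maxfilter": ["ch_types", "mf_st_duration"],
}


def options_to_steps(step_options: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert a step -> options declaration into option -> steps."""
    mapping = defaultdict(list)
    for step, options in step_options.items():
        for option in options:
            mapping[option].append(step)
    return dict(mapping)


OPTION_TO_STEPS = options_to_steps(STEP_OPTIONS)
for option, steps in sorted(OPTION_TO_STEPS.items()):
    print(f"{option}: {', '.join(steps)}")
```

An option read by several steps ends up listed under each of them, which is exactly the information a user needs to predict what will be rerun after a config change.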

SophieHerbst commented 5 months ago

Is it normal that the preprocessing pipeline is run again after an update of the pipeline? I updated with `pip install -U --no-deps git+https://github.com/mne-tools/mne-bids-pipeline@main` and changed nothing in my config. This is a bit problematic, because every time I manage to free up some time to work on this, a good part of it is taken up by just waiting for previous steps to recompute.

┌────────┬ init/_01_init_derivatives_dir ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│10:56:07│ ✅ Output directories already exist …
└────────┴ done (20s)
┌────────┬ init/_02_find_empty_room ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│10:56:27│ ✅ sub-155 run-01 Computation unnecessary (cached) …
└────────┴ done (5s)
┌────────┬ preprocessing/_01_data_quality ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│10:56:32│ ✅ sub-155 run-01 Computation unnecessary (cached) …
│10:56:32│ ✅ sub-155 run-02 Computation unnecessary (cached) …
│10:56:32│ ✅ sub-155 run-03 Computation unnecessary (cached) …
│10:56:32│ ✅ sub-155 run-04 Computation unnecessary (cached) …
│10:56:32│ ✅ sub-155 run-05 Computation unnecessary (cached) …
│10:56:32│ ✅ sub-155 run-06 Computation unnecessary (cached) …
│10:56:32│ ✅ sub-155 run-07 Computation unnecessary (cached) …
│10:56:33│ ✅ sub-155 run-08 Computation unnecessary (cached) …
│10:56:33│ ✅ sub-155 run-rest Computation unnecessary (cached) …
│10:56:33│ ✅ sub-155 run-noise Computation unnecessary (cached) …
└────────┴ done (2s)
┌────────┬ preprocessing/_02_head_pos ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│10:56:33│ ⏩ Skipping …
└────────┴ done (1s)
┌────────┬ preprocessing/_03_maxfilter ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│10:56:46│ ⏳️ sub-155 run-01 Loading reference run: 01.
│10:57:55│ ⏳️ sub-155 run-01 Applying SSS to experimental data
│10:57:59│ ⏳️ sub-155 run-01 Destination is 0.0 mm and 0.0° from the original head position
│10:58:06│ ⏳️ sub-155 run-01 Writing sub-155/meg/sub-155_task-tiwm_run-01_proc-sss_raw.fif
│10:59:17│ ⏳️ sub-155 run-01 Adding Maxwell filtered raw data to report.
│10:59:30│ ⏳️ sub-155 run-01 Adding config and sys info to report
│11:00:09│ ⏳️ sub-155 run-01 Saving report: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/sub-155/meg/sub-155_task-tiwm_report.html
│11:00:14│ ✅ sub-155 run-01 Computation unnecessary (cached) …
│11:00:21│ ⏳️ sub-155 run-02 Loading reference run: 01.
│11:00:21│ ⏳️ sub-155 run-02 Applying SSS to experimental data
│11:01:28│ ⏳️ sub-155 run-02 Destination is 2.3 mm and 2.5° from the original head position
│11:01:33│ ⏳️ sub-155 run-02 Writing sub-155/meg/sub-155_task-tiwm_run-02_proc-sss_raw.fif
│11:02:41│ ⏳️ sub-155 run-02 Adding Maxwell filtered raw data to report.
│11:02:53│ ⏳️ sub-155 run-02 Saving report: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/sub-155/meg/sub-155_task-tiwm_report.html
│11:03:04│ ⏳️ sub-155 run-03 Loading reference run: 01.
│11:03:04│ ⏳️ sub-155 run-03 Applying SSS to experimental data
│11:04:13│ ⏳️ sub-155 run-03 Destination is 5.3 mm and 2.1° from the original head position
│11:04:18│ ⏳️ sub-155 run-03 Writing sub-155/meg/sub-155_task-tiwm_run-03_proc-sss_raw.fif
│11:05:27│ ⏳️ sub-155 run-03 Adding Maxwell filtered raw data to report.
│11:05:38│ ⏳️ sub-155 run-03 Saving report: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/sub-155/meg/sub-155_task-tiwm_report.html
│11:05:49│ ⏳️ sub-155 run-04 Loading reference run: 01.
│11:05:50│ ⏳️ sub-155 run-04 Applying SSS to experimental data

larsoner commented 5 months ago

Yes, it's normal: if the code of a step changes, the caching function detects it and says the step should be rerun. This should be why the data quality step was skipped (we didn't change that code lately) but the Maxwell filter (MF) step ran (we've fixed bugs there in the last month or so). It would be dangerous / a bug if it didn't behave this way. So when you update, any steps whose code has changed compared to your old version will need to be rerun.
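
A minimal sketch of the general idea, assuming a cache key built by hashing a step's source code together with its config values. This is not the pipeline's actual implementation; `cache_key` and the example sources are hypothetical.

```python
import hashlib
import json


def cache_key(step_source: str, config: dict) -> str:
    """Hash a step's source code together with the config values it uses.

    Hypothetical helper: in a real setup the source text could be obtained
    with inspect.getsource(step_function).
    """
    hasher = hashlib.sha256()
    hasher.update(step_source.encode("utf-8"))
    # sort_keys makes the hash independent of dict insertion order
    hasher.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    return hasher.hexdigest()


config = {"mf_st_duration": 10.0}  # illustrative config value
old_source = "def run(raw):\n    return raw\n"
new_source = "def run(raw):\n    return raw.copy()  # bug fix\n"

same_code_hit = cache_key(old_source, config) == cache_key(old_source, config)
updated_code_hit = cache_key(old_source, config) == cache_key(new_source, config)
print(same_code_hit, updated_code_hit)
```

Because the source text is part of the key, an update that touches a step invalidates that step's cache even when the config is untouched, while steps whose code did not change keep their cached results.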

SophieHerbst commented 5 months ago

@larsoner I can't find the quote, but you asked how long a rerun takes if everything is cached: 1m 55s for one subject.

JD-Zhu commented 4 months ago

I think I have a related request - would it be possible to only run steps from a certain point onwards? One example scenario where this would be useful is if you obtain the intermediate processing files from a collaborator and would like to run the remaining steps.

Please let me know if this should be a separate issue instead.
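
To illustrate what I have in mind, here is a small sketch, assuming a hypothetical "start from this step" selection plus a check that the intermediate files the remaining steps need are actually present. `steps_from` and `missing_inputs` are made-up names, and the file names just follow the pattern from the log above.

```python
import tempfile
from pathlib import Path


def steps_from(target: str, all_steps: list[str]) -> list[str]:
    """Return ``target`` and every step after it, in pipeline order."""
    return all_steps[all_steps.index(target):]


def missing_inputs(required: list[str], deriv_root: Path) -> list[str]:
    """List the required files that are absent under ``deriv_root``."""
    return [name for name in required if not (deriv_root / name).exists()]


all_steps = [
    "preprocessing/_01_data_quality",
    "preprocessing/_02_head_pos",
    "preprocessing/_03_maxfilter",
]
remaining = steps_from("preprocessing/_02_head_pos", all_steps)

# Simulate a derivatives folder received from a collaborator:
# the run-01 file is present, the run-02 file is not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    present = "sub-155/meg/sub-155_task-tiwm_run-01_proc-sss_raw.fif"
    absent = "sub-155/meg/sub-155_task-tiwm_run-02_proc-sss_raw.fif"
    (root / present).parent.mkdir(parents=True)
    (root / present).touch()
    missing = missing_inputs([present, absent], root)

print(remaining)
print(missing)
```

Failing early with the list of missing files seems safer than starting a later step on an incomplete set of intermediate files.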