mne-tools / mne-bids-pipeline

Automatically process entire electrophysiological datasets using MNE-Python.
https://mne.tools/mne-bids-pipeline/
BSD 3-Clause "New" or "Revised" License
134 stars 65 forks source link

Critical error: Missing file: proc-ica_split-01_epo.fif #852

Closed SophieHerbst closed 5 months ago

SophieHerbst commented 5 months ago

I reran the pipeline after adjusting ICA parameters. Now I am stuck with this error: (I am on the latest main fetched yesterday)

┌────────┬ preprocessing/_08a_apply_ica ────────────────────────────────────────
│11:10:57│ ⏳️ sub-155 Input: sub-155_task-tiwm_split-01_epo.fif
│11:10:57│ ⏳️ sub-155 Output: sub-155_task-tiwm_proc-ica_split-01_epo.fif
│11:11:31│ ⏳️ sub-155 Rejecting ICs: 3, 4, 11, 21, 25, 29, 38, 64, 67, 72
│11:11:51│ ⏳️ sub-155 Saving reconstructed epochs after ICA.
│11:12:26│ ❌ sub-155 A critical error occurred. The error message was: Missing file: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/mne-bids-pipeline/sub-155/meg/sub-155_task-tiwm_proc-ica_split-01_epo.fif

Aborting pipeline run. The traceback is:

  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 329, in _update_for_splits
    assert not missing, f"Missing file: {bids_path.fpath}"
SophieHerbst commented 5 months ago

There is a file named : sub-155_task-tiwm_split-01_epo.fif But not sub-155_task-tiwm_proc-ica_split-01_epo.fif, which is the output file name.

hoechenberger commented 5 months ago

There is a file named : sub-155_task-tiwm_split-01_epo.fif But not sub-155_task-tiwm_proc-ica_split-01_epo.fif, which is the output file name.

This seems odd; my understanding is that the step above should only try to create this file, not read it…

SophieHerbst commented 5 months ago

I agree. But it calls _update_for_splits, which asserts the file to be there:

    if not allow_missing:
        assert not missing, f"Missing file: {bids_path.fpath}"

I tried commenting out those lines, but I still get an error. Is there something wrong with my filenames? I reran the whole pipeline last night and did not change any filenames manually.

This is the error after commenting out the lines above:

│12:09:51│ ❌ sub-155 A critical error occurred. The error message was: missing out_files["epochs"] = /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/mne-bids-pipeline/sub-155/meg/sub-155_task-tiwm_proc-ica_split-01_epo.fif

Aborting pipeline run. The traceback is:

  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 54, in __mne_bids_pipeline_failsafe_wrapper__
    out = memory.cache(func)(*args, **kwargs)
  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 272, in wrapper
    memorized_func(*args, **kwargs)
  File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 594, in __call__
    return self._cached_call(args, kwargs)[0]
  File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 537, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 779, in call
    output = self.func(*args, **kwargs)
  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/steps/preprocessing/_08a_apply_ica.py", line 206, in apply_ica_epochs
    return _prep_out_files(exec_params=exec_params, out_files=out_files)
  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 393, in _prep_out_files
    out_files = _path_to_str_hash(
  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 412, in _path_to_str_hash
    assert v.exists(), f'missing {kind}_files["{k}"] = {v}'
SophieHerbst commented 5 months ago

Correction, the second error was due to my local installation. It works without the 'assert', but I am not sure if it is correct, as the output only mentions one of the two split files. Is the second one loaded automatically? [NB: admidst the changes I made in the config was lengthening the epochs, which likely lead to the larger files and the necessecity to split them, which was not the case before]

┌────────┬ preprocessing/_08a_apply_ica ────────────────────────────────────────
│14:23:07│ ⏳️ sub-215 Input: sub-215_task-tiwm_split-01_epo.fif
│14:23:07│ ⏳️ sub-155 Input: sub-155_task-tiwm_split-01_epo.fif
│14:23:07│ ⏳️ sub-215 Output: sub-215_task-tiwm_proc-ica_split-01_epo.fif
│14:23:07│ ⏳️ sub-155 Output: sub-155_task-tiwm_proc-ica_split-01_epo.fif
│14:23:07│ ⏳️ sub-197 Input: sub-197_task-tiwm_split-01_epo.fif
│14:23:07│ ⏳️ sub-197 Output: sub-197_task-tiwm_proc-ica_split-01_epo.fif
│14:23:07│ ⏳️ sub-220 Input: sub-220_task-tiwm_split-01_epo.fif
│14:23:07│ ⏳️ sub-220 Output: sub-220_task-tiwm_proc-ica_split-01_epo.fif
hoechenberger commented 5 months ago

Hey, this somehow doesn't seem right. Could you please check the file size of the split files?

ls -lh sub-*/**/sub-*_task-tiwm_proc-ica_split-*_epo.fif

Are they actually 2 GB in size? Do we potentially write multiple split files but simply fail to log that we're doing that?

SophieHerbst commented 5 months ago

The size looks ok, the two files are 1.5GB each.

hoechenberger commented 5 months ago

Do you only have split-01 files? Or also the corresponding split-02?

SophieHerbst commented 5 months ago

01 and 02

hoechenberger commented 5 months ago

So then it's just a logging error, I assume.. no? The next steps of the pipeline are working for you now?

SophieHerbst commented 5 months ago

everything before 08_apply_ICA seems to work (it ran), but I cannot apply the ICA

SophieHerbst commented 5 months ago

I am getting a similar error again, it checks for the files that are supposed to be the output:

File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 393, in _prep_out_files
    out_files = _path_to_str_hash(
  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 412, in _path_to_str_hash
    assert v.exists(), f'missing {kind}_files["{k}"] = {v}'
hoechenberger commented 5 months ago

But you're saying that the file the pipeline complains about as being missing is, in fact, not missing now?

hoechenberger commented 5 months ago

I am getting a similar error again, it checks for the files that are supposed to be the output:


File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 393, in _prep_out_files

    out_files = _path_to_str_hash(

  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 412, in _path_to_str_hash

    assert v.exists(), f'missing {kind}_files["{k}"] = {v}'

Argh ... I'm sure @larsoner will have an idea here

larsoner commented 5 months ago

This is a bit mysterious to me because it looks like we do the same stuff in all three of these places involving _update_for_splits and _prep_out_files:

It is too harsh to try deleting the derivatives directory and starting a run from scratch using latest main?

If it still fails with that (or it's too much work) would you be able to share files with me again?

SophieHerbst commented 5 months ago

@larsoner I rerun yesterday, with latest main, I can try running it on a faster machine to save time

larsoner commented 5 months ago

Did you delete the derivatives/mne-bids-pipeline directory (including the mne-bids-pipeline/_cache folder)? It would eliminate the possibility of cruft causing a problem (which would still be a bug, but would at least un-block you potentially).

larsoner commented 5 months ago

and you can also for example set subjects = ['01'], hopefully this would run in minutes not hours!

SophieHerbst commented 5 months ago

still running.. for 1 SB with 8 runs - the epochs are very long though. I will report how it went tomorrow

SophieHerbst commented 5 months ago

Same same

┌────────┬ preprocessing/_08a_apply_ica ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│17:48:31│ ⏳️ sub-155 Input: sub-155_task-tiwm_split-01_epo.fif
│17:48:31│ ⏳️ sub-155 Output: sub-155_task-tiwm_proc-ica_split-01_epo.fif
│17:48:40│ ⏳️ sub-155 Rejecting ICs: 3, 4, 11, 21, 25, 29, 38, 64, 67, 72
│17:49:07│ ⏳️ sub-155 Saving reconstructed epochs after ICA.
│17:56:17│ ⏳️ sub-155 Adding ICA to report.
│17:59:12│ ⏳️ sub-155 Saving report: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/mne-bids-pipeline/sub-155/meg/sub-155_task-tiwm_report.html
│17:59:17│ ❌ sub-155 A critical error occurred. The error message was: [Errno 2] No such file or directory: '/neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/mne-bids-pipeline/sub-155/meg/sub-155_task-tiwm_proc-ica_split-01_epo.fif'

Aborting pipeline run. The traceback is:

  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 54, in __mne_bids_pipeline_failsafe_wrapper__
    out = memory.cache(func)(*args, **kwargs)
  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 272, in wrapper
    memorized_func(*args, **kwargs)
  File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 594, in __call__
    return self._cached_call(args, kwargs)[0]
  File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 537, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 779, in call
    output = self.func(*args, **kwargs)
  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/steps/preprocessing/_08a_apply_ica.py", line 206, in apply_ica_epochs
    return _prep_out_files(exec_params=exec_params, out_files=out_files)
  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 393, in _prep_out_files
    out_files = _path_to_str_hash(
  File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 414, in _path_to_str_hash
    this_hash = v.stat().st_mtime
  File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/pathlib.py", line 1097, in stat
    return self._accessor.stat(self, follow_symlinks=follow_symlinks)
larsoner commented 5 months ago

Modified the uploaded script to have:

bids_root = Path(__file__).parent
deriv_root = bids_root / "derivatives"

and re-ran. Observations unrelated to the issue that I figured you should know:

I was unable to reproduce the error. Can you try latest main of MNE-Python, MNE-BIDS, and MNE-BIDS-Pipeline? Here's my sys_info for those for reference:

├☑ mne               1.7.0.dev120+g38e2839a36 (devel, latest release is 1.6.1)
├☑ mne-bids          0.15.0.dev31+gb85a0dc7.d20240216
├☑ mne-bids-pipeline 1.6.0.dev18+g9f8b170
hoechenberger commented 5 months ago

I have a pretty good machine but I'm surprised it takes so long on yours

If things are still the way I remember them, the data is stored on an NFS that is super slow esp during the initialization step(s)

SophieHerbst commented 5 months ago

thanks @larsoner, I will try again with the latest mains. and thanks for checking the config!

SophieHerbst commented 5 months ago

@larsoner, unfortunately, I still get the error. I adopted all your suggestions for the config and my sys-info is below. Any advice on what to do?

┌────────┬ preprocessing/_08a_apply_ica ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│17:34:17│ ⏳️ sub-155 Input: sub-155_task-tiwm_split-01_epo.fif
│17:34:17│ ⏳️ sub-155 Output: sub-155_task-tiwm_proc-ica_split-01_epo.fif
│17:34:25│ ⏳️ sub-155 Rejecting ICs: 3, 5, 9, 12, 15, 21, 31, 33, 41, 43, 53, 57, 63, 71
│17:34:40│ ⏳️ sub-155 Saving reconstructed epochs after ICA.
│17:35:09│ ❌ sub-155 A critical error occurred. The error message was: Missing file: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/sub-155/meg/sub-155_task-tiwm_proc-ica_split-01_epo.fif

Aborting pipeline run. The traceback is:

  File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/mne_bids_pipeline/_run.py", line 329, in _update_for_splits
    assert not missing, f"Missing file: {bids_path.fpath}"
Platform             Linux-6.5.0-17-generic-x86_64-with-glibc2.35
Python               3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]
Executable           /home/sh254795/anaconda3/envs/mne/bin/python
CPU                  x86_64 (48 cores)
Memory               125.8 GB

Core
├☑ mne               1.7.0.dev120+g614424718 (devel, latest release is 1.6.1)
├☑ numpy             1.23.5 (OpenBLAS 0.3.21 with 48 threads)
├☑ scipy             1.10.1
└☑ matplotlib        3.7.1 (backend=agg)

Numerical (optional)
├☑ sklearn           1.2.2
├☑ numba             0.56.4
├☑ nibabel           5.1.0
├☑ nilearn           0.10.0
├☑ dipy              1.6.0
├☑ openmeeg          2.5.5
├☑ pandas            2.0.0
└☐ unavailable       cupy

Visualization (optional)
├☑ pyvista           0.38.5 (OpenGL 4.5 (Core Profile) Mesa 22.3.2 via llvmpipe (LLVM 15.0.6, 256 bits))
├☑ pyvistaqt         0.0.0
├☑ vtk               9.3.0
├☑ qtpy              2.3.1 (None=None)
├☑ pyqtgraph         0.13.3
├☑ mne-qt-browser    0.0.0
├☑ ipywidgets        8.0.6
├☑ trame_client      2.13.0
├☑ trame_server      2.12.1
└☐ unavailable       ipympl, trame_vtk, trame_vuetify

Ecosystem (optional)
├☑ mne-bids          0.15.0.dev32+g4fd0ba0
├☑ mne-bids-pipeline 1.6.0.dev19+g5d4e3c3
└☐ unavailable       mne-nirs, mne-features, mne-connectivity, mne-icalabel, neo
SophieHerbst commented 5 months ago

@larsoner I just saw that before the pipeline breaks, these two files were written, i.e. the filenames are wrong. I must have missed those before even though I looked through the folder, maybe because I ran several subjects in parallel (at least this seems a more likely explanation, than their magical appearance after running the same thing n times). I also ran it on a different machine, but with access to the same server for the data, and the same mne setup.

sub-155_task-tiwm_proc-ica_split-01_split-01_epo.fif sub-155_task-tiwm_proc-ica_split-01_split-02_epo.fif

Any idea how to prevent these erroneous filenames?

hoechenberger commented 5 months ago

Oooohhhh good catch!

..... stupid split files ....... We should probably have better handling for them in MNE-BIDS

Until then, we can probably work around this problem somehow

SophieHerbst commented 5 months ago

@hoechenberger can you suggest a workaround? I am stuck and a student is waiting to get the cleaned epoch files.

hoechenberger commented 5 months ago

I don't know a quick workaround from the top of my head, sorry. I hope @larsoner may find some time to look into this, as I'm rather busy for the entire week. One temporary "solution" that comes to mind is making the epochs shorter such that you don't end up with split files. I know this is not what you eventually will want to use, but that way you could at least produce a "preview dataset" your student can use to start implementing their analysis, until the "actual" dataset is ready