Closed SophieHerbst closed 5 months ago
There is a file named : sub-155_task-tiwm_split-01_epo.fif But not sub-155_task-tiwm_proc-ica_split-01_epo.fif, which is the output file name.
There is a file named : sub-155_task-tiwm_split-01_epo.fif But not sub-155_task-tiwm_proc-ica_split-01_epo.fif, which is the output file name.
This seems odd; my understanding is that the step above should only try to create this file, not read it…
I agree. But it calls _update_for_splits, which asserts the file to be there:
if not allow_missing:
assert not missing, f"Missing file: {bids_path.fpath}"
I tried commenting out those lines, but I still get an error. Is there something wrong with my filenames? I reran the whole pipeline last night and did not change any filenames manually.
This is the error after commenting out the lines above:
│12:09:51│ ❌ sub-155 A critical error occurred. The error message was: missing out_files["epochs"] = /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/mne-bids-pipeline/sub-155/meg/sub-155_task-tiwm_proc-ica_split-01_epo.fif
Aborting pipeline run. The traceback is:
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 54, in __mne_bids_pipeline_failsafe_wrapper__
out = memory.cache(func)(*args, **kwargs)
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 272, in wrapper
memorized_func(*args, **kwargs)
File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 594, in __call__
return self._cached_call(args, kwargs)[0]
File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 537, in _cached_call
out, metadata = self.call(*args, **kwargs)
File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 779, in call
output = self.func(*args, **kwargs)
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/steps/preprocessing/_08a_apply_ica.py", line 206, in apply_ica_epochs
return _prep_out_files(exec_params=exec_params, out_files=out_files)
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 393, in _prep_out_files
out_files = _path_to_str_hash(
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 412, in _path_to_str_hash
assert v.exists(), f'missing {kind}_files["{k}"] = {v}'
Correction, the second error was due to my local installation. It works without the 'assert', but I am not sure if it is correct, as the output only mentions one of the two split files. Is the second one loaded automatically? [NB: admidst the changes I made in the config was lengthening the epochs, which likely lead to the larger files and the necessecity to split them, which was not the case before]
┌────────┬ preprocessing/_08a_apply_ica ────────────────────────────────────────
│14:23:07│ ⏳️ sub-215 Input: sub-215_task-tiwm_split-01_epo.fif
│14:23:07│ ⏳️ sub-155 Input: sub-155_task-tiwm_split-01_epo.fif
│14:23:07│ ⏳️ sub-215 Output: sub-215_task-tiwm_proc-ica_split-01_epo.fif
│14:23:07│ ⏳️ sub-155 Output: sub-155_task-tiwm_proc-ica_split-01_epo.fif
│14:23:07│ ⏳️ sub-197 Input: sub-197_task-tiwm_split-01_epo.fif
│14:23:07│ ⏳️ sub-197 Output: sub-197_task-tiwm_proc-ica_split-01_epo.fif
│14:23:07│ ⏳️ sub-220 Input: sub-220_task-tiwm_split-01_epo.fif
│14:23:07│ ⏳️ sub-220 Output: sub-220_task-tiwm_proc-ica_split-01_epo.fif
Hey, this somehow doesn't seem right. Could you please check the file size of the split files?
ls -lh sub-*/**/sub-*_task-tiwm_proc-ica_split-*_epo.fif
Are they actually 2 GB in size? Do we potentially write multiple split files but simply fail to log that we're doing that?
The size looks ok, the two files are 1.5GB each.
Do you only have split-01 files? Or also the corresponding split-02?
01 and 02
So then it's just a logging error, I assume.. no? The next steps of the pipeline are working for you now?
everything before 08_apply_ICA seems to work (it ran), but I cannot apply the ICA
I am getting a similar error again, it checks for the files that are supposed to be the output:
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 393, in _prep_out_files
out_files = _path_to_str_hash(
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 412, in _path_to_str_hash
assert v.exists(), f'missing {kind}_files["{k}"] = {v}'
But you're saying that the file the pipeline complains about as being missing is, in fact, not missing now?
I am getting a similar error again, it checks for the files that are supposed to be the output:
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 393, in _prep_out_files out_files = _path_to_str_hash( File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 412, in _path_to_str_hash assert v.exists(), f'missing {kind}_files["{k}"] = {v}'
Argh ... I'm sure @larsoner will have an idea here
This is a bit mysterious to me because it looks like we do the same stuff in all three of these places involving _update_for_splits
and _prep_out_files
:
It is too harsh to try deleting the derivatives
directory and starting a run from scratch using latest main
?
If it still fails with that (or it's too much work) would you be able to share files with me again?
@larsoner I rerun yesterday, with latest main, I can try running it on a faster machine to save time
Did you delete the derivatives/mne-bids-pipeline
directory (including the mne-bids-pipeline/_cache
folder)? It would eliminate the possibility of cruft causing a problem (which would still be a bug, but would at least un-block you potentially).
and you can also for example set subjects = ['01']
, hopefully this would run in minutes not hours!
still running.. for 1 SB with 8 runs - the epochs are very long though. I will report how it went tomorrow
Same same
┌────────┬ preprocessing/_08a_apply_ica ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│17:48:31│ ⏳️ sub-155 Input: sub-155_task-tiwm_split-01_epo.fif
│17:48:31│ ⏳️ sub-155 Output: sub-155_task-tiwm_proc-ica_split-01_epo.fif
│17:48:40│ ⏳️ sub-155 Rejecting ICs: 3, 4, 11, 21, 25, 29, 38, 64, 67, 72
│17:49:07│ ⏳️ sub-155 Saving reconstructed epochs after ICA.
│17:56:17│ ⏳️ sub-155 Adding ICA to report.
│17:59:12│ ⏳️ sub-155 Saving report: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/mne-bids-pipeline/sub-155/meg/sub-155_task-tiwm_report.html
│17:59:17│ ❌ sub-155 A critical error occurred. The error message was: [Errno 2] No such file or directory: '/neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/mne-bids-pipeline/sub-155/meg/sub-155_task-tiwm_proc-ica_split-01_epo.fif'
Aborting pipeline run. The traceback is:
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 54, in __mne_bids_pipeline_failsafe_wrapper__
out = memory.cache(func)(*args, **kwargs)
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 272, in wrapper
memorized_func(*args, **kwargs)
File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 594, in __call__
return self._cached_call(args, kwargs)[0]
File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 537, in _cached_call
out, metadata = self.call(*args, **kwargs)
File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/joblib/memory.py", line 779, in call
output = self.func(*args, **kwargs)
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/steps/preprocessing/_08a_apply_ica.py", line 206, in apply_ica_epochs
return _prep_out_files(exec_params=exec_params, out_files=out_files)
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 393, in _prep_out_files
out_files = _path_to_str_hash(
File "/home/sh254795/Documents/REPOS/mne-study-template/mne_bids_pipeline/_run.py", line 414, in _path_to_str_hash
this_hash = v.stat().st_mtime
File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/pathlib.py", line 1097, in stat
return self._accessor.stat(self, follow_symlinks=follow_symlinks)
Modified the uploaded script to have:
bids_root = Path(__file__).parent
deriv_root = bids_root / "derivatives"
and re-ran. Observations unrelated to the issue that I figured you should know:
ica_reject_components
which is not used, I recommend removing it so that it's clear it's not needed. (I recently found a similar error in one of our test configs which is why I noticed!)preprocessing/_08a_apply_ica
step. I have a pretty good machine but I'm surprised it takes so long on yours if it's hours for just subject 155 (?). After about an hour it was on some decoding step but the error happens before that so not necessary to keep going at that point.I was unable to reproduce the error. Can you try latest main
of MNE-Python, MNE-BIDS, and MNE-BIDS-Pipeline? Here's my sys_info
for those for reference:
├☑ mne 1.7.0.dev120+g38e2839a36 (devel, latest release is 1.6.1)
├☑ mne-bids 0.15.0.dev31+gb85a0dc7.d20240216
├☑ mne-bids-pipeline 1.6.0.dev18+g9f8b170
I have a pretty good machine but I'm surprised it takes so long on yours
If things are still the way I remember them, the data is stored on an NFS that is super slow esp during the initialization step(s)
thanks @larsoner, I will try again with the latest mains. and thanks for checking the config!
@larsoner, unfortunately, I still get the error. I adopted all your suggestions for the config and my sys-info is below. Any advice on what to do?
┌────────┬ preprocessing/_08a_apply_ica ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│17:34:17│ ⏳️ sub-155 Input: sub-155_task-tiwm_split-01_epo.fif
│17:34:17│ ⏳️ sub-155 Output: sub-155_task-tiwm_proc-ica_split-01_epo.fif
│17:34:25│ ⏳️ sub-155 Rejecting ICs: 3, 5, 9, 12, 15, 21, 31, 33, 41, 43, 53, 57, 63, 71
│17:34:40│ ⏳️ sub-155 Saving reconstructed epochs after ICA.
│17:35:09│ ❌ sub-155 A critical error occurred. The error message was: Missing file: /neurospin/meg/meg_tmp/TimeInWM_Izem_2019/BIDS_anonymized/derivatives/sub-155/meg/sub-155_task-tiwm_proc-ica_split-01_epo.fif
Aborting pipeline run. The traceback is:
File "/home/sh254795/anaconda3/envs/mne/lib/python3.10/site-packages/mne_bids_pipeline/_run.py", line 329, in _update_for_splits
assert not missing, f"Missing file: {bids_path.fpath}"
Platform Linux-6.5.0-17-generic-x86_64-with-glibc2.35
Python 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:08:06) [GCC 11.3.0]
Executable /home/sh254795/anaconda3/envs/mne/bin/python
CPU x86_64 (48 cores)
Memory 125.8 GB
Core
├☑ mne 1.7.0.dev120+g614424718 (devel, latest release is 1.6.1)
├☑ numpy 1.23.5 (OpenBLAS 0.3.21 with 48 threads)
├☑ scipy 1.10.1
└☑ matplotlib 3.7.1 (backend=agg)
Numerical (optional)
├☑ sklearn 1.2.2
├☑ numba 0.56.4
├☑ nibabel 5.1.0
├☑ nilearn 0.10.0
├☑ dipy 1.6.0
├☑ openmeeg 2.5.5
├☑ pandas 2.0.0
└☐ unavailable cupy
Visualization (optional)
├☑ pyvista 0.38.5 (OpenGL 4.5 (Core Profile) Mesa 22.3.2 via llvmpipe (LLVM 15.0.6, 256 bits))
├☑ pyvistaqt 0.0.0
├☑ vtk 9.3.0
├☑ qtpy 2.3.1 (None=None)
├☑ pyqtgraph 0.13.3
├☑ mne-qt-browser 0.0.0
├☑ ipywidgets 8.0.6
├☑ trame_client 2.13.0
├☑ trame_server 2.12.1
└☐ unavailable ipympl, trame_vtk, trame_vuetify
Ecosystem (optional)
├☑ mne-bids 0.15.0.dev32+g4fd0ba0
├☑ mne-bids-pipeline 1.6.0.dev19+g5d4e3c3
└☐ unavailable mne-nirs, mne-features, mne-connectivity, mne-icalabel, neo
@larsoner I just saw that before the pipeline breaks, these two files were written, i.e. the filenames are wrong. I must have missed those before even though I looked through the folder, maybe because I ran several subjects in parallel (at least this seems a more likely explanation, than their magical appearance after running the same thing n times). I also ran it on a different machine, but with access to the same server for the data, and the same mne setup.
sub-155_task-tiwm_proc-ica_split-01_split-01_epo.fif sub-155_task-tiwm_proc-ica_split-01_split-02_epo.fif
Any idea how to prevent these erroneous filenames?
Oooohhhh good catch!
..... stupid split files ....... We should probably have better handling for them in MNE-BIDS
Until then, we can probably work around this problem somehow
@hoechenberger can you suggest a workaround? I am stuck and a student is waiting to get the cleaned epoch files.
I don't know a quick workaround from the top of my head, sorry. I hope @larsoner may find some time to look into this, as I'm rather busy for the entire week. One temporary "solution" that comes to mind is making the epochs shorter such that you don't end up with split files. I know this is not what you eventually will want to use, but that way you could at least produce a "preview dataset" your student can use to start implementing their analysis, until the "actual" dataset is ready
I reran the pipeline after adjusting ICA parameters. Now I am stuck with this error: (I am on the latest main fetched yesterday)