mne-tools / mne-bids

MNE-BIDS is a Python package that allows you to read and write BIDS-compatible datasets with the help of MNE-Python.
https://mne.tools/mne-bids/
BSD 3-Clause "New" or "Revised" License
133 stars 88 forks source link

`copyfile_*` fails in `write_raw_bids` when `overwrite=True` and `src==dest` #867

Closed adam2392 closed 2 years ago

adam2392 commented 3 years ago

Describe the bug

I am trying to append annotations to the events.tsv file. I found that the easiest way to do so is to simply do: i) read_raw_bids, ii) append annotations to the read in raw object and then iii) write_raw_bids again.

However, if I'm writing to the exact same location, then the file actually is removed. I think this is perhaps hidden behavior of shuttle.copyfile(src, dest), where if src and dest are the same, then the file is deleted.

Steps to reproduce

raw = read_raw_edf(<some_edf_file>)
bids_path = BIDSPath(root=root, subject='001', datatype='eeg')
output_bids_path = write_raw_bids(raw, bids_path)

raw = read_raw_bids(output_bids_path)
write_raw_bids(raw, output_bids_path)

Expected results

At the very least, an error should be raised, and the file shouldn't be deleted.

Otherwise, if the file already exists, then copy somewhere temporarily and then copy to the same place again? The con is that one writes twice now.

Actual results

File is deleted, and the program errors.

Additional information

〈Replace this text with information about your system. For example through using MNE-Python and running mne.sys_info() and pasting the output here.βŒͺ

hoechenberger commented 3 years ago

According to the Python docs:

If src and dst specify the same file, SameFileError is raised.

So if the file disappears then this is not due to copyfile, but the deletion must occur somewhere else

sappelhoff commented 3 years ago

I can't replicate this issue on mne-bids and mne-python main:

import os
import mne
from mne_bids import BIDSPath, write_raw_bids, read_raw_bids

testing = mne.datasets.testing.data_path()
fin = os.path.join(testing, "EDF", "test_generator_2.edf")
root = os.path.join(os.path.expanduser("~"), "Desktop", "mne_bids_issue-867")
os.makedirs(root, exist_ok=True)
raw = mne.io.read_raw_edf(fin)
bids_path = BIDSPath(root=root, subject='001', datatype='eeg')
output_bids_path = write_raw_bids(raw, bids_path)

raw = read_raw_bids(output_bids_path)
write_raw_bids(raw, output_bids_path)

It raises a FileExistsError, as expected (see Richards comment above)

hoechenberger commented 3 years ago

It raises a FileExistsError, as expected (see Richards comment above)

No, I expected a SameFileError based on the Python docs πŸ˜…πŸ˜…πŸ˜…

hoechenberger commented 3 years ago

It raises a FileExistsError, as expected (see Richards comment above)

No, I expected a SameFileError based on the Python docs πŸ˜…πŸ˜…πŸ˜…

Ah no, wait, only for copyfile I would expect this 😁

adam2392 commented 3 years ago

Apologies, I think the snippet I was using was with overwrite=True:

import os
import mne
from mne_bids import BIDSPath, write_raw_bids, read_raw_bids

testing = mne.datasets.testing.data_path()
fin = os.path.join(testing, "EDF", "test_generator_2.edf")
root = os.path.join(os.path.expanduser("~"), "Desktop", "mne_bids_issue-867")
os.makedirs(root, exist_ok=True)
raw = mne.io.read_raw_edf(fin)
bids_path = BIDSPath(root=root, subject='001', datatype='eeg')
output_bids_path = write_raw_bids(raw, bids_path)

raw = read_raw_bids(output_bids_path)
write_raw_bids(raw, output_bids_path, overwrite=True)

This results in the error I specified. I ran into it again because I was trying to overwrite the metadata via write_raw_bids multiple times.

sappelhoff commented 3 years ago

you are right, this is the problematic part: https://github.com/mne-tools/mne-bids/blob/c6a3f84ad210ffee7d069ab1062c3b9e3897546d/mne_bids/write.py#L1517-L1524

sappelhoff commented 3 years ago

This is a problem whenever we COPY (i.e., not convert) a file and the source and destination are the same.

Because the snippet I posted above comes before the copying ... so the source is removed before it can be used for copying.

This also reveals another issue: Namely, the if clause in the above snippet checks if bids_path.fpath exists, and removes files contingent on that. However, AFTER that, but BEFORE converting/copying, we modify bids_path. So it's possible that in some scenarios, files are unnecessarily (unwanted) removed.

@adam2392 do you want to work on a fix?

adam2392 commented 3 years ago

I don't think there's a good fix besides temporarily renaming the file and then copying it and then deleting the temporarily renamed file. Pushing fix rn.

Actually this is a deeper issue with trying to copy files. Anytime we copy files, we need to just raise an error. There's no way around it unless we make a temporary copy of the file and then move it, and then copy it. This raises an issue because then you get in the business of i) accidentally deleting the file, if the copy goes wrong, or ii) adding extra files and imposing storage issues.