mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.7k stars 1.31k forks source link

memory leak when creating epochs #4473

Closed jasmainak closed 7 years ago

jasmainak commented 7 years ago

Together with @wmvanvliet we observed a memory leak when creating epochs. Here is a script to reproduce:

import os.path as op
import mne

from memory_profiler import memory_usage

data_path = mne.datasets.sample.data_path()
fname = op.join(data_path, 'MEG', 'sample', 'sample_audvis_raw.fif')

def mne_size(inst):
    repr_ = inst.__repr__().split(', ')
    return float([r[1:-3] for r in repr_ if 'MB' in r][0])

def run():
    epochs = list()
    for i in range(15):
        raw = mne.io.read_raw_fif(fname, preload=True)

        events = mne.find_events(raw)

        tmin, tmax = -0.2, 0.5
        event_id = {'Auditory/Left': 1, 'Auditory/Right': 2,
                    'Visual/Left': 3, 'Visual/Right': 4}
        picks = mne.pick_types(raw.info, meg=True, eeg=False, eog=True)
        epochs.append(mne.Epochs(raw, events=events, event_id=event_id,
                      tmin=tmin, tmax=tmax, reject=None,
                      picks=picks, preload=True))

    total_size = mne_size(epochs[0]) * len(epochs) + mne_size(raw)
    print('Size reported by MNE: %0.2f MB' % total_size)

real_size = memory_usage(run, max_usage=True)
print('Actual size in memory: %0.2f MB' % real_size[0])

Please change the number of iterations 15 before trying in your laptop if it can't handle more than 10 GB.

agramfort commented 7 years ago

why do you call this a memory leak? it is by design that Epochs have a ref on raw and if raw is preloaded then you have it in memory. Use memmap for raw to avoid this.

jasmainak commented 7 years ago

Why does the Epochs need to have a ref to raw once it is preloaded? I don't need the raw object anymore as the epochs have been created and loaded into the disk

agramfort commented 7 years ago

certain functions like tf_lcmv need it but maybe this can be questioned indeed.

wmvanvliet commented 7 years ago

What is the benefit of keeping track of the original raw when preload=True? When loading an epochs there is no .raw attribute and everything seems to work ok.

agramfort commented 7 years ago

then it means that functions that need Epochs.raw as tf_lcmv should require preload=False

what if you change the reject param sequentially as @jasmainak implemented some time ago?

wmvanvliet commented 7 years ago

ah ok, so it is used? Then we should leave it as it is and change the biomag demo instead.

jasmainak commented 7 years ago

I think this should be eventually fixed in MNE though and the lcmv examples changed.

It is counterintuitive, since the __repr__ claims that we consume less memory. I seem to remember that we used to delete the reference to raw after loading the data in Epochs. But perhaps I am mistaken.

I don't see a problem with sequential reject param as long as the threshold gets stricter (which was enforced).

agramfort commented 7 years ago

ok so maybe we can fix this....

larsoner commented 7 years ago

Yeah it seems a bit silly to keep it around after preload. I'll take a look