Closed pablomainar closed 11 months ago
Hello! 👋 Thanks for opening your first issue here! ❤️ We will try to get back to you soon. 🚴
Yes I think this is a bug, get_data
should probably always return a copy of the data not a view. @pablomainar would you be up for trying to fix this?
(or if we already do return a copy, make sure we make the copy before applying the scaling)
Maybe return a copy only if units are changed, and return a view if not? Otherwise, we're forcing unnecessary and slow copy operations on users, which might negatively impact performance.
So the problem comes from the code in lines 1670 to 1678 from epochs.py.
if self.preload:
data = data[select]
if orig_picks is not None:
data = data[:, picks]
if units is not None:
data *= ch_factors[:, np.newaxis]
if start != 0 or stop != self.times.size:
data = data[..., start:stop]
return data
data is a view of self._data. So slicing with select, picks or start/stop creates a copy but multipliyng by ch_factors is done inplace, therefore changing the underlying self._data.
As @cbrnr mentions it is probably more efficient to simply modify the way the multiplication is done, and return a copy in that case. I will create a PR with a fix asap.
As @cbrnr mentions it is probably more efficient to simply modify the way the multiplication is done, and return a copy in that case
I think correctness/safety outweighs efficiency here. Returning a view sometimes and a copy others is dangerous and can lead to silent and hard to diagnose (or even see the symptoms of in the first place) bugs. A use could very easily .get_data, do an in place operation on the array, and now their Epochs are silently broken.
If we really want to return a view sometimes I think we'd have to add a copy=True (default) that people can set to false to get non-copied data when it's possible to get it.
Argh actually it looks like historically (for at least 10 years!) we returned a view / non-copy, e.g.:
So I guess we should live with it. Looking at https://mne.tools/stable/generated/mne.Epochs.html#mne.Epochs.get_data I don't see any warning about this so @pablomainar would you be up for adding the minimal fix (in the case of unit scaling, add a copy) and also adding a .. warning::
to epochs.get_data
that says sometimes the result will be a view, which means that modifying it in-place will modify the data in Epochs
itself to be incorrect?
Correctness/safety outweighs efficiency, but history trumps everything 🤣!
Argh actually it looks like historically (for at least 10 years!) we returned a view / non-copy, [...] So I guess we should live with it.
This doesn't make sense to me. To me it seems like a bug that we sometimes return a view, and we should fix it (even if the bug is subtle / rarely causes problems / is very old). Is the worry that some users might have figured out that we return a view, and made use of that knowledge to modify the Epochs
data intentionally? That seems even less likely than the case of modifications happening unknowingly / by accident.
This is clearly documented:
Returns data : array of shape (n_epochs, n_channels, n_times) A view on epochs data.
But this is still inconsistent, Raw.get_data()
always returns a copy, whereas Evoked.get_data()
returns a view, just like Epochs.get_data()
. At least according to their docstrings.
This is clearly documented
ah, ok. in that case I feel less strongly that we should change it right now.
But this is still inconsistent,
Raw.get_data()
always returns a copy, whereasEvoked.get_data()
returns a view, just likeEpochs.get_data()
. At least according to their docstrings.
agree that it's inconsistent. I wouldn't mind changing all of them to have a copy
param and migrating to copy=True
as the default for all 3 objects.
agree that it's inconsistent. I wouldn't mind changing all of them to have a copy param and migrating to copy=True as the default for all 3 objects.
👍
Although in some cases, I don't know if it will be that easy to switch from the current copy behavior to a view. After all, NumPy fancy indexing always produces a copy, and we're using that quite often (just not in the case that was reported here).
I think it's okay to have copy=True
mean "always return a copy" and copy=False
mean "return a view if possible, and use at your own risk"
I think it's okay to have copy=True mean "always return a copy" and copy=False mean "return a view if possible, and use at your own risk"
But in the latter case, I think we should really warn ("you requested a view, but this is not possible with your combination of arguments, so you're getting a copy instead").
Okay with me
Do you think it actually makes sense to return a view at all?
(1) The output of get_data() should never be used to modify the Epochs underlying data. If the user wants to do this, there are other ways. (2) Inside the _get_data() method a variable "data" is currently created as a view of self._data. So it is also dangerous that future updates to the method will end up mistakenly modifying data and therefore self._data. I think self._data should never be modified inside _get_data() and the best way of avoiding it is to create a copy from the very beginning of the method. (3) Even if we want to return a view in some cases for performance reasons, the number of scenarios that we are able to do it is limited. As soon as they ask for a slice in any axis, _get_data() must return a copy.
If we have a copy argument, we might leave (1) for the user with a warning, which might be fine. But (2) can give problems in future versions, as we cannot convert "data" into a copy from the very beginning.
Do you think it actually makes sense to return a view at all?
see @larsoner's comment here:
I think the most common case here is probably for the user "I don't care if it's a copy or not just make it as efficient as possible" often for passing to a sklearn classifier
In other words, I think the two reasonable use cases are
Epochs
object again once I have a handle for its data"
Description of the problem
Hello,
When creating an Epochs object with the argument preloaded set to True, the get_data method does not work as expected when passing a unit that is not volts. It will work well the first time, but subsequent calls will each time return data that is multiplied by the unit factor. Please note that this only happens when the preload argument is passed as True due to the if statement of line 1670 of epochs.py.
Steps to reproduce
Link to data
No response
Expected results
Subsequent calls to epochs.get_data(units="uV") should always return the same value, since the underlying data is not supposed to be modified. In the previous code, this value should be 1uV.
Actual results
After the first call to epochs.get_data(units="uV") the underlying data is multiplied by 10^6 (if the unit is uV) each time that get_data is called, therefore giving different results each time.
Additional information
Platform macOS-10.16-x86_64-i386-64bit Python 3.8.13 (default, Mar 28 2022, 06:16:26) [Clang 12.0.0 ] Executable /Users/pmainar/miniconda3/envs/xdf/bin/python CPU i386 (8 cores) Memory 16.0 GB
Core ├☑ mne 1.5.0 ├☑ numpy 1.22.4 (OpenBLAS 0.3.18 with 4 threads) ├☑ scipy 1.9.0 ├☑ matplotlib 3.7.0 (backend=module://ipympl.backend_nbagg) ├☑ pooch 1.6.0 └☑ jinja2 3.1.2
Numerical (optional) ├☑ sklearn 1.0.2 ├☑ numba 0.57.1 ├☑ pandas 1.4.2 └☐ unavailable nibabel, nilearn, dipy, openmeeg, cupy
Visualization (optional) ├☑ qtpy 2.2.0 (None=None) ├☑ ipympl 0.9.3 └☐ unavailable pyvista, pyvistaqt, ipyvtklink, vtk, pyqtgraph, mne-qt-browser
Ecosystem (optional) └☐ unavailable mne-bids, mne-nirs, mne-features, mne-connectivity, mne-icalabel, mne-bids-pipeline