mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.66k stars 1.31k forks source link

Indexing error in Evoked.plot_topomap() when trying to mask channels #9611

Closed hoechenberger closed 3 years ago

hoechenberger commented 3 years ago

So either this is a documentation problem or a code bug: I supply times and mask with a shape of (n_times, n_chans) to Evoked.plot_topomap(), and it crashes with an IndexError.

# %%
from pathlib import Path
import numpy as np

import mne

sample_data_dir = Path(mne.datasets.sample.data_path())
fname = sample_data_dir / 'MEG' / 'sample' / 'sample_audvis-ave.fif'

condition = 'Left Auditory'
evoked = mne.read_evokeds(fname=fname, condition=condition)
evoked.pick('mag')
evoked.apply_baseline((None, 0))

times = (0.090, 0.110)
mask = np.empty(
    [len(evoked.ch_names), len(times)]
)
mask.fill(False)
mask[20, :] = True

fig = evoked.plot_topomap(times=times, time_unit='s', mask=mask)
Traceback (most recent call last):
  File "/private/tmp/mwe_topo.py", line 22, in <module>
    fig = evoked.plot_topomap(times=times, time_unit='s', mask=mask)
  File "/Users/hoechenberger/Development/mne-python/mne/evoked.py", line 478, in plot_topomap
    return plot_evoked_topomap(
  File "/Users/hoechenberger/Development/mne-python/mne/viz/topomap.py", line 1697, in plot_evoked_topomap
    mask_ = mask[np.ix_(picks, time_idx)]
IndexError: index 174 is out of bounds for axis 1 with size 2

I expected a topoplot at the two time points, showing only the first 20 channels.

hoechenberger commented 3 years ago

Just poking @drammock here – no pressure at all, just want to ensure you're aware of this issue :)

mscheltienne commented 3 years ago

Possible fix resulting in this plot:

image

Changes are restricted to mne\viz\topomap.py:

Line 1697, fixing the IndexError:

(-) mask_ = mask[np.ix_(picks, time_idx)]
(+) mask_ = mask[np.ix_(picks, range(len(time_idx)))]

2 similar lines should be also impacted above for the channels of type 'grad'.

Line 993:

(-) idx = np.where(~mask)[0]
(+) idx = np.where(mask)[0]

For some reason, out of all the if/else statements, one was using a bit-wise inversion.

hoechenberger commented 3 years ago

Nice work, @mscheltienne!

Now it looks as if the actual topographies are still based on all sensors; I actually wanted to mask that too. Is this not possible / is my thinking wrong here? @agramfort?

mscheltienne commented 3 years ago

@hoechenberger As a side note, the plot above was obtained with the 20 first channels instead of the 20th channel, i.e. with mask[:20, :] = True instead of mask[20, :] = True.

hoechenberger commented 3 years ago

@hoechenberger As a side note, the plot above was obtained with the 20 first channels instead of the 20th channel, i.e. with mask[:20, :] = True instead of mask[20, :] = True.

Oh, nice catch, thanks!

hoechenberger commented 3 years ago

@mscheltienne Would you like to take a stab at this and make a PR with a fix? :)

mscheltienne commented 3 years ago

@hoechenberger Sure, but could someone first confirm that the plot above is the expected behavior of the mask argument, i.e. that the topographies remain based on all sensors? Should the documentation for the mask argument be changed?

On a side note, I think you can create topographies based on a subset of sensors by setting the non-significant sensors in info['bads'] and using a local extrapolation.

mmagnuski commented 3 years ago

The n_times in the documentation may relate to number of timepoints in the data, not the number of points chosen in the times array. In this sense I understand it to be a documentation issue.

hoechenberger commented 3 years ago

The n_times in the documentation may relate to number of timepoints in the data, not the number of points chosen in the times array. In this sense I understand it to be a documentation issue.

Yes, could be. But it doesn't make sense really … as in, that's not great from a UX point of view.

mmagnuski commented 3 years ago

@hoechenberger That depends on what kind of mask you have - if you got it from cluster-based test on the whole time range (all time samples) then it makes sense to me.

hoechenberger commented 3 years ago

@mmagnuski Good observation! So – do you suggest we just fine-tune the docstring and add an example?

mscheltienne commented 3 years ago

@hoechenberger @mmagnuski I'm not sure I followed. In my understanding: The mask is simply a visual feature that displays the significant sensors at a given timepoint, i.e. for a given topographic map. The times argument lets you define the time points, i.e. the number of topographic maps and their location on the time axis. Each of those topographic maps can have a different mask applied, i.e. different significative sensors to highlight on the map. Thus, I interpret the n_times in (n_channels, n_times) as the number of topographic maps displayed. i.e. the number of points chosen on the time-axis.

I do not get your comment:

The n_times in the documentation may relate to the number of timepoints in the data, not the number of points in the times > array.

Number of timepoints in the data.. do you mean sample rate * duration? In which case what would the mask be?

mscheltienne commented 3 years ago

Also, in mne\viz\topomap.py, line 1694 in the if/else condition for gradiometers, what is the goal of this?

mask_ = (mask[np.ix_(picks[::2], time_idx)] | mask[np.ix_(picks[1::2], time_idx)])

For the moment it will crash with the same IndexError; but why do you need to split the even and odd picks indices? Secondly, if time_idx is replaced by range(len(time_idx)) to fix the IndexError as for the other types of sensors, this line will crash due to the bitwise OR |. Why is it needed?

TypeError: ufunc 'bitwise_or' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

mmagnuski commented 3 years ago

Number of timepoints in the data.. do you mean sample rate * duration? In which case what would the mask be?

Yes, I meant that mask has to be in the same dimensions as the evoked.data - so that times argument picks the same timepoints from both the evoked data and the mask. Such approach is useful in some situations:

but is may be cumbersome in others.

The easiest "fix" is to make the docs clearer - that a mask of (n_channels, n_samples) is required. Another may be to change the behavior of .plot_topomap() so that masks of (n_channels, n_times) (where n_times = len(times) for the times input argument) are also allowed.

mscheltienne commented 3 years ago

@mmagnuski Ok, now I get it. This is not what I implemented in the PR. In the PR, the mask argument shape must be (n_channels, n_times) where n_times = len(times), e.g. as in the example above: mask = np.empty([len(evoked.ch_names), len(times)]). With this proposed PR, the documentation doesn't need to be changed.

mmagnuski commented 3 years ago

@mscheltienne We don't want to change the current behavior as it may break previously working code. So we either change only the docs or add support for masks of (n_channels, n_times). This would require checking whether the mask is (n_channels, n_samples) or (n_channels, n_times) and acting accordingly. I am not sure it is worth it, so I would start with simply making the docs clear.

mscheltienne commented 3 years ago

@mmagnuski Indeed, I actually didn't pay attention that the previous code was working with (n_channels, n_samples) as the initial issue example was using the shape advised by the documentation. Adding support for both is not that difficult, and I could do that in the PR if you prefer.

mmagnuski commented 3 years ago

@mscheltienne Yes, it is not difficult, but I don't have a strong opinion on how useful that might be once the docs are clear. @hoechenberger @larsoner @agramfort @drammock WDYT?

mmagnuski commented 3 years ago

But whether we add support for (n_channels, n_times) mask or not, adding an example, as @hoechenberger suggests is a good idea.

mscheltienne commented 3 years ago

@mmagnuski Ok, for me the shape (n_channels, n_times) makes sense as you create N topographies at N different times; with one mask per topography. Thus, all the additional False columns in an array of shape (n_channels, n_samples) are useless. To add an example, if you want me to add it to the PR, I will need a little bit of guidance: which set of files needs a change, where can I find an example to mimic the syntax?

mmagnuski commented 3 years ago

@mscheltienne I think we'd need a concrete example / use case. In most situations that seem natural to me the mask is created by some operation on the evoked data, and evoked data is n_channels x n_samples so these are dimensions of the mask you get back. Then evoked.plot() finds time indices for evoked.data and applies to both: evoked.data and the provided mask. This way the mask does not depend on the selected timepoints. If you want to highlight the same set of channels on each topo it is still easy:

mask = np.zeros(evoked.shape, dtype='bool')
mask[[1, 3, 8, 9], :] = True

And for other uses it is still easy to use mne.viz.plot_topomap and provide mask for each topography separately.

drammock commented 3 years ago

sorry to be super-late chiming in on this issue. Here is my impression:

  1. the intent of mask is only to affect the drawing of sensor markers on the plot. It is not intended to affect the field interpolation. (I think if you want to see field interpolation based on a subset of sensors it should work to drop all other sensors first before plotting?)
  2. the docstring of evoked.plot_topomap() should be updated to emphasize that mask must match the shape of evoked.data
  3. if you want to specify mask as (n_channels, n_displayed_times) then I agree with @mmagnuski that we'd want to hear an example use case where this would make sense. Note that you can still make it work:
mask = np.zeros(evoked.data.shape, dtype='bool')
my_times = (0.09, 0.11)
my_signif_channels = [('MEG 2541', 'MEG 2331'), ('MEG 0111', 'MEG 0121', 'MEG 0131')]
which_times = np.searchsorted(evoked.times, my_times) - 1
which_channels = [np.in1d(evoked.ch_names, my_ch) for my_ch in my_signif_channels]
for _chs, _time in zip(which_channels, which_times):
    mask[_chs, _time] = True
fig = evoked.plot_topomap(times=my_times, time_unit='s', mask=mask, mask_params=dict(markersize=10, markerfacecolor='y'))

fig1

mscheltienne commented 3 years ago

On a side note, in any case, the dtype of the mask array should be checked with mask = mask.astype(bool) to avoid running into e.g. the error below when users forget to specify dtype=bool.

TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

mmagnuski commented 3 years ago

@drammock Thanks, I completely missed point 1 in this discussion. Yes, to plot topomap based on only some of the sensors you need to drop the remaining ones and use extrapolate='local'. @mscheltienne .astype() would need copy=False, so that the mask is not copied if it has the correct dtype.

The code snippet posted by @drammock could be a part of a short example on masking channels with evoked.plot() and plot_topomap(). It could also be added to this example/tutorial on topomap plotting: https://mne.tools/stable/auto_examples/visualization/evoked_topomap.html#sphx-glr-auto-examples-visualization-evoked-topomap-py.

hoechenberger commented 3 years ago

Was addressed via a doc improvement in #9647