mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.66k stars 1.31k forks source link

`.get_data()`'s `picks` docstring does not explicitly state that bad channels are included by default #12197

Open hoechenberger opened 10 months ago

hoechenberger commented 10 months ago

Proposed documentation enhancement

Currently, the doctoring for the picks parameter of .get_data() reads:

picksstr | array_like | slice | None

Channels to include. Slices and lists of integers will be interpreted as channel indices. In lists, channel type strings (e.g., ['meg', 'eeg']) will pick channels of those types, channel name strings (e.g., ['MEG0111', 'MEG2623'] will pick the given channels. Can also be the string values “all” to pick all channels, or “data” to pick data channels. None (default) will pick all channels. Note that channels in info['bads'] will be included if their names or indices are explicitly provided.

See e.g. https://mne.tools/dev/generated/mne.Epochs.html#mne.Epochs.get_data

I always assumed that bad channels are not included by default (None), as I didn't provide their names or indices. But that doesn't seem to be the case.

The doctoring should be updated to explicitly state that by default, bads will be included.

This also suggests that we should take a look at our tutorials that make use of .get_data() and update them to pass picks="data" or similar.

We may also need to take a look at the decoding steps of MNE-BIDS-Pipeline, which use .get_data(), too.

Edit: In most (all?) tutorials we run inst.pick(..., exclude="bads"), which avoids the issue mentioned above.

drammock commented 10 months ago

thank you for opening this. I think the docstring is confusing but technically not wrong:

None (default) will pick all channels

does imply that bad channels will be included. but the following sentence is the one that I think creates a misleading impression:

Note that channels in info['bads'] will be included if their names or indices are explicitly provided.

This is meant to say "if you're passing an explicit list of indices or strings to picks, then note that we don't automatically exclude channels in .info['bads'] in that case" or in other words: "warning, you might accidentally overrule .info['bads'] if you're not careful about which channel names/indices you pass in to picks"

That is definitely a mouthful and not easy to get from the docstring as it is currently written. So yes, we should try to make it less confusing / ambiguous.

But it also brings up for me a larger issue: as you know the default value of picks=None gets interpreted differently in different parts of the codebase --- sometimes None means "good data channels", sometimes "good and bad data channels", sometimes "good channels", and sometimes "good and bad channels" (AKA all channels). Also I think there are flavors like "good data channels except not ref-meg".

I think it would be a big improvement to both the user and developer experiences if the default in all of these cases was a string representing the flavor: i.e., in funcs/methods where we want the default to be "good data" channels, the default should be something like picks="good data" instead of picks=None. I.e., if you had seen that the default were picks="all data", and known from the docstring (or linked glossary entry?) how to interpret that, i.e., that "all" means "good and bad", then I think this also would have reduced the likelihood of confusion/mistake.

larsoner commented 10 months ago

I think it would be a big improvement to both the user and developer experiences if the default in all of these cases was a string representing the flavor

+1 on getting rid of picks=None in favor of something like picks="data". But another option that I expect will be easier to implement and work with existing parts of our code and use cases would be to use the pair of picks and exclude to achive "all data" for example as:

picks="data", exclude=()

and "good data" as:

picks="data", exclude="bads"

In other words, we can split up the bad inclusion/exclusion using the exclude kwarg instead of packing it into picks as "good data". exclude=None or exclude="bads" is already used a bunch of places in the codebase:

$ git grep "exclude=\"bads\"" mne | wc -l
170
$ git grep "exclude=None" mne | wc -l
23

IIRC this is what we did recently with the Spectrum-including-bads PRs. Getting these uses to work nicely / as expected with picks="all data" when we already have exclude later in the signature I expect to be a lot messier than 1) adding exclude=... where we currently don't have it and 2) avoiding picks=None everywhere (replaced by explicit picks=<str>, exclude=None | "bads use).

larsoner commented 10 months ago

... the other advantage of spltting in this way is things like picks="eeg", exclude=... can work and we don't have to implement stuff like "good eeg" and "all eeg" etc.

drammock commented 10 months ago

+1 for @larsoner's proposal, on the assumption that we actually have an exclude param in all the relevant places (I think we do but haven't checked). And even if not, it feels cleaner to just add exclude where it's missing (in addition to changing the pick default to "all" or "data") rather than adding handling for a bunch of new strings like "good data" to picks

cbrnr commented 10 months ago

I'm also +1 for tackling this with picks and exclude. Regarding the initial comment, I think we actually used to exclude all bads when picks=None. When did this change?

larsoner commented 10 months ago

I think we actually used to exclude all bads when picks=None. When did this change?

At least from a few years back when we added picks=<str> support I think it could mean "all channels", "all good channels", "all data channels", or "all good data channels" and which one it meant varied from function to function.

hoechenberger commented 10 months ago

I like @larsoner's proposal

@drammock .get_data() unfortunately doesn't have an exclude parameter

larsoner commented 10 months ago

on the assumption that we actually have an exclude param in all the relevant places (I think we do but haven't checked). And even if not, it feels cleaner to just add exclude where it's missing

.get_data() unfortunately doesn't have an exclude parameter

Indeed there are probably a number of places we'll want to add it. It should be easy enough to add them as we eradicate picks=None -- we're going to be modifying those signatures anyway so it's easy to check for exclude at the same time.