mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.66k stars 1.31k forks source link

ENH: Make _pick_to_idx public #11913

Open larsoner opened 1 year ago

larsoner commented 1 year ago

@drammock other than mne.io.constants.FIFF (which is pretty painless to keep) this is the list of what we have to keep around to make sibling packages happy. Do you think we should make some variant of _picks_to_idx public?

_Originally posted by @larsoner in https://github.com/mne-tools/mne-python/pull/11903#discussion_r1301869325_

We'll have to think about the right API and we might not want to expose all options but it should be doable

mscheltienne commented 8 months ago

I would like to rework the channel selection/pick API with the following goal in mind:

  1. Reduce code duplication
  2. Reduce the number of functions/methods in the public API
  3. Harmonize the function names/arguments
  4. Propose a public API for _pick_to_idx
  5. Propose an API with input validation and a faster version skipping the validation

After going through mne._fiff.pick, I propose:

def pick_ch_names_to_idx(
    ch_names: list[str] | tuple[str] | set[str] | NDArray[str],
    picks: list[str | int], tuple[str, int], set[str | int], NDArray[str], NDArray[int], str, int,
    exclude: list[str | int], tuple[str, int], set[str | int], NDArray[str], NDArray[int], str, int,
) -> NDArray[np.int32]:
    pass
def pick_info_to_idx(  # or pick_to_idx?
    info: Info,
    picks: list[str | int], tuple[str, int], set[str | int], NDArray[str], NDArray[int], str, int,
    exclude: list[str | int], tuple[str, int], set[str | int], NDArray[str], NDArray[int], str, int,
) -> NDArray[np.int32]:
    pass

Both picks or exclude support the same types, with str being either:

I'm not sure if allow_empty: bool should be part of the public API. IMO, this argument is linked to a second one validation: bool which controls the presence or absence of validation of the input, to e.g. speed-up the selection if you know your inputs are valid and/or if you don't 'care'. IMO, 2 options:


Simultaneously, I would also deprecate for good the legacy pick_channels methods to tend towards a simpler and intuitive API. I believe those changes cover the majority of use-cases both in mne API and in external code. Those changes can be implemented piece by piece:


The removal of the existing public pick methods in mne._fiff.pick is backward incompatible. The impact can be mitigated by either (1) a long deprecation cycle or (2) keeping those methods as legacy for now (especially as the documentation of those methods is at the mne-level, e.g. mne.pick_info, mne.pick_types).


I believe I have the time to work on this change before 1.7 is out, but let's take the time to think about the API first ;)

larsoner commented 8 months ago

Without getting into too much detail, I would add to the priority list above:

  1. @deprecateding as few things as possible in favor of @legacying them

In theory any new functionality like cov.pick should just extend existing functionality, so hopefully that function becomes a one-line wrapper like return cov.copy().pick(...), in which case keeping pick_channels_cov in the namespace carries minimal maintenance burden. We could even probably remove it from the API page to help with your point (2) above. I like the idea of modernizing the interfaces but leaving the (decade-?)old functions around for backward compat when the cost is minimal like this.

Beyond that I would suggest breaking this up as much as possible, otherwise the diff and review will be unmanageable. Hardest starting point (but maybe most satisfying?) might be to try to simplify the internal code to be more DRY. Simplest might be with something like implementing a cov.pick. A sort of in between would probably be to add the two new public functions you propose that directly address the _pick_to_idx-to-public issue in the title. But maybe you have some other plan...

There are a lot of details to work out with the above proposal, so I suggest @drammock and others who are interested first comment as well on the general idea, then @mscheltienne you propose what a first PR could include, then once we converge on that you can open it. That way we don't have to discuss everything all at once but can make incremental progress with a shared vision.

For me I'm overall +1 on these ideas subject to my suggestion to prefer @legacy over @deprecated.

drammock commented 8 months ago

I'll take a deeper look at the proposal soon, but meanwhile I'm adding a crossref to https://github.com/mne-tools/mne-python/issues/11531 and especially https://github.com/mne-tools/mne-python/issues/11531#issuecomment-1464442314

mscheltienne commented 8 months ago

With the proposed approach, we can keep most, if nit all functions with the legacy decorator.

I agree we should break things apart. I propose to get a first PR with some minor changes and the structure of the new functions, i. e. function names, arguments and type-hints for the arguments and returns. We can iterate on the structure before coding the logic.

drammock commented 8 months ago

regarding pick_info

  • pick_info moved to mne.Info.pick: this one is debatable as we need to prevent picking on an Info object attached to another object, e.g. Raw.

another possibility that occurs to me here is that raw.pick() and raw.info.pick() could do the same thing (i.e. both could modify raw inplace). On its face that seems like a good API to me, but it opens up the possibility of

my_info = raw.info
# ...many lines later:
my_info.pick()  # modifies `raw` (possibly unwittingly)

So I think we should avoid that. And obviously we can't modify an attached info without modifying the instance it's attached to, so a public raw.info.pick() must return a copy, which is at odds with how .pick() works on all the instances; so I feel like it's an unintuitive API in that sense. To my mind that leaves only two possibilities: your option (1):

Keep pick_info as is and don't add a channel selection method to an Info object

or an option where info becomes aware of whether it's attached to an instance, and if so, refuses to execute its .pick() method. That feels a bit overengineered and possibly fragile, so I'm voting for your option(1).

regarding functions selecting on a list-like of channel names

You'll see in https://github.com/mne-tools/mne-python/issues/11531#issuecomment-1464442314 that I advocated for @legacying pick_channels and pick_channels_regexp, but that idea was not popular with @agramfort and @jasmainak at least. Now, you're at least proposing to keep the functionality in a new, more flexible function, so maybe the idea will be more popular, but I worry a little bit about a single function being "too magic" in deciding when to interpret its input as a regexp and when to treat it like a plain channel name. Perhaps we take inspiration from Pandas here? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.filter.html#pandas.DataFrame.filter has items, like, and regex params which are mutually exclusive.

regarding pick and exclude

Maybe I'm misunderstanding but you seem to suggest that it will be possible to pass exclude="all" or exclude="data"; to me this seems pointless to support. Are there use cases you're imagining/encountering here?

mscheltienne commented 8 months ago

For pick_info, I agree, let's keep the existing function as it would be counter-intuitive to have a method not modify the instance in-place.


For pick_channels and pick_channels_regexp, you wrote:

hey should also be marked @legacy or even deprecated/removed

I believe @jasmainak and @agramfort were not advocating against @legacy, but against deprecation. I would prefer to mark them as @legacy as well along the rest of the functions mentioned. IMO, @legacy functions should also be removed from the documentation build.

I worry a little bit about a single function being "too magic"

I usually agree and prefer to split functionalities, but here it doesn't seem that difficult. If a string is provided, is it a channel name? is it a channel type? is it a valid regex? We could even drop this last point by supporting re.Pattern directly for regex.


For pick and exclude, they should accept the same arguments.. beside those 2 non-sense cases and beside exclude=None since the default is () in other part of the API :sweat_smile: