mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.72k stars 1.32k forks source link

RFC: plotting in notebooks with show=False #11528

Open drammock opened 1 year ago

drammock commented 1 year ago

Many times (most recently in #11513) users complain / get confused about plot behavior in Jupyter Notebooks, specifically the problem of figures getting "shown" too early and having empty subplot axes (typically when pre-generating a figure w/ subplots and looping over the subplot axes). Most of our plotting functions have a show=False option which will prevent the problem, but the parameter name is really confusing and I don't think it's very well documented how it should be used/interpreted in a notebook context.

One possible option would be to change the parameter to something like delay_plotting=True so that its name better reflects what is common to notebooks and interactive console sessions --- namely that the figure won't be shown immediately when it's created. Obviously, removing show will break a lot of user code, so I'd be inclined to retain show as a @legacy alias for the new param (and warn or error if both params are provided?)

I invite other suggestions of how to address this problem.

cross-ref to #11165 where a lot of related discussion can be found

cc @sam-s

cbrnr commented 1 year ago

Does this also include plots appearing twice (if show=True) or once (if show=False) due to Jupyter (or any interactive Python interpreter) automatically outputting any object's repr?

sam-s commented 1 year ago

Does this also include plots appearing twice (if show=True) or once (if show=False) due to Jupyter (or any interactive Python interpreter) automatically outputting any object's repr?

fig = plt.figure()
print(1)

still shows the figure fig even though the last form is not the figure.

larsoner commented 1 year ago

plt.show is the matplotlib-equivalent (and what we effectively call under the hood) so I actually still prefer our current name / -1 on changing it

drammock commented 1 year ago

Does this also include plots appearing twice (if show=True) or once (if show=False) due to Jupyter (or any interactive Python interpreter) automatically outputting any object's repr?

underlyingly these are related but distinct problems I think.

If you put this in a notebook cell you still get 2 figures:

import mne

sample_data_folder = mne.datasets.sample.data_path()
sample_data_raw_file = (sample_data_folder / 'MEG' / 'sample' /
                        'sample_audvis_raw.fif')
raw = mne.io.read_raw_fif(sample_data_raw_file, verbose=False, preload=False)
raw.crop(tmax=10).pick('eeg').load_data()
raw.plot_psd(show=False)  # <- doesn't work for the double-repr case! must use `;` or assign to var
drammock commented 1 year ago

plt.show is the matplotlib-equivalent (and what we effectively call under the hood) so I actually still prefer our current name / -1 on changing it

Keeping things as-is / closing as "not broken, won't fix" seems like a non-starter to me. It requires that users be familiar with details of how matplotlib works. From a dev perspective it makes sense --- show parameter means "should I call plt.show?" But if a user comes to MNE-Python without already knowing matplotlib, it's reasonable that they'd think "show=False should suppress the figure". Right now the documentation for the show parameter for plot_psd literally just says "Show the figure if True" so I think there's a pretty big explanatory gap to overcome.

Improving those docs would help, but it seems likely that no matter how clearly we document that "show=False means show 0 figs in interactive console sessions and 1 fig in notebook for-loops and 2 figs in notebooks if it's the last line in the cell and you don't assign to a var or use a semicolon", there will always be some users asking questions because they didn't understand it (or didn't read it at all). One way to forestall that is to make the param name a bit more reflective of what it's doing --- people probably have fewer preconceptions about what delay_plotting should do compared to just show (hopefully they'll think: "Delay until when? I better read this param description to find out..."). Probably there's an even better name that I haven't thought of, or some other approach besides parameter migration that would work better? I'm all ears, but I think we have to do something.

larsoner commented 1 year ago

Keeping things as-is / closing as "not broken, won't fix" seems like a non-starter to me. It requires that users be familiar with details of how matplotlib works...

I think that's true of our plotting functions, though, and it's a reasonable requirement / part of us using matplotlib that users will need to understand (or learn) a bit about it and how it behaves. As far as the show kwarg is concerned, we are basically using matplotlib how it's designed AFAICT, so to me this is a matplotlib problem and should be fixed there. My preferred option is thus to document better that show=True means "call show at the end", at least to start.

To illustrate this part of the issue, in #11513 I just used the code fig = plt.subplots(), and this produced the behavior that the user did not expect (i.e., that the inline plot is shown). So I don't think we should fight this battle at the MNE end... maybe mpl needs a context manager to delay such showing or something, not sure.

Improving those docs would help, but it seems likely that no matter how clearly we document that "show=False means show 0 figs in interactive console sessions and 1 fig in notebook for-loops and 2 figs in notebooks if it's the last line in the cell and you don't assign to a var or use a semicolon", there will always be some users asking questions because they didn't understand it (or didn't read it at all).

From a practical standpoint, if we document these behaviors nicely in some tutorial or doc then we can quickly point them there. It seems like no matter what we have to to document the various potentially odd-seeming behaviors of matplotlib, and to me avoiding them should happen with some matplotlib rather than MNE magic/context manager etc.

But if we absolutely need to change some behavior in MNE, I'm not sure we gain too much by changing the argument name. We could add show='delayed' (to magically fix the plt.subplots() always being shown?) or show='never' (to magically fix the _repr_html_ issue?) or whatever if we needed to, rather than adding a new param. This does ultimately have to do with how/when the figure is displayed/shown, so the name isn't so bad.

drammock commented 1 year ago

From a practical standpoint, if we document these behaviors nicely in some tutorial or doc then we can quickly point them there. It seems like no matter what we have to to document the various potentially odd-seeming behaviors of matplotlib, and to me avoiding them should happen with some matplotlib rather than MNE magic/context manager etc.

But if we absolutely need to change some behavior in MNE, I'm not sure we gain too much by changing the argument name. We could add show='delayed' (to magically fix the plt.subplots() always being shown?) or show='never' (to magically fix the _repr_html_ issue?) or whatever if we needed to, rather than adding a new param. This does ultimately have to do with how/when the figure is displayed/shown, so the name isn't so bad.

This is starting to sound like a better proposal than what I came up with. I'll stew on it for a few days and let others chime in too.

cbrnr commented 1 year ago

I think we should not abstract away how other packages work. We should expect users to have at least a very basic understanding of Matplotlib, and the same is true for notebooks. If someone wants to work in a notebook, it is not our task to make everything work "magically". For example, values are printed automatically (as HTML), which we cannot change - that's just how it is. Instead, I think we should improve our onboarding docs and maybe include very short Matplotlib and notebook primers (and then refer to the corresponding docs).

Having said that, I like adding new behavior to an existing parameter - this seems to be a very nice solution which doesn't break any old code and hopefully makes things a bit clearer for future code.

mmagnuski commented 1 year ago

@drammock The way I remember the problem with looping over axes when show=True (the default of plot_topomap for example) is that in environment like Spyder with matplotlib inline mode the user will see a figure with all axes blank except the first one. This can be especially puzzling for new users because they would have to know about show argument to get their intended figure, and before they learn this (or switch to the interactive mode) the could spend considerable time trying to figure out what they are doing wrong. So, my general perspective is that the default of show=True causes more problems than benefits and I would prefer show=False was the default, at least for axis-level figures. If what @larsoner suggest is similar to this, then I'm +1 for the change!