mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.7k stars 1.31k forks source link

Plots appear twice in notebooks #11165

Closed cbrnr closed 1 year ago

cbrnr commented 2 years ago

Every call to a plotting function produces two plots in a notebook, e.g.:

Screen Shot 2022-09-15 at 15 16 29

This is because our functions return the figure, which then automatically gets plotted in a notebook. In addition, we also see the plot which is explicitly produced by the function.

I don't think _ = XXXX.plot() or XXXX.plot(); are good solutions (pretty ugly). The only way this works nicely is XXXX.plot(show=False).

Could we somehow fix this weird behavior? By weird I mean (1) there are either two figures (show=True) or (2) there is one figure but show=False sounds like there should be none.

What about setting show=False by default only in a notebook environment?

drammock commented 2 years ago

to me this feels like something that should be fixed upstream. Maybe IPython should suppress displaying the output of cells if that output is a matplotlib Figure instance. Or maybe there could be an IPython extension that would do this. But to me this is not something we should fix within MNE-Python.

cbrnr commented 2 years ago

I think we are at least partly to blame. If you do

import matplotlib.pyplot as plt
plt.plot(1)

in a notebook, you only see the figure once, because the plotting function does not return the figure, but some sort of line collection. This is also what they recommend (plot does not return a Figure), so I think it's also the returning a Figure part that's causing this issue specifically.

Screen Shot 2022-09-15 at 15 44 19
larsoner commented 2 years ago

so I think it's also the returning a Figure part that's causing this issue specifically.

Yes, and this also seems like a totally reasonable pattern, and has been used a lot historically:

plt.figure()
plt.plot(...)
plt.show()

if this produces two figure displays, it's not totally clear to me that this is desired behavior or not. We should probably at least ask the MPL folks about it. If they say "yes in that case you should get two plots and we won't/can't change this" then changing all our show=True to show='auto' to mean "False in Notebooks, True otherwise" could make sense but

  1. We'd need to somehow detect which we're in. Maybe you can get it from the fig.canvas easily... but we can figure that out later if we need to.
  2. If someone does fig = mne.viz..., I think nothing will display because no fig gets _repr_html_ed and also no show actually gets called because our show='auto' would detect the fig to be Notebook-type :(

(If doing both plt.figure() and plt.show() you still only get one figure, then we should figure out why our figures don't work like a plain plt figure, but I'm assuming this is not the case.)

cbrnr commented 2 years ago

(If doing both plt.figure() and plt.show() you still only get one figure, then we should figure out why our figures don't work like a plain plt figure, but I'm assuming this is not the case.)

Just one figure, yes. Like I mentioned, the difference is that MPL functions do not return a Figure.

cbrnr commented 2 years ago

Actually, MNE is the only package I know that behaves like that and produces two figures. All others produce just one figure because they do not return a Figure (e.g. MPL, pandas, seaborn, ...).

larsoner commented 2 years ago

All others produce just one figure because they do not return a Figure (e.g. MPL, Seaborn, ...).

Some mpl functions do, including plt.figure, which is why I included it. But I shouldn't have included the show as it does prevent from showing the problem, which is visible without it:

Screen Shot 2022-09-15 at 7 06 44 AM
cbrnr commented 2 years ago

Right, but this seems to make sense, because plt.figure() should return a Figure.

cbrnr commented 2 years ago

So in my opinion the problem is that plotting functions should not return a Figure, but some axis subtype or line collection or similar.

cbrnr commented 2 years ago

Also, you would not just do plt.figure(), but rather assign it a name, e.g. fig = plt.figure(). But you do use plt.plot() without any assignment.

drammock commented 2 years ago

So in my opinion the problem is that plotting functions should not return a Figure, but some axis subtype or line collection or similar.

what our plotting functions should return depends a lot on what the user wants. Here, you want a clean notebook experience where you don't have to write semicolons or assign to underscores. Other users want to be able to easily customize the figures that we generate, e.g., by adding insets, changing margins or titles, etc. Supposing we returned an axes object, they would have to type an additional .get_figure() every time (compared to your pain point of having to type ; to suppress output). Which is worse?

Another consideration is that if we return the figure, we can have better consistency across our plotting functions (no matter how many subplots we generate, we always return one thing). (there are still a couple of exceptions to this, I think, where we return a list of figures, but I'm working on eliminating them.) That predictability is IMO useful to our users, it means fewer times where they have to look up what our plotting function returns. (There are trade-offs, of course; if the user wants the axes handle they'll have to type more.)

A third consideration is the vast amount of user code that would break if our plotting functions stopped returning figures.

A final point I'd like to raise is that, with matplotlib, the preferred interface is the object-oriented interface, which means you create the figure (and axes) first before plotting into them. That pattern ensures that you already have a handle to the figure. For MNE-Python, expecting users to do that for something like ica.plot-properties or evoked.plot_joint is IMO too much to expect, which to me suggests that we should be returning a figure handle to them.

cbrnr commented 2 years ago

what our plotting functions should return depends a lot on what the user wants. Here, you want a clean notebook experience where you don't have to write semicolons or assign to underscores. Other users want to be able to easily customize the figures that we generate, e.g., by adding insets, changing margins or titles, etc. Supposing we returned an axes object, they would have to type an additional .get_figure() every time (compared to your pain point of having to type ; to suppress output). Which is worse?

Sure there are different use cases. I would argue that 95% of users produce one-off plots and do not want to manually edit the figure afterwards. But of course I don't know, I have no statistics on that. However, if I'm correct, then having to do .get_figure() for cases where you do want to edit doesn't seem so bad.

Re consistency, if all our plotting functions return e.g. axis, they are also consistent. In addition, we would be even more consistent to other packages that also produce plots. None of these return figures.

Re breaking code: yep, I agree.

Re object-oriented interface, I agree too, but e.g. df.plot() in pandas also produces a plot (figure) without returning and without having to use the OO interface. So this is not really an issue IMO.

At the end, I think our code base is already so large that we probably don't want to change the return type, but maybe we could think about show="auto" that behaves differently in a notebook vs. any normal REPL environment?

drammock commented 2 years ago

if all our plotting functions return e.g. axis, they are also consistent.

we cannot achieve that though. Many of our plotting functions generate complex figures with many subplots. Which axes do we return? You might say "OK then a list", but do we guarantee the order of items in that list? (and if we don't, then IMO it's pretty hard to use the return value). OK then maybe a dict of axes? One of our topomap plotting functions generates 25 axes by default; also returning dict-of-axes requires that we come up with a key-naming scheme, and enforcing that our devs follow it.

IMO comparisons to plt.plot() or DataFrame.plot() are inappropriate; those generate one figure, one axes and it's pretty straightforward to return the axes object. Our plots are not like that.

At the end, I think our code base is already so large that we probably don't want to change the return type, but maybe we could think about show="auto" that behaves differently in a notebook vs. any normal REPL environment?

This has the problem @larsoner mentioned above: it easily yields situations where no plot is displayed in the notebook:

If someone does fig = mne.viz..., I think nothing will display because no fig gets _repr_html_ed and also no show actually gets called because our show='auto' would detect the fig to be Notebook-type :(

Another option we could maybe consider: we could always return a subclass (MNEFigure or such) that inherits from matplotlib Figure but overrides its _repr_html_. In fact MNEFigure already exists in order to sometimes suppress the MPL toolbar and alter some of its default keyboard shortcuts; extending it to all of our MPL figures is on my low-priority long-range TODO list anyway, and I'd be fine if it changed the _repr_html_ too.

cbrnr commented 2 years ago

I like the idea of returning an MNEFigure and change its _repr_html_ accordingly. But how would this solve the issue when someone assigns fig = mne.viz...?

cbrnr commented 2 years ago

Re comparisons with plt.plot(), seaborn has functions that plot more complex figures (with multiple axes) that still do not return a figure (e.g. https://seaborn.pydata.org/examples/many_facets.html), but do it through some container type. It's probably overkill for us, but they don't have the problem of figures showing up twice in notebooks...

drammock commented 2 years ago

seaborn has functions that plot more complex figures (with multiple axes) that still do not return a figure, but do it through some container types.

Good point. Still, I think it's not quite the same; seaborn is typically plotting small-multiples, so a container makes sense / can work for a wide variety of their plots. For us, not only is a container class "overkill" (as you say) but also probably inappropriate (our plots are too variable in what they include to make a container a very useful abstraction, IMO --- we might as well just return a dict at that point).

I like the idea of returning an MNEFigure and change its _repr_html_ accordingly. But how would this solve the issue when someone assigns fig = mne.viz...?

if show=True (which is almost always the default) then the notebook ought to capture the output and display it. (that is not the case with the suggested show='auto' approach)

agramfort commented 2 years ago

I am with @drammock on this one. I've lived fine some _ = and ; and I have never heard students complain about this during trainings. Having to deal with all the deprecations for me largely limits my enthusiasm to consider any change... Given the time and ressources we have to maintain and improve MNE I would allocate of time on this.

my 2c

cbrnr commented 2 years ago

@agramfort yes, I agree that it is too big of a change RN (and I did hear some people wonder about _ = and ; so I'd avoid these whenever possible).

But we could still try to improve things by implementing what @drammock suggested - would you be OK with that? Only I'm not yet convinced how it will work... Are you saying that in a notebook, we could write e.g. raw.plot() and only one plot would appear? Same for fig = raw.plot()? If this works, then the problem is solved, isn't it?

drammock commented 2 years ago

@cbrnr you can test if it works by adding a repr_html here:

https://github.com/mne-tools/mne-python/blob/main/mne/viz/_mpl_figure.py#L83

and trying raw.plot() in a notebook after that.

agramfort commented 2 years ago

I tend to repeat all the time "if it's not broken don't fix it". Here to me it's not broken but it may be a subjective opinion. If I have some time to review mne things I would rather focus on something more pressing but I won't prevent anyone to spend their time on this.

Message ID: @.***>

cbrnr commented 2 years ago

@agramfort it is broken to me, I don't want to explain to students why there are two identical outputs. This is very confusing and a bad UX. If it's easy to fix, we should fix it. If it's hard to fix, I won't spend more time on that either. OK?

@drammock and what would _repl_html_() do and return in this case? I'll play around with it a bit.

drammock commented 2 years ago

what would _repl_html_() do and return in this case?

for testing you could just make it pass. But probably it should return a regular __repr__ output (text like <MNEFigure...>)

hoechenberger commented 2 years ago

@agramfort it is broken to me, I don't want to explain to students why there are two identical outputs. This is very confusing and a bad UX. If it's easy to fix, we should fix it. If it's hard to fix, I won't spend more time on that either. OK?

I agree, IMHO it's broken and it's confused me and my students/colleagues. Adding the semicolon is weird, I never have to do this anywhere but in MNE. It's not a big thing but it's certainly unexpected.

  • We'd need to somehow detect which we're in. Maybe you can get it from the fig.canvas easily... but we can figure that out later if we need to.

I believe @GuillaumeFavelier added some Notebook detection to some of the 3D plotting code

cbrnr commented 2 years ago

I agree, IMHO it's broken and it's confused me and my students/colleagues. Adding the semicolon is weird, I never have to do this anywhere but in MNE. It's not a big thing but it's certainly unexpected.

It's not big, but it gets even more annoying if you are writing tutorials/course material with Jupyter notebook, which I think a lot of people are doing (including me via Quarto). It can be worked around, but it's cumbersome. I think it is the way we are plotting figures that is non-standard and like I said, no other package I know is doing it like this.

@drammock adding a _repr_html_() method doesn't work for me (I tried in VS Code). I still get two figures with show=True and one with show=False.

hoechenberger commented 2 years ago

but it gets even more annoying if you are writing tutorials/course material with Jupyter notebook, which I think a lot of people are doing

+1

drammock commented 2 years ago

It can be worked around, but it's cumbersome

Sorry, but it is literally one single keystroke. I think "cumbersome" is a bit hyperbolic here.

I've spent quite some time today looking into this; the issue is deeper than I suspected (as you've already discovered, overwriting _repr_html_ is not sufficient). It is informative to define _repr_html_ as something like print("foo") and to play around with a notebook using %matplotlib ipympl. You'll see that you get one output that is interactive, and another one that is a static PNG. AFAICT, something (the backend? a FigureManager? the notebook itself? I can't tell exactly what) is keeping track of figure objects generated in each cell and outputting them as base-64 encoded PNG data regardless of what their _repr_html_ definition is.

If you don't overwrite _repr_html_ then the backend is sending figures through a tornado template which calls javascript. I don't think that's where the extra, unwanted output comes from. Maybe it's the backend's show() function which (IIUC) sends base64-encoded PNGs through a websocket to the notebook (this is consistent with what I see in the browser console: an img tag with base-64 encoded PNG data as its src).

You'll have to take it from here, @cbrnr. I've spent too long on this already.

cbrnr commented 2 years ago

Thanks @drammock, that's already a lot of information to chew on. But the workaround is not just a single keystroke. I'm talking about using a notebook to write course material, assuming people will use the normal REPL. When I append a ;, then yes, it's a single keystroke, but I have to explain why this is necessary here, why it is not Pythonic, why it is not needed in the REPL etc. Then my workaround gets more than a keystroke: I have to write two cells, one without the ; (the code that people should copy) that isn't evaluated, just printed, and then another cell with the ; that is evaluated (to avoid the two figures), but is not echoed.

Let's leave this open for now, I'll see if I can think of other solutions, but I still suspect that because no other package returns a figure, this would be the obvious solution.

hoechenberger commented 2 years ago

@cbrnr A "pythonic" workaround I'm using is that I'm assigning the figure to a variable; this avoids having to add a semicolon and even opens up various possibilities for post-hoc modifications / processing because now you have a proper MPL object that you can manipulate freely

cbrnr commented 2 years ago

If I'm calling a plot function I'm expecting that exactly one plot appears. Everything else is a workaround. If you really need to keep the figure for later, then yes, but the standard _ = already indicates the opposite.

hoechenberger commented 2 years ago

I do agree with you and I'm not trying to convince you that you're wrong. I'm just trying to give you a suggestion as to how you could communicate things to your students and turn this flaw into a feature. Instead of saying, "MNE is doing this weird thing, you'll now have to use a semicolon, ugh, it's ugly and unpythonic but that's just what you'll have to use, sorry!" turn it around and say, "MNE aims to give the user much flexibility and encourages you to save the produced figures to disk to keep track of your analysis results; therefore, always assign the returned figure to a variable for maximum flexibility, and this will also take care of the double-figure issue".

cbrnr commented 2 years ago

I think I found a nice solution. What if we didn't return a Figure, but a small custom object which contains the Figure as an attribute? This would solve the issue with the duplicate plot output in notebooks, and users would still be able to access the figure if they want.

Example (run in a notebook to verify the output):

import matplotlib.pyplot as plt

class MyFigure:
    """Wrapper for Figure."""
    def __init__(self, figure):
        self.figure = figure

def plot():
    """Some plotting function."""
    fig, ax = plt.subplots()
    ax.plot(1)
    return MyFigure(fig)

This now produces just one plot:

plot()

If we need the figure, we can still do:

f = plot()

And we have the figure available as f.figure.

Note that this doesn't work with the class we already have (MNEFigure), because it inherits from matplotlib.figure.Figure. Therefore, we could even repurpose this class and put everything into a figure attribute.

larsoner commented 2 years ago

It's a pretty bad backward compat break. I have tens or hundreds of scripts with fig = mne.viz.whatever(...) in them where I then modify fig or fig.axes etc., and I'm sure I'm not alone

cbrnr commented 2 years ago

I think it would be worth it. Of course with a proper (maybe even extended) deprecation cycle.

drammock commented 2 years ago

I think it would be worth it.

Scores of researchers needing to edit tens of scripts each? I'm not convinced.

Did you have any luck figuring out why it happens in the first place? Did you try asking the Jupyter or IPython devs? Until we've at least attempted to see if an upstream fix is possible / would be accepted, I'm -1 on changing our behavior.

hoechenberger commented 2 years ago

I'm not convinced either, it's too much breakage for too little gain. Let's see if we cannot write a Jupiter plugin instead that handles this behavior gracefully?

cbrnr commented 2 years ago

Scores of researchers needing to edit tens of scripts each? I'm not convinced.

If they always want to use the latest MNE version, they'd have at least three versions to upgrade. If they are pinning their package versions, which they should if they are trying to be reproducible, their scripts will always work.

But I get it, I'm also almost always using the latest versions of everything, and it would be nice to not break stuff unless it is important. I think that's where we disagree.

Did you have any luck figuring out why it happens in the first place? Did you try asking the Jupyter or IPython devs? Until we've at least attempted to see if an upstream fix is possible / would be accepted, I'm -1 on changing our behavior.

I have not asked the Jupyter or Matplotlib devs yet, because I think the issue is on our end:

I will ask the Matplotlib devs first to see what they have to say about this topic. Maybe there's another simpler solution.

I'm not convinced either, it's too much breakage for too little gain. Let's see if we cannot write a Jupiter plugin instead that handles this behavior gracefully?

Let's not try to fix something that's probably wrong in the first place like this. I'd rather fix the underlying problem, then we don't need a plugin. However, we'll see what the Matplotlib devs say, maybe our approach is fine and this should be fixed either in Matplotlib or Jupyter.

drammock commented 2 years ago

You're omitting an important finding of the work I put in already trying to troubleshoot this:

It is informative to define _repr_html_ as something like print("foo") and to play around with a notebook using %matplotlib ipympl. You'll see that you get one output that is interactive, and another one that is a static PNG.

To me that seems like a bug, and one that should be fixed upstream. Or at least that behavior should be made configurable.

cbrnr commented 2 years ago

I asked on the Matplotlib Discourse, let's see what happens:

https://discourse.matplotlib.org/t/custom-plot-function-in-jupyter-notebooks/23185

Also note that replacing _repr_html_() does work in this simple example, so maybe I made a mistake when I tried doing this for one of our plot functions.

mmagnuski commented 1 year ago

Just for reference, this problem was previously discussed here: https://github.com/mne-tools/mne-python/issues/8424

hoechenberger commented 1 year ago

Just for reference, this problem was previously discussed here: https://github.com/mne-tools/mne-python/issues/8424

Haha it's kinda funny to see how the perception of what is and what is not correct behavior has changed over the past two years :)

drammock commented 1 year ago

the perception of what is and what is not correct behavior has changed over the past two years

has it? 2 years ago I said this:

  1. having our plot functions return a figure instance is what MPL devs recommend
  2. it's an intentional design decision of the inline backend to both show every open figure created in a cell (in addition to showing the "return value" of the last line of the cell, which is built into ipython but can be turned off in settings)
  3. using an assignment or semicolon on the last line is the officially recommended workaround.

and I think that's all still borne out by the recent MPL discourse thread.

larsoner commented 1 year ago

Closing in favor of #11165 since that has proposals to fix/address the issue

larsoner commented 1 year ago

... I meant #11528