pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DOC/VIS: plotting on current figure/axis with Series/DataFrame.plot() #8776

Open jorisvandenbossche opened 9 years ago

jorisvandenbossche commented 9 years ago

From SO, I noticed that Series.plot() plots on the current axis if no ax is specified, whereas DataFrame.plot() does not (it creates a new figure when no ax is specified).
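The difference described here can be checked with a short script. This is only a minimal sketch: the data is made up, the Agg backend is an assumption chosen so that no window is needed, and the outcome may depend on the pandas and matplotlib versions installed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no GUI window required
import matplotlib.pyplot as plt
import pandas as pd

plt.close("all")

# Draw something with pyplot so that a "current" axes exists.
plt.plot([1, 2, 3])
current_ax = plt.gca()

# Series.plot() without ax: reported in this thread to reuse the current axes.
series_ax = pd.Series([4, 5, 6]).plot()

# DataFrame.plot() without ax: reported in this thread to open a new figure.
frame_ax = pd.DataFrame({"a": [4, 5, 6]}).plot()

print("Series reused the current axes:", series_ax is current_ax)
print("DataFrame used a new figure:", frame_ax.figure is not current_ax.figure)
```

If both lines print True, the installed versions still show the asymmetry discussed in this issue.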

Some things about this:

So, if this is deliberate, we should:

or if not:

@TomAugspurger @sinhrks

TomAugspurger commented 9 years ago

This is something that's always bothered me a bit.

I don't think there's a good reason for Series and DataFrame to behave differently. Matplotlib seems to always plot on the currently active axis (there may be exceptions), so we should probably follow that.

tacaswell commented 9 years ago

Just to be clear, pyplot has a notion of 'current' axes, not mpl as a whole.
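To make the distinction concrete, here is a small sketch of pyplot's "current axes" state next to the object-oriented interface. The variable names are illustrative only, and the Agg backend is an assumption made so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

plt.close("all")

# pyplot interface: plt.plot() implicitly targets whatever plt.gca() returns.
plt.plot([1, 2, 3])
implicit_ax = plt.gca()

# Object-oriented interface: the target axes is always named explicitly.
fig, explicit_ax = plt.subplots()
explicit_ax.plot([3, 2, 1])

# Creating the new axes also made it pyplot's current axes ...
after_subplots = plt.gca() is explicit_ax
# ... and plt.sca() switches the current axes back.
plt.sca(implicit_ax)
after_sca = plt.gca() is implicit_ax

print(after_subplots, after_sca)
```

The Axes objects themselves carry no notion of "current"; only the pyplot layer tracks it.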

onesandzeroes commented 9 years ago

Personally I think both should follow DataFrame behaviour of creating a new figure unless ax is explicitly passed. Like tacaswell says, pyplot works based on the currently active axis but once you start dealing with multiple axes, it's much nicer to use the object-oriented approach to creating figures and axes, and pass them explicitly to plotting functions.

As it is, Series.plot() modifies your existing axes without giving you much warning, e.g.:

import pandas as pd
import matplotlib.pyplot as plt

s = pd.Series([4, 5, 6, 7, 8, 9])

# x-axis runs from 0-2
plt.plot([1, 2, 3])
# not anymore!
s.plot()
plt.show()

[Image: figure_1]
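For comparison, the object-oriented approach mentioned above might look like the following sketch. The two-panel layout and the variable names are assumptions made for illustration; the point is only that passing ax explicitly leaves the other axes untouched.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([4, 5, 6, 7, 8, 9])

# Create the figure and both axes explicitly instead of relying on pyplot state.
fig, (ax_left, ax_right) = plt.subplots(1, 2)

# The plain matplotlib line keeps its own axes ...
ax_left.plot([1, 2, 3])
# ... and the Series is drawn only where it is told to go.
returned_ax = s.plot(ax=ax_right)

print(ax_left.get_xlim())
print(ax_right.get_xlim())
```

Here the x-limits of ax_left are not affected by the Series, because the Series was never drawn on it.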

tacaswell commented 9 years ago

The advantage of targeting the 'current axes', rather than making a new figure, is that it lets you easily plot multiple lines to the same axes (which is something I, at least, frequently do) without forcing users to start using the OO API.

sinhrks commented 9 years ago

Agreed to let Series behave like a DataFrame. When we want to plot on the same axes, we can do:

# Assuming "serieses" contains a list of Series
ax = None
for s in serieses:
    # reuse the axes returned by the previous call
    ax = s.plot(ax=ax)
rcarneva commented 9 years ago

Strongly disagree with the suggestions to make Series behave like the current DataFrame implementation and would much rather have it the other way around. As a frequent user of pandas/mpl in interactive work, I seldom want to have to deal with the OO interface, and not having a default behavior of using plt.gcf()/gca() in what's supposed to be a convenience method makes it much less useful.

Also, for what it's worth, the current behavior is a change from the old behavior in pandas 0.13.x: http://nbviewer.ipython.org/gist/rcarneva/c302ac1ea27304a12957

TomAugspurger commented 9 years ago

From my perspective, the Series behavior of plotting on gca is super useful for interactive work, and I really don't want to get rid of that. I see Series/DataFrame.plot as being mainly for quick and dirty plots, which are usually done interactively. When I need to refine something I turn to seaborn / matplotlib and use the OO interface.

So I guess what I'm saying is

tacaswell commented 9 years ago

I'm speaking here with my not-mpl-dev hat on; this is not the settled plan of mpl.

One of the things I have been thinking about (but have not fully fleshed out yet) is having mpl provide a decorator/registration function as part of pyplot that takes care of ensuring that the axes object exists (and maybe registering the wrapped function into the same namespace). The idea would be that you have something like

def pandas_plotting_func(ax, df_or_ds, bunch_of_style):
    ax.foo

in pandas and then to provide interactive plotting you would do something like

plt.register_func(pdplot.pandas_plotting_func)
plt.pandas_plotting_func(df, ...)
plt.pandas_plotting_func(df, ..., ax=ax1)

and these calls would all work as expected.

The register function would look something like

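from functools import wraps

import matplotlib.axes
import matplotlib.pyplot as plt

def register_func(func):
    @wraps(func)
    def inner(*args, **kwargs):
        # an explicit ax keyword wins (ax=None means "not given")
        ax = kwargs.pop('ax', None)
        if ax is None and len(args) > 0 and isinstance(args[0], matplotlib.axes.Axes):
            # otherwise accept an Axes passed as the first positional argument
            ax = args[0]
            args = args[1:]
        if ax is None:
            # fall back to pyplot's current axes
            ax = plt.gca()
        ret = func(ax, *args, **kwargs)
        ax.figure.canvas.draw()  # possible perf hit
        return ret
    # magic to insert function into pyplot namespace
    return inner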

That way you can get the best of both worlds, have quick and dirty interactive plotting and be able to re-use those same functions embedded in larger applications.
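As a rough illustration of how such a wrapper could be used, here is a self-contained sketch. It is not an existing matplotlib or pandas API: register_func is a simplified copy of the idea above (without the canvas redraw and without the pyplot registration step), and plot_values is a hypothetical plotting function invented for this example.

```python
from functools import wraps

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.axes
import matplotlib.pyplot as plt


def register_func(func):
    """Wrap func(ax, ...) so that passing ax becomes optional."""
    @wraps(func)
    def inner(*args, **kwargs):
        ax = kwargs.pop("ax", None)
        if ax is None and args and isinstance(args[0], matplotlib.axes.Axes):
            # Axes given as the first positional argument
            ax, args = args[0], args[1:]
        if ax is None:
            # nothing given: fall back to pyplot's current axes
            ax = plt.gca()
        return func(ax, *args, **kwargs)
    return inner


@register_func
def plot_values(ax, values, **style):
    # hypothetical plotting function written against the OO interface
    return ax.plot(values, **style)


plt.close("all")

plot_values([1, 2, 3])           # no ax: lands on pyplot's current axes
implicit_ax = plt.gca()

fig, ax1 = plt.subplots()
plot_values([3, 2, 1], ax=ax1)   # ax passed by keyword
plot_values(ax1, [2, 2, 2])      # ax passed positionally

print(len(implicit_ax.lines), len(ax1.lines))
```

The same function body serves both the interactive call and the explicit-axes calls, which is the "best of both worlds" point made above.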

I would also like to rip most of the core plotting off of the Axes objects and push it to a pattern like this (and expose it through pyplot using this decorator/register function).

cc @danielballan

jorisvandenbossche commented 9 years ago

I agree with @TomAugspurger here that the Series behaviour is certainly useful for interactive plotting, so I wouldn't change that either. Let's indeed start by better documenting the current situation.

bashtage commented 9 years ago

Personally I think both should follow DataFrame behaviour of creating a new figure unless ax is explicitly passed. Like tacaswell says, pyplot works based on the currently active axis but once you start dealing with multiple axes, it's much nicer to use the object-oriented approach to creating figures and axes, and pass them explicitly to plotting functions.

I completely agree with this. I find it so annoying when I do something like

series.plot()

and nothing appears from an interactive IPython session (except __repr__()). And then I do a

series.to_frame().plot()

to make a window appear.

Similarly, when doing exploratory work, I often write things like

df.some_series.plot()

and then nothing. Which leaves me with the options of going back to to_frame() or using

df[['some_series']].plot()
tacaswell commented 9 years ago

Your plot is being dumped into your currently open figure. You may need a call to plt.draw() (pandas may not be doing this automatically under the hood). Another option is to make a call to plt.figure() before plotting a series.
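The two suggestions can be sketched as follows. The data is made up, the Agg backend is an assumption so the snippet runs without a display, and in an interactive session the effect of plt.draw() depends on the backend in use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

plt.close("all")

# An earlier plot is still open and would otherwise receive the Series.
plt.plot([1, 2, 3])
old_fig = plt.gcf()

# Option: open a fresh figure first, so it becomes the current one.
new_fig = plt.figure()
ax = pd.Series([4, 5, 6]).plot()

# Ask pyplot to redraw the current figure.
plt.draw()

print("Series landed in the new figure:", ax.figure is new_fig)
```

The earlier figure keeps only its original line, since the Series was drawn after the new figure became current.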

bashtage commented 9 years ago

You may need a call to plt.draw() (pandas may not be doing this automatically under the hood). Another option is to make a call to plt.figure() before plotting a series.

The problem with this process is that pyplot hasn't been imported, and it isn't clear why someone should need to access pyplot directly to do exploratory, often throw-away work.

tacaswell commented 9 years ago

Pandas is importing pyplot under the hood no matter what.

The pyplot interface is for exploratory work (as opposed to the OO interface).

bashtage commented 9 years ago

Pandas is importing pyplot under the hood no matter what. The pyplot interface is for exploratory work (as opposed to the OO interface).

I understand. My only gripe here is that it should not be necessary to explicitly import pyplot just to use a pandas plot() method, on either a DataFrame or a Series. Right now, this is only true for a DataFrame.

michaelaye commented 9 years ago

I too am in the camp that finds it annoying that a Series does not open a plot window on its own. Pandas is extraordinarily helpful for exploratory data analysis, and I just don't see the logic behind a DataFrame opening a new figure just fine while a Series, which from the data analyst's point of view is just another column, does NOT. This throws me off all the time and does not make any sense to me, so I would like to see it discussed more from both the MPL and Pandas points of view, to help me make sense of it. Edit: I meant the above for the nbagg/notebook backend. With inline this works for whatever reason, but I guess this is related.

spencerogden commented 7 years ago

I agree that the inconsistent behavior for .plot() is a gotcha. Personally, I like the DataFrame behavior. Plotting on the same axes is straightforward and intuitive:

ax = pandas.DataFrame(...).plot()
pandas.DataFrame(...).plot(ax=ax)

As opposed to Series, which, with the current behavior, requires the non-intuitive to_frame() to get the DataFrame-like behavior. While this works and is not a lot of extra typing, I don't think this is a trick users discover on their own. And it certainly isn't clear from the docs how to get this behavior.

pandas.Series(...).plot()
pandas.Series(...).to_frame().plot() # Surprising

At a minimum, the documentation should be amended:

DataFrame.plot() ax : matplotlib axes object, default None, resulting in the plot appearing on a new figure.

Series.plot() ax : matplotlib axes object, default gca(), resulting in the plot appearing on the last drawn figure.

tacaswell commented 7 years ago

Responding (very late) to @michaelaye: it 'works' with inline because that backend does not keep live figures around, so the next call to gca has to make a new figure. No other backend works this way; the behavior of 'nbagg' is the normal case.
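The effect can be imitated outside a notebook. In this sketch, plt.close("all") stands in for the inline backend discarding its figures after a cell has been displayed; that stand-in, the Agg backend, and the variable names are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

plt.close("all")

# "Cell 1": draw something; a figure and a current axes now exist.
plt.plot([1, 2, 3])
first_ax = plt.gca()

# Stand-in for inline closing its figures once the cell output is shown.
plt.close("all")

# "Cell 2": with no live figure left, gca() has to create a new one.
second_ax = plt.gca()

print("fresh axes:", second_ax is not first_ax)
print("open figures:", len(plt.get_fignums()))
```

With a backend that keeps figures alive, the second gca() call would instead return the axes from the first cell, which is why Series.plot() appears to draw "nowhere new" there.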