pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DEPR: Clean up of pandas.plotting #28177

Open datapythonista opened 5 years ago

datapythonista commented 5 years ago

xref: #26747, #28159

The current plotting API feels inconsistent, and I think it's the one we have for historical reasons, not the one we want. I propose the following changes:

  1. Leave the current API based on the .plot accessor as is (e.g. Series.plot.hist, Series.plot(kind='box')). In the future we should consider two things:

    • Whether we want backends to be able to add plots that we don't define
    • Move all the matplotlib specific parameters to **kwargs
  2. Remove all duplicate functions:

    • hist_series (Series.hist -> Series.plot.hist)
    • hist_frame (DataFrame.hist -> DataFrame.plot.hist)
    • boxplot (pandas.plotting.boxplot -> DataFrame.plot.box)
    • boxplot_frame (DataFrame.boxplot -> DataFrame.plot.box)
  3. Move the matplotlib backend to a separate project (pandas.plotting._matplotlib -> pandas_matplotlib)

  4. Move the non-accessor plotting functions to the matplotlib backend:

    • andrews_curves (pandas.plotting.andrews_curves -> pandas_matplotlib.andrews_curves)
    • autocorrelation_plot
    • bootstrap_plot
    • lag_plot
    • parallel_coordinates
    • radviz
    • scatter_matrix
    • table
  5. Move register/unregister of the converters to the matplotlib backend (pandas.plotting.register -> pandas_matplotlib.register)
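To make the backend idea in items 1 and 3 concrete, here is a minimal sketch of how a separately packaged backend can plug into the .plot accessor. It assumes pandas 1.0 or later, where a backend is any importable module exposing a top-level plot function and can be selected per call with the backend= keyword. The module name toy_plot_backend and the dictionary it returns are hypothetical stand-ins, not part of pandas or of the proposed pandas_matplotlib project.

```python
import sys
import types

import pandas as pd

# Hypothetical stand-in for a separately packaged backend such as the
# proposed pandas_matplotlib; the module name and return value are made up.
toy_backend = types.ModuleType("toy_plot_backend")


def plot(data, x=None, y=None, kind="line", **kwargs):
    """Entry point that pandas calls for a .plot request routed to this backend."""
    # A real backend would draw something; this one only reports the request.
    return {"kind": kind, "n_points": len(data), "options": kwargs}


toy_backend.plot = plot
sys.modules["toy_plot_backend"] = toy_backend  # make it importable by name

s = pd.Series([1.0, 2.0, 3.0])

# Name of the default backend; reading the option does not import matplotlib.
default_backend = pd.get_option("plotting.backend")

# Route a single call to the toy backend without changing the global option.
result = s.plot(kind="hist", backend="toy_plot_backend", bins=5)
print(default_backend, result)
```

Because pandas hands kind and the remaining keywords straight to the backend's plot function, the backend decides what to do with them, which is why the question of backend-defined plot kinds and of matplotlib-specific parameters in **kwargs comes up.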

CC: @pandas-dev/pandas-core @jakevdp

charlesdong1991 commented 5 years ago

May I have a try at this if the maintainers reach some agreement?

datapythonista commented 5 years ago

I'll be creating new issues for the tasks that result from this issue, and you're certainly welcome to work on those.

Jeitan commented 4 years ago

I just ran across this and see that it is fairly recent, hooray! Might I make an observation of something to consider as this rework is done? Grouped histograms. There are currently 8 native ways to do this, counting the "redundant" .hist and .plot.hist and depending on what kind of object you call them from. Almost none of them behave the same, and some don't behave in any expected way (no grouping).

I am concerned because, if the plan is to drop Series.hist and DataFrame.hist (which is fine by me, actually; I don't like API redundancy), it is worth noting that grouping with the by= keyword to Series.plot.hist and DataFrame.plot.hist does not work. I've compiled all the behaviors in this spreadsheet (file should be available here: https://drive.google.com/file/d/1jVCxw_zshguW4Xfwi8NYWDptCQXnpWGt/view?usp=sharing ).

Is this the right place to bring it up or should I make a separate issue on grouping behavior?
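As a small illustration of why .hist and .plot.hist are not drop-in replacements for each other even before grouping enters the picture, the sketch below compares what the two calls return for a two-column frame. The data is made up for the example, and the Agg backend is selected only so that it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 2)), columns=["A", "B"])

# DataFrame.hist: one subplot per numeric column, returned as an array of Axes.
axes_per_column = df.hist()

# DataFrame.plot.hist: all columns overlaid on a single Axes.
ax_overlaid = df.plot.hist(alpha=0.5)

print("DataFrame.hist produced", axes_per_column.size, "axes")
print("DataFrame.plot.hist drew", len(ax_overlaid.patches), "bars on one axes")

plt.close("all")
```

Any deprecation of DataFrame.hist in favor of DataFrame.plot.hist would have to account for this difference in layout and return type, in addition to the by= behavior.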

TomAugspurger commented 4 years ago

Does https://github.com/pandas-dev/pandas/issues/11053 already contain everything?


Jeitan commented 4 years ago

@TomAugspurger Well, that looks like it's on the right track, for sure: it shows that .hist and .plot.hist do very different things. I don't see any mention of the by= keyword, though, which is what I'm concerned about. For example, the following:

import numpy as np
import pandas as pd

np.random.seed(159753)

df = pd.DataFrame(np.random.randn(30, 2), columns=['A', 'B'])
df['C'] = np.random.choice(['a', 'b', 'c'], 30)

# Draws one subplot of 'A' per value of 'C'
df.hist(column='A', by='C')
# Draws a single histogram of 'A' with no grouping
df.plot.hist(column='A', by='C')

yields these two plots: (1) [image: pdh_df_hist_by] (2) [image: pdh_df_plothist_by]

As you can see, the first one uses subplots, but the second one just plots the whole histogram of 'A' with no grouping whatsoever.
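Until by= is honored on the .plot.hist pathway, one possible workaround is to do the grouping explicitly and call Series.plot.hist once per group on pre-made subplots. This is only a sketch under that assumption, not how pandas implements grouping internally; it reuses the frame from the example above, and the figure size is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(159753)

df = pd.DataFrame(np.random.randn(30, 2), columns=["A", "B"])
df["C"] = np.random.choice(["a", "b", "c"], 30)

# Split the frame by 'C' first, then create one subplot per group.
groups = list(df.groupby("C"))
fig, axes = plt.subplots(1, len(groups), squeeze=False, figsize=(9, 3))

# Draw column 'A' of each group on its own Axes through the .plot accessor.
for ax, (label, group) in zip(axes.ravel(), groups):
    group["A"].plot.hist(ax=ax, title=str(label))

fig.tight_layout()
```

This gives a one-panel-per-group layout similar to df.hist(column='A', by='C'), at the cost of managing the figure by hand.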

Jeitan commented 4 years ago

It looks like one way of achieving the .hist-type result with .plot.hist is to make the subplots keyword functional (right now it doesn't do anything in this particular situation).

Jeitan commented 4 years ago

Actually, I just found issue #15079, which I think is the most closely related. Sorry for cluttering the space here. However, since actually implementing by in the plot.hist pathway was kicked down the road at that point, over a year ago, now seems a good time to get it done if hist is really going to be deprecated.

jbrockmendel commented 4 years ago

@datapythonista would it make sense to make checkboxes in the top post to clarify what the status of this issue is?

datapythonista commented 4 years ago

I opened this to have a discussion and see if people were happy with my proposed changes. But I don't think there has been any discussion or any progress on this, so it's probably not worth having the checkboxes for now.

stelios-c commented 1 month ago

Hello, are there any planned changes to pandas.plotting? I see this issue is from 2019 but is still open.