Open datapythonista opened 5 years ago
May I have a try at this once the maintainers reach some agreement?
I'll be creating new issues for the tasks that result from this issue, but you're surely welcome to work on those.
I just ran across this and see that it is fairly recent, hooray! Might I make an observation of something to consider as this rework is done? Grouped histograms. There are currently 8 native ways to do this, counting the "redundant" `.hist` and `.plot.hist` and depending on what kind of object you're calling them from. Almost none of them behave the same, and some don't behave in any expected way (no grouping).
I am concerned because if the plan is to drop `Series.hist` and `DataFrame.hist` (which is fine by me, actually; I don't like API redundancy), it is worth noting that grouping using the `by=` keyword with `Series.plot.hist` and `DataFrame.plot.hist` does not work. I've compiled all the behaviors in this spreadsheet (file should be available here: https://drive.google.com/file/d/1jVCxw_zshguW4Xfwi8NYWDptCQXnpWGt/view?usp=sharing).
Is this the right place to bring it up or should I make a separate issue on grouping behavior?
Does https://github.com/pandas-dev/pandas/issues/11053 already contain everything?
@TomAugspurger Well, it looks like that's along the right track, for sure ... that `.hist` and `.plot.hist` do very different things. I don't see any mention of the `by=` keyword, though, which is what I'm concerned about. For example, the following:
```python
import numpy as np
import pandas as pd

np.random.seed(159753)
df = pd.DataFrame(np.random.randn(30, 2), columns=['A', 'B'])
df['C'] = np.random.choice(['a', 'b', 'c'], 30)  # grouping column

df.hist(column='A', by='C')       # plot (1)
df.plot.hist(column='A', by='C')  # plot (2)
```
Yield these two plots: (1) (2)
As you can see, the first one uses subplots, but the second one just plots the whole histogram of 'A' with no grouping whatsoever.
It looks like one way of achieving the `.hist`-type result with `.plot.hist` is to make the `subplots` keyword functional (right now it doesn't do anything in this particular situation).
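In the meantime, a per-group subplot layout like the one described for `df.hist(column='A', by='C')` can be approximated by hand. This is only a rough sketch of a workaround, assuming matplotlib is installed; the `Agg` backend, the single-row layout, and the bin count are choices made here for illustration, and none of this is how pandas implements `by` internally:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(159753)
df = pd.DataFrame(np.random.randn(30, 2), columns=["A", "B"])
df["C"] = np.random.choice(["a", "b", "c"], 30)

# Group column 'A' by the labels in 'C' (groupby sorts the labels by default).
groups = df.groupby("C")["A"]

# One subplot per group, laid out in a single row with shared axes.
fig, axes = plt.subplots(
    1, groups.ngroups, figsize=(9, 3), sharex=True, sharey=True, squeeze=False
)
for ax, (label, values) in zip(axes.ravel(), groups):
    ax.hist(values, bins=10)   # histogram of 'A' for this group only
    ax.set_title(str(label))   # title each subplot with its group label
```

Calling `fig.savefig("grouped_hist.png")` afterwards writes the figure to disk (the file name is arbitrary).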
Actually, I just found issue #15079, which I think is the most closely related. Sorry for cluttering the space here. However, since actually implementing `by` in the `plot.hist` pathway was kicked down the road at that point over a year ago, now seems a good time to get it done if `hist` is really going to be deprecated.
@datapythonista would it make sense to make checkboxes in the top post to clarify what the status of this issue is?
I opened this to have a discussion and see if people were happy with my proposed changes. But I don't think there has been any discussion or any progress on this, so it's probably not worth having the checkboxes for now.
Hello, are there any planned changes to `pandas.plotting`? I see this issue is from 2019 but is still open.
xref: #26747, #28159
The current plotting API feels inconsistent, and I think it's the one we have for historical reasons, not the one we want. I propose the following changes:

- Leave the current API based on the `.plot` accessor as is (e.g. `Series.plot.hist`, `Series.plot(kind='box')`). In the future we should consider two things: `**kwargs`
- Remove all duplicate functions:
  - `Series.hist` (-> `Series.plot.hist`)
  - `DataFrame.hist` (-> `DataFrame.plot.hist`)
  - `pandas.plotting.boxplot` (-> `DataFrame.plot.box`)
  - `DataFrame.boxplot` (-> `DataFrame.plot.box`)
- Move the matplotlib backend to a separate project (`pandas.plotting._matplotlib` -> `pandas_matplotlib`)
- Move to the matplotlib backend the non-accessor plotting functions:
  - `pandas.plotting.andrews_curves` (-> `pandas_matplotlib.andrews_curves`)
- Move to the matplotlib backend register/unregister of the converters (`pandas.plotting.register` -> `pandas_matplotlib.register`)

CC: @pandas-dev/pandas-core @jakevdp
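
To make the "remove all duplicate functions" item a bit more concrete, here is one way a duplicate such as `Series.hist` could be phased out: keep it temporarily as a thin wrapper that warns and delegates to the accessor method. The thread does not say how the removal would be staged, so this is only a sketch under that assumption. `ToySeries` and `ToyPlotAccessor` are hypothetical stand-ins, not pandas classes, and the choice of `FutureWarning` is also an assumption:

```python
import warnings


class ToyPlotAccessor:
    """Hypothetical stand-in for the `.plot` accessor (not pandas code)."""

    def __init__(self, data):
        self._data = data

    def hist(self, bins=10):
        # Return a description instead of drawing, so the sketch has no dependencies.
        return ("hist", len(self._data), bins)


class ToySeries:
    """Hypothetical stand-in for a Series with a duplicated `hist` method."""

    def __init__(self, data):
        self._data = list(data)

    @property
    def plot(self):
        return ToyPlotAccessor(self._data)

    def hist(self, bins=10):
        # Deprecated duplicate: warn, then forward to the accessor method.
        warnings.warn(
            "ToySeries.hist is deprecated; use ToySeries.plot.hist instead.",
            FutureWarning,
            stacklevel=2,
        )
        return self.plot.hist(bins=bins)
```

With this pattern both spellings return the same result during the transition, and only the old one emits a warning. It only works cleanly if the two methods accept the same arguments, which is exactly where the `by=` difference discussed in the comments would need to be resolved first.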