pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Stacked bar plot negative values do not work correctly if dataframe contains NaN values #8175

Closed tom-alcorn closed 10 years ago

tom-alcorn commented 10 years ago

While trying to produce a stacked bar plot that includes negative values, I found that if the DataFrame contains NaN values, the bar plot does not display correctly.

Specifically, this code:

df = pd.DataFrame([[10, 20, 5, 40], [-5, 5, 20, 30], [np.nan, -10, -10, 20], [10, 20, 20, -40]], columns=['A', 'B', 'C', 'D'])
df.plot(kind='bar', stacked=True)
plt.show()

produces this incorrect plot:

[Screenshot, 2014-09-04: stacked bar plot of df with the NaN present]

Notice that at '2' on the x-axis, there should be a bar of height -10 for each of the 'B' and 'C' columns.

However, when I replace the NaN values with 0s by doing

df = pd.DataFrame([[10, 20, 5, 40], [-5, 5, 20, 30], [np.nan, -10, -10, 20], [10, 20, 20, -40]], columns=['A', 'B', 'C', 'D'])
df = df.fillna(0)
df.plot(kind='bar', stacked=True)
plt.show()

then the plot displays correctly:

[Screenshot, 2014-09-04: the same stacked bar plot after fillna(0)]

This is clearly undesirable behaviour. I suspect it happens because the bars for the negative values receive np.nan as their 'bottom' argument and are therefore not drawn at all, but I haven't investigated further.
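The suspected mechanism can be reproduced without plotting. This is a minimal sketch, assuming stacked bars are drawn by keeping separate running totals for positive and negative values, so each segment's 'bottom' is the total for its sign so far. It is not pandas' own plotting code, and `stacked_bottoms` is a hypothetical helper. Because `NaN > 0` is False, the NaN in column 'A' is routed into the negative running total and turns it into NaN for every later negative segment in that row.

```python
import numpy as np
import pandas as pd


def stacked_bottoms(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the 'bottom' offset each bar segment starts from.

    Positive values stack upward from a running positive total and
    negative values stack downward from a running negative total.
    """
    pos_prior = np.zeros(len(df))
    neg_prior = np.zeros(len(df))
    bottoms = {}
    for col in df.columns:
        y = df[col].to_numpy(dtype=float)
        mask = y > 0  # NaN > 0 is False, so NaN is handled as "negative"
        bottoms[col] = np.where(mask, pos_prior, neg_prior)
        # Add each value to the running total matching its sign.
        pos_prior = pos_prior + np.where(mask, y, 0)
        neg_prior = neg_prior + np.where(mask, 0, y)  # NaN leaks in here
    return pd.DataFrame(bottoms, index=df.index)


df = pd.DataFrame([[10, 20, 5, 40], [-5, 5, 20, 30],
                   [np.nan, -10, -10, 20], [10, 20, 20, -40]],
                  columns=['A', 'B', 'C', 'D'])
print(stacked_bottoms(df).loc[2])
print(stacked_bottoms(df.fillna(0)).loc[2])
```

For row 2, the bottoms of 'B' and 'C' come out as NaN while the positive 'D' segment is unaffected, which matches the missing bars in the first screenshot. After `fillna(0)` every bottom is finite.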

It would be nice if area-style plots like this either automatically replaced NaN values with 0 or raised an error saying that NaN values in the DataFrame are causing problems for the plotting functions.
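The two options suggested above could be sketched as a small pre-plotting step. `prepare_for_stacked_plot` and its `nan_policy` parameter are hypothetical names used only for illustration; they are not part of pandas' API. The "fill" policy mirrors the `fillna(0)` workaround, and the "raise" policy names the offending columns instead of silently dropping bars.

```python
import numpy as np
import pandas as pd


def prepare_for_stacked_plot(df: pd.DataFrame, nan_policy: str = "fill") -> pd.DataFrame:
    """Hypothetical pre-plotting step for stacked bar/area plots.

    nan_policy="fill"  -> replace NaN with 0 so every running total stays finite.
    nan_policy="raise" -> refuse to plot and name the columns containing NaN.
    """
    if nan_policy == "fill":
        return df.fillna(0)
    if nan_policy == "raise":
        bad = df.columns[df.isna().any()].tolist()
        if bad:
            raise ValueError(f"NaN values in columns {bad} would break stacking")
        return df
    raise ValueError(f"unknown nan_policy: {nan_policy!r}")


df = pd.DataFrame([[10, 20, 5, 40], [-5, 5, 20, 30],
                   [np.nan, -10, -10, 20], [10, 20, 20, -40]],
                  columns=['A', 'B', 'C', 'D'])
clean = prepare_for_stacked_plot(df)  # the NaN in column 'A' becomes 0
try:
    prepare_for_stacked_plot(df, nan_policy="raise")
except ValueError as exc:
    print(exc)  # the message names column 'A'
```

With the "fill" policy, `clean.plot(kind='bar', stacked=True)` behaves like the `fillna(0)` example above. Raising is stricter but makes the problem visible instead of producing a plot with missing segments.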

TomAugspurger commented 10 years ago

cc @sinhrks.

Thanks for the report. I think fillna(0) is the intended behavior (that's what AreaPlot and PiePlot both do).
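The `fillna(0)` behaviour can be checked from a script without opening a window. This is a minimal sketch, assuming matplotlib's non-interactive "Agg" backend is available: it plots the filled frame and confirms that every bar segment (each a matplotlib `Rectangle` in `ax.patches`) has a finite bottom and height.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend: nothing is shown on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame([[10, 20, 5, 40], [-5, 5, 20, 30],
                   [np.nan, -10, -10, 20], [10, 20, 20, -40]],
                  columns=['A', 'B', 'C', 'D'])

ax = df.fillna(0).plot(kind='bar', stacked=True)

# One Rectangle per (row, column) segment; collect its bottom and height.
segments = [(p.get_y(), p.get_height()) for p in ax.patches]
all_finite = all(np.isfinite(y) and np.isfinite(h) for y, h in segments)
print(len(segments), all_finite)
plt.close(ax.figure)
```

Four rows times four columns gives sixteen segments, all drawable once the NaN has been replaced with 0.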

tom-alcorn commented 10 years ago

No problems, glad to help.
