mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

Wrong counts plotted in Seaborn bar plots in V0.13.2 and V0.11.2 #3760

Closed RachellyN closed 1 month ago

RachellyN commented 1 month ago

Hi! I'm trying to create a stacked bar plot from the given file, which contains counts of 6 categories across 4 patients. The bar plot that seaborn creates is wrong, both in the total count per patient and in the counts per category. The example code below shows this, along with a pandas bar plot that renders the data correctly. This happens in 2 different seaborn versions: V0.13.2 and V0.11.2 (the matplotlib version is 3.7.5 in both cases).

Here is the data (df in the code), also available in the attached file:

       Patient Category  count
0   Patient_1b       X1   3852
1   Patient_2a       X4   2946
2   Patient_1a       X4   2020
3   Patient_1b       X2   1587
4   Patient_1a       X1   1353
5   Patient_1a       X2   1024
6   Patient_1a       X5    520
7   Patient_2a       X5    489
8   Patient_1a       X6    486
9   Patient_1a       X3    272
10  Patient_2a       X6    194
11  Patient_1b       X3    119
12  Patient_1b       X4     96
13  Patient_1b       X5     95
14  Patient_2a       X1     49
15  Patient_2b       X1     44
16  Patient_2a       X2     27
17  Patient_2b       X5     20
18  Patient_2b       X6     18
19  Patient_2a       X3     17
20  Patient_1b       X6     16
21  Patient_2b       X4     15
22  Patient_2b       X3      4
23  Patient_2b       X2      3

Here is the example code:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv("seaborn_counts.csv")

# plot stacked barplot with Seaborn
plt.clf()
sns.barplot(data=df, x="Patient", y="count", hue="Category", dodge=False)
plt.show()

# Counts don't match what's in the plot!
df.loc[df["Patient"]=="Patient_1b","count"].sum() # 5,765
df.loc[df["Patient"]=="Patient_2a","count"].sum() # 3,722
df.loc[df["Patient"]=="Patient_1a","count"].sum() # 5,675

# plot stacked barplot with pandas
df_pivot = pd.pivot_table(df, values='count', index='Patient', columns='Category')
plt.clf()
df_pivot.plot.bar(stacked=True)
plt.show()

Seaborn plot: image

Pandas plot: image

seaborn_counts.csv

mwaskom commented 1 month ago

Your seaborn plot is not using stacked bars. When you set dodge=False, the bars are layered on top of each other, each one drawn from zero, rather than stacked end to end.
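To make the difference concrete, here is a minimal sketch that uses a small subset of the data from the issue (three patients, two categories). It assumes that layered bars all start at zero, so the tallest visible bar for a patient is the largest single count, while a stacked bar would reach the sum. The column names `layered_top` and `stacked_top` are made up for this illustration.

```python
import pandas as pd

# Subset of the data posted in the issue (categories X1 and X2 only).
df = pd.DataFrame({
    "Patient": ["Patient_1b", "Patient_1b", "Patient_1a",
                "Patient_1a", "Patient_2a", "Patient_2a"],
    "Category": ["X1", "X2", "X1", "X2", "X1", "X2"],
    "count": [3852, 1587, 1353, 1024, 49, 27],
})

grouped = df.groupby("Patient")["count"]

# layered_top: height of the tallest bar when every bar starts at zero.
# stacked_top: height the bar would reach if the categories were stacked.
heights = pd.DataFrame({
    "layered_top": grouped.max(),
    "stacked_top": grouped.sum(),
})
print(heights)
```

For this subset, the layered height for Patient_1b is 3852 (the X1 count alone), while the stacked height would be 5439, which is why the plotted bars look too short compared with the per-patient totals.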

RachellyN commented 1 month ago

Thank you, @mwaskom! Is it possible to plot a stacked bar plot with seaborn.barplot, or does it require a different plot type?

mwaskom commented 1 month ago

barplot doesn't support stacking, sorry!
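One possible workaround, sketched here with plain matplotlib rather than seaborn.barplot, is to pivot the data to wide form and draw each category on top of the running total via the `bottom` argument of `Axes.bar`. This is only an illustration on a subset of the posted data; the output file name `stacked_counts.png` is a made-up placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Subset of the data posted in the issue (categories X1 and X2 only).
df = pd.DataFrame({
    "Patient": ["Patient_1b", "Patient_1b", "Patient_1a",
                "Patient_1a", "Patient_2a", "Patient_2a"],
    "Category": ["X1", "X2", "X1", "X2", "X1", "X2"],
    "count": [3852, 1587, 1353, 1024, 49, 27],
})

# One row per patient, one column per category; missing pairs become 0.
wide = df.pivot_table(index="Patient", columns="Category",
                      values="count", aggfunc="sum", fill_value=0)

fig, ax = plt.subplots()
bottom = pd.Series(0, index=wide.index)  # running total per patient
for category in wide.columns:
    # Each category starts where the previous one ended.
    ax.bar(wide.index, wide[category], bottom=bottom, label=category)
    bottom = bottom + wide[category]

ax.set_xlabel("Patient")
ax.set_ylabel("count")
ax.legend(title="Category")
fig.savefig("stacked_counts.png")  # hypothetical output path
```

After the loop, `bottom` holds the per-patient totals, so the top of each stacked bar matches the sums computed with `df.loc[...].sum()`. The pandas call `df_pivot.plot.bar(stacked=True)` from the original post does the same bookkeeping internally.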