mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

non-deterministic FacetGrid.map_dataframe #3772

Closed: graingert-coef closed this issue 1 hour ago

graingert-coef commented 3 hours ago

demo:

import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = b""",Job Role,Gender,Outcome ,normalised_selection_rate
0,HR Business Partner,Female,0,0.0
1,HR Business Partner,Male,0,0.0
2,HR Business Partner,Male,0,0.0
3,HR Business Partner,Female,0,0.0
4,HR Business Partner,Male,0,0.0
5,HR Business Partner,Male,0,0.0
6,HR Business Partner,Female,1,20.0
7,HR Business Partner,Male,1,20.0
8,HR Business Partner,Male,0,0.0
9,HR Business Partner,Female,1,20.0
10,HR Business Partner,Female,0,0.0
11,HR Business Partner,Female,0,0.0
12,HR Business Partner,Female,0,0.0
13,HR Business Partner,Female,0,0.0
14,HR Business Partner,Female,0,0.0
15,HR Business Partner,Female,0,0.0
16,HR Business Partner,Male,0,0.0
17,HR Business Partner,Male,1,20.0
18,HR Business Partner,Male,0,0.0
19,HR Business Partner,Female,1,20.0
20,Senior Finance Business Partner,Female,0,0.0
21,Senior Finance Business Partner,Male,1,33.33333333333333
22,Senior Finance Business Partner,Male,0,0.0
23,Senior Finance Business Partner,Female,0,0.0
24,Senior Finance Business Partner,Female,0,0.0
25,Senior Finance Business Partner,Female,1,33.33333333333333
26,Senior Finance Business Partner,Female,0,0.0
27,Senior Finance Business Partner,Male,0,0.0
28,Senior Finance Business Partner,Female,0,0.0
29,Senior Finance Business Partner,Female,0,0.0
30,Senior Finance Business Partner,Male,1,33.33333333333333
31,Senior Finance Business Partner,Female,0,0.0
32,Senior Finance Business Partner,Female,0,0.0
33,Senior Finance Business Partner,Male,0,0.0
34,Senior Finance Business Partner,Male,0,0.0
35,Senior Finance Business Partner,Male,0,0.0
36,Senior Finance Business Partner,Female,0,0.0
37,Senior Finance Business Partner,Male,0,0.0
38,Senior Finance Business Partner,Female,0,0.0
39,Senior Finance Business Partner,Female,0,0.0
40,Retail Store Manager,Female,1,10.0
41,Retail Store Manager,Male,0,0.0
42,Retail Store Manager,Male,0,0.0
43,Retail Store Manager,Female,1,10.0
44,Retail Store Manager,Male,0,0.0
45,Retail Store Manager,Female,0,0.0
46,Retail Store Manager,Male,0,0.0
47,Retail Store Manager,Female,0,0.0
48,Retail Store Manager,Male,1,10.0
49,Retail Store Manager,Male,1,10.0
50,Retail Store Manager,Female,1,10.0
51,Retail Store Manager,Male,0,0.0
52,Retail Store Manager,Male,1,10.0
53,Retail Store Manager,Female,0,0.0
54,Retail Store Manager,Male,1,10.0
55,Retail Store Manager,Male,0,0.0
56,Retail Store Manager,Female,0,0.0
57,Retail Store Manager,Male,1,10.0
58,Retail Store Manager,Male,1,10.0
59,Retail Store Manager,Female,1,10.0
"""

def main():
    groupby = "Gender"
    job = "Job Role"
    plot_data = pd.read_csv(io.BytesIO(data))
    df = plot_data
    num_of_groups = len(df[groupby].unique())
    cmap = plt.get_cmap("Set3", num_of_groups)
    colours = [cmap(i) for i in np.linspace(0, 1, num_of_groups)]
    color_dict = dict(zip(sorted(df[groupby].unique()), colours, strict=True))

    # Create faceted plot
    g = sns.FacetGrid(
        plot_data,
        col=job,
        col_wrap=2,
        height=4,
        aspect=1,
    )

    # Add bars to each subplot
    g.map_dataframe(
        sns.barplot,
        x=groupby,
        y="normalised_selection_rate",
        hue=groupby,
        palette=color_dict,
        legend=False,
    )

    return float(g.axes.flat[0].get_ylim()[1])

while True:
    try:
        print(main())
    finally:
        plt.close()

Running this prints a y-axis upper limit that changes between iterations:

21.874999999999996
21.874999999999996
17.499999999999996
21.874999999999996
17.499999999999996
21.874999999999996
21.874999999999996
21.874999999999996
21.874999999999996
17.609374999999897
21.874999999999996
21.874999999999996
17.499999999999996
17.499999999999996
...

mwaskom commented 2 hours ago

The error bars in barplot are computed using a bootstrap by default, so they vary between calls. There is a seed parameter you can set if you want them to be reproducible.
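To make the bootstrap point concrete, here is a minimal sketch of a percentile bootstrap interval for a mean, using only the standard library. It is not seaborn's implementation: `bootstrap_ci` and its parameters are hypothetical names chosen for illustration, and the index arithmetic for the percentiles is a simplification. The input is the "HR Business Partner" / "Female" rows from the demo data. Because the axis limits are autoscaled to include the error bars, an interval that moves from call to call would plausibly explain the changing y-limit.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=1000, ci=95, seed=None):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Hypothetical helper for illustration only, not seaborn's code.
    """
    rng = random.Random(seed)  # seed=None draws fresh entropy on every call
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Read the interval bounds off the sorted resampled means.
    tail = (100 - ci) / 200
    lo_idx = int(tail * (n_boot - 1))
    hi_idx = int((1 - tail) * (n_boot - 1))
    return boot_means[lo_idx], boot_means[hi_idx]


# normalised_selection_rate for the "HR Business Partner" / "Female" rows.
rates = [0.0, 0.0, 20.0, 20.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 20.0]

print(bootstrap_ci(rates))          # bounds may differ from run to run
print(bootstrap_ci(rates, seed=0))  # same bounds on every run
```

With a fixed seed the resampling sequence is identical each time, so the interval, and anything derived from it such as autoscaled axis limits, stays the same.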

graingert-coef commented 1 hour ago

@mwaskom thanks! what's the seed parameter?

mwaskom commented 1 hour ago

It's seed=.

graingert commented 21 minutes ago

Ah, the parameter is given to map_dataframe as a keyword argument, which forwards it to sns.barplot:

g.map_dataframe(
    sns.barplot,
    x=groupby,
    y="normalised_selection_rate",
    hue=groupby,
    palette=color_dict,
    legend=False,
    seed=np.random.default_rng(0),
)