mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

Dodge applied in different order with / without previous Agg #3556

Open z626093820 opened 11 months ago

z626093820 commented 11 months ago

When I combine a bar plot with a scatter of the individual points, the dots are not drawn in the correct (dodged) positions. How can I solve this? Thank you!

import seaborn.objects as so
import seaborn as sns
from seaborn import axes_style,plotting_context
so.Plot.config.theme.update(
    plotting_context('paper',font_scale=1.4)
    | axes_style("ticks",rc={'axes.spines.top':False, 'axes.spines.right':False})
)
penguins=sns.load_dataset('penguins')
(
    so.Plot(penguins, x="island", y='bill_length_mm', color="species")
    .layout(size=(3, 3))
    .add(so.Bar(), so.Agg(), so.Dodge())
    .add(so.Range(), so.Est(errorbar="sd"), so.Dodge())
    .add(so.Dot(pointsize=2), so.Jitter(0.2), so.Dodge())
)

mwaskom commented 11 months ago

This looks like a bug, probably with the same underlying cause as https://github.com/mwaskom/seaborn/issues/3015.

You can work around it by sorting your dataframe on the column you're using to group by before plotting.

P.S. I formatted the code in your OP so that it is easier to read.

mwaskom commented 11 months ago

You can also pass a Nominal scale with an explicit ordering:

(
    so.Plot(penguins, x="island", y='bill_length_mm', color="species")
    .add(so.Range(), so.Est(errorbar="sd"), so.Dodge())
    .add(so.Dot(pointsize=2), so.Jitter(0.2), so.Dodge())
    .add(so.Bar(), so.Agg(), so.Dodge())
    .scale(color=so.Nominal(order=penguins["species"].unique().tolist()))
)

So the question is ... why isn't the default ordering getting passed to the groupby operations?

z626093820 commented 11 months ago

Quoting mwaskom: "So the question is ... why isn't the default ordering getting passed to the groupby operations?"

GPT's answer: The problem may arise between the sorting of the dataset and the groupby operation. By default, groupby does not preserve the original order of the data. Possible fixes include sorting the data before the groupby, or explicitly specifying the sort behavior in the groupby call.

mwaskom commented 11 months ago

Thanks, yeah, seaborn already manages order internally (and turns off sorting) when using groupbys, and that's happening in order-sensitive moves like Dodge. However, the default ordering is not computed until after the stat is applied, and here the stat is changing the default order specifically because the crossing with the x variable is "sparse" (e.g., if you change x to "sex" you get the same order with/without Agg).
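To make the "sparse crossing" point concrete, here is a minimal pandas-only sketch. It is not seaborn's internal code: the toy dataframe, the helper name species_order_after_agg, and the assumption that the aggregated rows end up laid out one x level at a time are all illustrative. It only shows how the first-appearance order of species can change once empty (x, species) cells are dropped.

```python
import pandas as pd

# Toy stand-in for penguins (hypothetical values): Chinstrap occurs only on
# Dream and Gentoo only on Biscoe, so island x species is a sparse crossing,
# while sex x species is dense.
df = pd.DataFrame({
    "island": ["Torgersen", "Biscoe", "Dream", "Dream", "Dream", "Biscoe", "Biscoe"],
    "species": ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female", "Male"],
    "bill_length_mm": [39.1, 37.8, 38.5, 46.6, 51.1, 45.6, 49.5],
})

# Order in which species first appear in the raw data.
raw_order = df["species"].unique().tolist()


def species_order_after_agg(data, x):
    """Species order seen after aggregating per (x, species).

    Assumes (for illustration only) that the aggregate is laid out over the
    full x-by-species product, x level by x level, and that empty cells are
    then dropped.
    """
    full = pd.MultiIndex.from_product(
        [data[x].unique(), data["species"].unique()], names=[x, "species"]
    )
    agg = (
        data.groupby([x, "species"], sort=False)["bill_length_mm"]
        .mean()
        .reindex(full)   # one row per cell of the full crossing
        .dropna()        # empty cells of a sparse crossing disappear here
        .reset_index()
    )
    return agg["species"].unique().tolist()


print(raw_order)                              # ['Adelie', 'Chinstrap', 'Gentoo']
print(species_order_after_agg(df, "island"))  # ['Adelie', 'Gentoo', 'Chinstrap']
print(species_order_after_agg(df, "sex"))     # ['Adelie', 'Chinstrap', 'Gentoo']
```

With the sparse island crossing, Gentoo shows up before Chinstrap in the aggregated rows, so a layer that derives its default order after aggregation would disagree with a layer that uses the raw data. With the dense sex crossing, the order is unchanged.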

Currently, the stat and any moves are handled in different parts of the codebase (dating back to an original design where there was a stronger distinction between them). The plan is to refactor things such that stats and moves are more interchangeable. I think it would be much cleaner to use a consistent default ordering across all of the transforms after that happens. So for now, unfortunately, I think it's necessary to pass a Nominal scale with an explicit ordering (as demonstrated above) to avoid this behavior.