Open mmore500 opened 9 months ago
In case anyone else also needs it, I've put the patch I'm using in the meantime up at https://github.com/mmore500/outset/blob/1dd0d036f90bc7b27f20c5fca3f6eb257e70770c/src/outset/patched/_scatterplot.py
This is not an obvious one. With lines, for instance, the dataframe is grouped by each color/linestyle/... combination, and a line is drawn for each subset (and styled accordingly). That is not the case with scatter, where all the points are plotted at once and the colors and styles are then mapped onto each point. Fixing this probably requires rewriting the scatter plot function in the same fashion as the line plot one.
The reason that scatterplot works differently from lineplot and others is that if we grouped over the hue/size/style variables the resulting scatterplot would be "layered" in a way that could be misleading. Additionally, the number of individual collection artists generated may be very large (imagine a dense scatterplot with a continuous hue variable where most hue observations are distinct).
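To make the artist-count concern concrete, here is a rough sketch that uses only pandas and NumPy, not seaborn internals. The synthetic dataframe and the variable names are assumptions made up for illustration; the point is only that grouping on a continuous hue yields roughly one group, and hence one artist, per point.

```python
import numpy as np
import pandas as pd

# Synthetic data (hypothetical): 1000 points with a continuous hue variable.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=1000),
    "y": rng.normal(size=1000),
    "hue": rng.normal(size=1000),  # nearly every value is distinct
})

# A lineplot-style approach would draw one artist per hue level...
n_artists_grouped = df.groupby("hue").ngroups

# ...whereas drawing all points in one call needs a single collection.
n_artists_single = 1

print(n_artists_grouped, n_artists_single)
```

Drawing group by group would also place each group entirely on top of the previous one, which is the "layering" concern mentioned above.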
It also looks like a similar issue occurs in stripplot (although stripplot dots do not have edges by default):
sns.stripplot(diamonds, x="cut", y="price", hue="clarity", linewidth=.3, hue_order=["I1", "IF"])
Probably the solution is to reduce the dataframe to just the rows with the relevant values for hue/size/style at some point, either centrally, or in the plotting method for scatter-type plots. Given that it occurs in more than one place, centrally makes sense, but there may be some complications relating to choosing default scale domains and/or computing statistics that we'd need to be mindful of.
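Until something like that lands in seaborn itself, the same reduction can be done on the user side before plotting. The following is a minimal sketch of that idea, not seaborn's implementation: `subset_to_orders` is a hypothetical helper name, and the small dataframe only stands in for the diamonds dataset.

```python
import pandas as pd


def subset_to_orders(data, hue=None, hue_order=None, style=None, style_order=None):
    """Keep only rows whose hue/style values appear in the given orders.

    Hypothetical helper; a variable is filtered only when both its column
    name and its order list are provided.
    """
    mask = pd.Series(True, index=data.index)
    if hue is not None and hue_order is not None:
        mask &= data[hue].isin(hue_order)
    if style is not None and style_order is not None:
        mask &= data[style].isin(style_order)
    return data[mask]


# Stand-in data (made up for illustration).
df = pd.DataFrame({
    "carat": [0.2, 0.3, 0.5, 0.7],
    "price": [300, 400, 900, 2000],
    "clarity": ["I1", "IF", "VS1", "I1"],
})

subset = subset_to_orders(df, hue="clarity", hue_order=["I1", "IF"])
print(subset)
```

The filtered frame can then be passed to the plotting call, e.g. `sns.scatterplot(data=subset, x="carat", y="price", hue="clarity", hue_order=["I1", "IF"])`. Note that filtering up front also changes the data that default axis limits and any computed statistics are based on, which is the complication mentioned above.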
Note that the objects interface does not have this issue:
import seaborn.objects as so
(
    so.Plot(diamonds, x="carat", y="price", color="clarity")
    .add(so.Dots())
    .scale(color=so.Nominal(order=["IF", "I1"]))
)
bug description: scatterplot plots hollow points when hue_order is a strict subset of the hue values within the dataframe, and crashes when style_order is a strict subset of the style values within the dataframe (i.e., the dataframe contains hue or style values not present in hue_order/style_order). Was able to reproduce in seaborn versions 0.13.0, 0.12.0, and 0.11.0.
expected behavior: scatterplot should plot only the subset of data with values specified in hue_order and style_order, like the current behavior of lineplot, kdeplot, etc.
related issues: none obvious; #3575 has a different stack trace and does not occur with 0.12.x versions of seaborn.
if wanted, I'd be happy to look into contributing a fix
scatterplot: plots hollow points when set(hue_order) < set(df[hue])
scatterplot: crashes when set(style_order) < set(df[style])
for comparison, lineplot, kdeplot, and lmplot have a more expected behavior
lineplot: works as expected with set(style_order) < set(df[style]) and set(hue_order) < set(df[hue])
kdeplot: works as expected when set(hue_order) < set(df[hue])
lmplot: works as expected when set(hue_order) < set(df[hue])
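For readers unfamiliar with the notation used in the cases above: `<` on Python sets tests for a strict (proper) subset. A small sketch, with a made-up list `observed` standing in for the values of df[hue]:

```python
hue_order = ["I1", "IF"]
observed = ["I1", "IF", "VS1", "VS2"]  # hypothetical values of df[hue]

# True: every ordered level is observed, and some observed levels are left out.
is_strict_subset = set(hue_order) < set(observed)

# False: a set is never a strict subset of itself.
same_levels = set(observed) < set(observed)

# The levels that the *_order argument leaves out.
dropped = sorted(set(observed) - set(hue_order))

print(is_strict_subset, same_levels, dropped)
```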
system information
seaborn v0.13.0, I was also able to reproduce on v0.12.0 and v0.11.0