plotly / plotly.py

The interactive graphing library for Python :sparkles: This project now includes Plotly Express!
https://plotly.com/python/
MIT License

allow points to be shown inside violin area #4008

Open Midnighter opened 1 year ago

Midnighter commented 1 year ago

One of my favorite ways to show distributions is a Sina plot, where the points are spread out within the area of the violin so that they follow the density of the distribution. I then additionally overlay a box plot, as is already possible in plotly. So my questions/feature requests are:

  1. Can the points of a violin plot be shown inside of the violin rather than next to it?
  2. Is it possible to have the points be spread out to the width of the violin as in the Sina plots?
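For reference, the "density" that a violin outlines is a kernel density estimate (KDE). Below is a minimal pure-Python sketch of a Gaussian KDE. The bandwidth rule used here (h = 1.06 · sd · n^(-1/5)) is an assumption chosen for illustration; plotly's violin trace computes its own bandwidth (exposed as its `bandwidth` attribute), which may differ. The helper names are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence


def normal_reference_bandwidth(samples: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth h = 1.06 * sd * n**(-1/5); needs >= 2 distinct samples."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def gaussian_kde(
    samples: Sequence[float], bandwidth: Optional[float] = None
) -> Callable[[float], float]:
    """Return f(y), the Gaussian kernel density estimate of `samples` evaluated at y."""
    h = bandwidth if bandwidth is not None else normal_reference_bandwidth(samples)
    n = len(samples)
    # Normalisation so that f integrates to 1 over the real line.
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(y: float) -> float:
        # One Gaussian bump of width h centred on each sample, summed.
        return norm * sum(math.exp(-0.5 * ((y - s) / h) ** 2) for s in samples)

    return density
```

Evaluating f on a grid of y values gives the curve the violin outline traces; in a Sina plot, the horizontal spread of the points at each y is scaled by f(y).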

Thank you for your consideration.

Alexboiboi commented 1 year ago

Hi @Midnighter, as far as I know there is no direct way of doing this, but you could do it in two steps. It may not be very pretty, though...

import plotly.express as px

df = px.data.tips()
pts = px.strip(df, y="total_bill").data
fig = px.violin(df, y="total_bill", points=False)
fig.add_traces(pts)
fig.show()
Midnighter commented 1 year ago

Thanks for the answer. For space reasons this is preferable to me, but I agree that it would look better with more specific placement of the points.

alexcjohnson commented 1 year ago

You can do this with a single trace, using pointpos:

import plotly.express as px

df = px.data.tips()
fig = px.violin(df, y="total_bill", points="all")
fig.update_traces(pointpos=0)
fig.show()

This gives the same result as what @Alexboiboi showed. But you're still right that the jitter algorithm was made for box plots rather than violins. If anyone is interested in making a PR to plotly.js, it should be relatively easy to add an option to use the KDE as the jitter envelope, to achieve the effect you're looking for.
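One way to read "use the KDE as the jitter envelope": instead of drawing every point's horizontal offset from a fixed-width band, draw it uniformly within ±half_width · f(y) / max f, where f is the density. The pure-Python sketch below is not plotly.js code; `sina_offsets`, `half_width` and `seed` are hypothetical names, and normalising by the largest density among the samples (rather than over the violin's evaluation grid) is a simplifying assumption.

```python
import random
from typing import Callable, List, Optional, Sequence


def sina_offsets(
    ys: Sequence[float],
    density: Callable[[float], float],
    half_width: float = 0.5,
    seed: Optional[int] = None,
) -> List[float]:
    """Horizontal offsets whose spread at each y follows the density envelope.

    A point at y gets x = u * half_width * density(y) / peak, with u uniform in
    [-1, 1], so points fill the violin shape instead of a fixed-width band.
    """
    rng = random.Random(seed)
    dens = [density(y) for y in ys]
    # Assumption: scale by the largest density among the samples themselves.
    peak = max(dens)
    if peak <= 0:
        # No density information: put every point on the centre line.
        return [0.0] * len(ys)
    return [rng.uniform(-1.0, 1.0) * half_width * d / peak for d in dens]
```

To try the idea without touching plotly.js, these offsets could serve as the x values of a separate scatter trace drawn over a `points=False` violin, along the lines of the two-trace workaround earlier in the thread; `half_width` would then have to match half the drawn violin width in axis units.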

Midnighter commented 1 year ago

@alexcjohnson, thank you for your addition. I haven't looked at the plotly.js source code before. Could you point me to where you think this new feature would need to be added, please?

alexcjohnson commented 1 year ago

The existing jitter algorithm is in traces/box/plot.js. Even though it lives in the box trace, it is also used by violins.

I think that if you're in there from a violin trace, you should have access to the density array. I'd suggest not supporting this for box traces, at least for now, as that would require a separate KDE calculation.
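Because the violin's density is only known at discrete positions, getting the envelope at an arbitrary point's value needs an interpolation step. This sketch assumes the density is available as two parallel lists (strictly increasing positions and their density values); the actual shape of the density array in plotly.js may differ, and `envelope_at` is a hypothetical helper. It is written in Python for consistency with the rest of the thread, although the real change would be JavaScript in plotly.js.

```python
from bisect import bisect_left
from typing import Sequence


def envelope_at(y: float, grid: Sequence[float], values: Sequence[float]) -> float:
    """Linearly interpolate a density sampled at strictly increasing `grid` positions.

    Outside the sampled range the density is taken as 0, i.e. no jitter there.
    """
    if not grid or y < grid[0] or y > grid[-1]:
        return 0.0
    i = bisect_left(grid, y)
    if grid[i] == y:
        # Exactly on a sample position: no interpolation needed.
        return values[i]
    # Here grid[i - 1] < y < grid[i]; interpolate between the two neighbours.
    t0, t1 = grid[i - 1], grid[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (y - t0) / (t1 - t0)
```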

We'll need a new violin attribute, maybe called jittermode: 'box'|'kde'. Violins also reuse the box point-attribute defaults, which set jitter and pointpos; to make this as easy as possible to use, we can modify that so that in the new 'kde' mode the defaults are pointpos: 0, jitter: 1.
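The defaults logic described above could look roughly like this. `jittermode` is only a proposal in this thread, not an existing plotly attribute, and `resolve_point_defaults` is a hypothetical stand-in for the plotly.js defaults step. The box-mode values are passed in rather than hard-coded because this sketch does not reproduce the real box defaults.

```python
from typing import Any, Dict, Mapping


def resolve_point_defaults(
    user_attrs: Mapping[str, Any],
    box_defaults: Mapping[str, float],
) -> Dict[str, Any]:
    """Fill in jitter/pointpos for a violin trace under the proposed `jittermode`.

    `box_defaults` stands in for whatever the shared box point defaults produce.
    """
    mode = user_attrs.get("jittermode", "box")
    if mode not in ("box", "kde"):
        raise ValueError(f"jittermode must be 'box' or 'kde', got {mode!r}")
    # In 'kde' mode, centre the points on the violin and let them span its full width.
    defaults = {"pointpos": 0.0, "jitter": 1.0} if mode == "kde" else dict(box_defaults)
    resolved: Dict[str, Any] = {"jittermode": mode, **defaults}
    # Values the user set explicitly always win over mode-dependent defaults.
    for key in ("pointpos", "jitter"):
        if key in user_attrs:
            resolved[key] = user_attrs[key]
    return resolved
```

Keeping explicitly set values authoritative means `jittermode='kde'` only changes what users get when they leave `pointpos` and `jitter` unset.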

nicolaskruchten commented 1 year ago

Huge fan of this idea! Note that the box logic already does some approximation of this by broadening the jitter width where there are a lot of points, no? It would be nice to have this better aligned with the violin trace. There's also the "beeswarm" type of jittering, which is not random but geometry-aware: points are laid out so as to form a compact group without overlapping. I think that would need to happen lower down in the pipeline, though.
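A geometry-aware beeswarm can be sketched greedily: sort the points by value and give each one the horizontal offset closest to the centre that does not overlap any point already placed. This is one simple variant, not plotly code; `beeswarm_offsets` is hypothetical, and it assumes x and y are measured in the same units with a fixed marker diameter. That assumption is likely part of why such a layout would have to run on screen (pixel) coordinates, after the data-to-pixel conversion.

```python
import math
from typing import List, Sequence, Tuple


def beeswarm_offsets(ys: Sequence[float], diameter: float) -> List[float]:
    """Greedy 1-D beeswarm: lay points out in order of y, each at the smallest |x|
    that keeps it at least `diameter` away from every point placed so far.

    Assumes x and y share the same units. Returns offsets in input order.
    """
    d2 = diameter ** 2
    order = sorted(range(len(ys)), key=lambda i: ys[i])
    placed: List[Tuple[float, float]] = []  # (x, y) of points already laid out
    xs = [0.0] * len(ys)
    for i in order:
        y = ys[i]
        # Only points less than one diameter away vertically can collide with this one.
        near = [(qx, qy) for qx, qy in placed if abs(qy - y) < diameter]
        # Candidates: the centre line, or just touching a nearby point on either side.
        candidates = [0.0]
        for qx, qy in near:
            dx = math.sqrt(d2 - (qy - y) ** 2)
            candidates += [qx - dx, qx + dx]
        # Pick the candidate closest to the centre that overlaps nothing
        # (the rightmost candidate always fits, so next() cannot run dry).
        x = next(
            c
            for c in sorted(candidates, key=abs)
            if all((c - qx) ** 2 + (y - qy) ** 2 >= d2 - 1e-9 for qx, qy in near)
        )
        xs[i] = x
        placed.append((x, y))
    return xs
```

Unlike random jitter, the result is deterministic, and the swarm widens wherever many points share similar values, so it follows the shape of the distribution without an explicit KDE.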