mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

seaborn.objects so.Plot() accept drawstyle argument #3412

Closed subsurfaceiodev closed 1 year ago

subsurfaceiodev commented 1 year ago

Currently there seems to be no way to do something like:

import pandas as pd
import seaborn.objects as so

dataset = pd.DataFrame(dict(
    x=[1, 2, 3, 4],
    y=[1, 2, 3, 4],
    group=['g1', 'g1', 'g2', 'g2'],
))

p = (
    so.Plot(dataset,
            x='x',
            y='y',
            drawstyle='group',
            )
    .add(so.Line())
    .scale(drawstyle=so.Nominal({'g1': 'default', 'g2': 'steps'}))
)

p.show()

We get: TypeError: Plot() got unexpected keyword argument(s): drawstyle

mwaskom commented 1 year ago

I do anticipate adding a Step mark but don't see a case for adding drawstyle as a mappable property of the Line mark. In general you seem to be expecting Line to expose the full API of plt.plot, which is not a goal (if anything it's an anti-goal). But while your issues are clear in the sense that they have a nice reproducible example (thanks!), they don't really motivate why this would be helpful or what problem it would solve, so it's hard to say...

subsurfaceiodev commented 1 year ago

It is very common in geotechnical engineering (and the oil and gas industry) to mix scatter data (as measured data) with line and/or scatter data (as correlated data) and, in some cases, modeled data or measured shear wave velocity data as step charts. See the following image from https://www.marchetti-dmt.it/wp-content/uploads/bibliografia/totani_2009_SDMT_impenetrable_alessandria.pdf:

image

Also, faceting and color / marker grouping (which are already implemented by seaborn) are very useful for the following figures. Notice the inverted y axis (used for depth values). We believe this feature (ax.invert_yaxis) is not yet implemented in the seaborn.objects API.

image

image

mwaskom commented 1 year ago

Thanks for sharing this example.

> It is very common in geotechnical engineering (and the oil and gas industry) to mix scatter data (as measured data) with line and/or scatter data (as correlated data) and, in some cases, modeled data or measured shear wave velocity data as step charts.

Based on this description I would say the best way to think about this plot is in terms of multiple layers, where each layer has a different kind of data that is represented by a different mark.

It is true that you could also arrive at what you're looking for by considering the different kinds of data to be subsets of one variable and using mapping. But if there are very different visual representations (a solid line for one and un-joined dots for the other) then it suggests different marks to me. And I suspect that the most natural way to represent these data is with the measured and correlated variables as distinct columns in a dataframe, rather than the long-form organization you'd need to use mapping.

In terms of feature set, then, the main thing that the objects interface is missing to allow this plot is the concept of labeling a layer in the legend: #3046. I hope to add that for v0.13.0. And I hope to add a Step mark too. (Although you can also enable stepping through the artist_kws of the Line or Path marks.) But I don't see a strong case for making the drawstyle itself mappable.