mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

How to do various things in the next gen Seaborn #2919

Closed aeturrell closed 2 years ago

aeturrell commented 2 years ago

Hello,

I'm sure this is all on the radar / plan for the next gen Seaborn anyway, but just thought I'd flag some features that I couldn't work out how to do today using the next gen syntax (most likely because these features haven't landed yet or aren't in the next gen documentation yet).

They are:

  • title
  • subtitle
  • caption
  • annotations
  • axis ticks
  • legend layout
  • saving plots to file

They're all from here: https://aeturrell.github.io/python4DS/communicate-plots.html, a training page on data vis that goes through how to do various things with next gen Seaborn and that currently has placeholders for these features.

I am so impressed with the next gen version, can't wait to see more of it!

mwaskom commented 2 years ago

These are mostly not (yet) implemented. To answer specific questions:

  • title
  • subtitle

Not yet implemented.

  • caption

Can you say more about what you mean by this?

  • annotations

Not yet implemented. There will likely be a Text mark. I'm not sure what exactly a separate concept for "annotation" would look like (assuming you mean something like ax.annotate with the arrows and whatnot).

  • axis ticks

Controlled by calling the .tick method on a Scale subclass (e.g. Continuous) passed to Plot.scale.

  • legend layout

Not yet implemented (legends are super annoying).

  • saving plots to file

If you mean saving an image of the plot, this is Plot.save, it's a light wrapper for matplotlib's Figure.savefig. It returns the Plot so you can save multiple versions of the plot as you are building it up, which is maybe nice.

If you mean saving a serialized version of the Plot spec, this is not implemented but probably will be. In theory it should be possible to (mostly) roundtrip a Plot spec from/to yaml or a similar serialization format. I say "mostly" because some methods accept external objects (e.g., you can pass a matplotlib Locator object to Scale.tick), so the to part of this is a bit more complicated than specifying a (somewhat restricted) version of a Plot from yaml.

aeturrell commented 2 years ago

Thanks for the very swift response and the pointers.

Caption: smaller text that usually appears below a chart and gives the source of the data or other relevant context, e.g. in the second example here for ggplot2, or in a 'real-world' example here from the FT where the caption reads "Source: Indeed".

Annotations: yeah, I mean an equivalent of ax.annotate or, if that's not on the roadmap as a direct feature, an example of how to do it by falling back to matplotlib. (As an aside, an especially useful feature is to be able to annotate lines or points; I wrote a version of the lines one here but it's very hacky.)

Legends: I can believe it!

Saving: okay, thanks, and yes, I was just wondering how to do a vanilla save to file!

mwaskom commented 2 years ago

For captions, I wonder if it's worth pitching that upstream to matplotlib as a first-class figure concept. It's possible to add arbitrary text to figures, of course, but I'm not sure it would always play well with the auto-layout algorithms.

aeturrell commented 2 years ago

Yeah that makes sense.

mwaskom commented 2 years ago

For titles (and subtitles / captions, although without true support for those in matplotlib it's trickier and I may wait to implement them), I am thinking that folding them into the Plot.label method makes sense. It feels slightly wrong to me, but I can't think of a good argument to justify that preference.

One complication is that either:

I am curious what you would think about the following proposal for property customization. I think it might make sense to have a Label class with a signature like Label(text: str, **properties), where properties are things like font, size, alignment, etc. Then you'd have Plot.label(**labels: str | Callable | Label | None), so like

```python
# Proposed API sketch: so.Label does not exist (yet)
(
    so.Plot(...)
    .add(...)
    .label(
        x="The x label",
        y=str.capitalize,
        alpha=None,
        color=so.Label("The color title", size=12, color="red"),
    )
)
```

The alternatives would be:

Having a Label object feels best but maybe it's not obvious?

ps I think you could do Label(size=12) to style the default labels.
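To make the proposal concrete, here is a pure-Python sketch of how `Plot.label` might resolve each `str | Callable | Label | None` value. `Label` and `resolve_label` are hypothetical stand-ins written for this illustration, not seaborn API. `text` is optional so that `Label(size=12)` restyles the default label, and treating `None` as "no label" is an assumption based on `alpha=None` in the proposal:

```python
from typing import Any, Callable, Dict, Optional, Tuple, Union


class Label:
    """Hypothetical stand-in for the proposed so.Label(text, **properties)."""

    def __init__(self, text: Optional[str] = None, **properties: Any) -> None:
        self.text = text              # None means "keep the default text"
        self.properties = properties  # styling, e.g. size=12, color="red"


LabelSpec = Union[str, Callable[[str], str], Label, None]


def resolve_label(default: str, spec: LabelSpec) -> Tuple[Optional[str], Dict[str, Any]]:
    """Return (text, styling properties) for one variable's label."""
    if spec is None:
        # Assumed meaning: suppress the label entirely
        return None, {}
    if isinstance(spec, Label):
        text = default if spec.text is None else spec.text
        return text, dict(spec.properties)
    if callable(spec):
        # A function transforms the default name, e.g. str.capitalize
        return spec(default), {}
    # Otherwise a plain string replaces the default
    return spec, {}
```

For example, `resolve_label("day", str.capitalize)` gives `("Day", {})`, and `resolve_label("y", Label(size=12))` keeps the text `"y"` but attaches `{"size": 12}`.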

aeturrell commented 2 years ago

Yeah, I completely see that the subtitle and caption features need to be built in at a fairly low level; otherwise they're very hard to implement.

Having titles in Plot.label makes a lot of sense to me. It's "label", not "axis_label", so, as a naive user, I'd expect it to be in there. That's also the route ggplot2 has taken, i.e. it uses

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE) +
  labs(title = "Fuel efficiency generally decreases with engine size", x = "displ", y = "hwy")
```

I think title as a potential keyword is acceptable—it's what I'd try first, anyway.

On property customisation, I can see the rationale for a signature like Label(text: str, **properties), but I think it could get a bit verbose and hard to read with lots of settings passed that way. And (very much a personal opinion) I think I prefer doing all customisation through rc parameters passed to the Plot.theme option. I think this is pretty much the ggplot2 choice too (e.g. see here); not that the ggplot2 choices are necessarily always the way to go, but they're a good default if there's no overwhelming reason to go for something else.

mwaskom commented 2 years ago

> It's "label" not "axis_label", so, as a naive user, I'd expect it to be in there.

I think the thing that bugs me about label(title=...) is that with x=, color=, etc., you're providing the label for that variable, but you're not providing the label for the title. But maybe that's too fussy.

> I prefer doing all customisation through rc parameters

While there will be a way to pass a dictionary of arbitrary rc parameters, I think that is not ideal as the only way to do aesthetic tweaking. The discoverability of rc parameters is pretty low: the matplotlib documentation for them is not very good, and their names contain periods, so they cannot be used as keyword arguments that you could tab-complete.

mwaskom commented 2 years ago

For titles, see #2934

aeturrell commented 2 years ago

Super exciting, thanks for flagging!