tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

The future of vars() #4374

Open thomasp85 opened 3 years ago

thomasp85 commented 3 years ago

vars() has been deprecated in the rest of the tidyverse and is now in an odd position in ggplot2. Due to the non-standard semantics of facet specifications there is no real way to avoid some type of collector function with the same semantics as vars(), so the current plan is to let it exist purely as a ggplot2 thing, but switch back to promoting formula notation in documentation examples etc. vars() will then only be promoted as a utility for programming with ggplot2 and not for interactive use.

Feedback from the maintainer team on this is very welcome - I've never really used vars() myself so my attachment to it is very limited
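
For illustration, a minimal sketch of what "a utility for programming with ggplot2" could look like. The helper name wrap_by() is hypothetical and not part of ggplot2; it only forwards bare column names to facet_wrap() through vars(). The mtcars data set is used purely as an example.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): forwards bare column names
# to facet_wrap() by passing the dots through vars().
wrap_by <- function(...) {
  facet_wrap(vars(...), labeller = label_both)
}

# One panel per observed combination of vs and am.
p <- ggplot(mtcars, aes(wt, disp)) +
  geom_point() +
  wrap_by(vs, am)
```

Interactive code could still use the formula, e.g. facet_wrap(~ vs + am), while wrappers like this one rely on vars().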

clauswilke commented 3 years ago

Would it be possible to take a tidy-select argument, similar to say nest() or unnest()? I don't like the formula interface because it feels out of place in ggplot2, but I don't particularly like vars() either because it appears nowhere else in the tidyverse.

thomasp85 commented 3 years ago

We (me and @hadley) discussed that. "Unfortunately" we allow computations inside the facet specs and this doesn't fit into tidy select semantics...

We have honestly painted ourselves into a bit of a corner with the facet mini-DSL we have and I can't see any way we can bridge it without severely destroying backwards compatibility
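
A small sketch of the kind of computation meant here, assuming the mtcars data set and an arbitrary cut-off of 3 chosen only for the example. The facet spec is an expression evaluated against the data rather than a column selection, which is why it does not map onto tidy-select.

```r
library(ggplot2)

# The facet variable is computed on the fly: `wt > 3` is not a column of
# mtcars, so there is nothing for a column-selection helper to select.
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_wrap(vars(heavy = wt > 3))
```

The name given inside vars() (here heavy, an illustrative choice) becomes the facet variable's name.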

clauswilke commented 3 years ago

Then my preference is for vars() over ~.

hadley commented 3 years ago

@clauswilke the problem with vars is that it's very verbose compared to ~: facet_grid(vars(x), vars(y)) vs facet_grid(x ~ y). So my thought is to allow either a double-sided formula in rows (as we do currently for backward compatibility) and continue to allow vars in rows and cols. This isn't a change in behaviour so much as just changing what we recommend in the documentation.
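
As a sketch of the two spellings side by side, assuming the mtcars data set with cyl and am as stand-ins for x and y: both calls should produce the same grid, with the left-hand side of the formula mapping to rows and the right-hand side to columns.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Formula notation: left-hand side gives rows, right-hand side gives columns.
p_formula <- base + facet_grid(cyl ~ am)

# The same layout spelled out with vars() and named arguments.
p_vars <- base + facet_grid(rows = vars(cyl), cols = vars(am))
```

Naming rows and cols explicitly is more to type, but it removes the need to remember which side of the formula is which.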

clauswilke commented 3 years ago

@hadley Yeah, I get it, and I don't have a strong opinion one way or another. Maybe I'm just grumpy because I just changed all my teaching materials over to vars(). :-)

I do think though that something isn't quite right with the formula interface. Consider the following code:

ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  facet_wrap(~Species)

Just looking at it, the ~ looks out of place, and I find myself constantly forgetting it. I always want to type facet_wrap(Species) and then I get an error.

Since I've trained myself to type facet_wrap(vars(Species)) that's been less of an issue. It reminds me that I have to explicitly wrap the data columns I'm interested in, just as in aes(...).

One way out could be to at some point add an aesthetic mapping interface to facets, e.g.:

ggplot(iris, aes(Sepal.Length, Sepal.Width, facet = Species)) +
  geom_point() +
  facet_wrap()

Altair does something like this: https://altair-viz.github.io/user_guide/encoding.html#encoding-channels

Not sure what that would involve in terms of changes internally.

yutannihilation commented 3 years ago

Personally, I always use vars() in facet_grid() with the argument names (rows/cols) because I can never manage to remember which side of the formula is rows and which is cols. I expect ordinary people are smarter than I am, though.

baptiste commented 3 years ago

One way out could be to at some point add an aesthetic mapping interface to facets, e.g.:

That's what Julia's Gadfly does; it bothered me at first, but after a while I came to agree that facetting isn't too different from a (meta)position mapping – in fact when I see things like https://hafen.github.io/geofacet/ it becomes even more convincing.

bwiernik commented 3 years ago

One way out could be to at some point add an aesthetic mapping interface to facets, e.g.:

I personally would really love for facet, row, and column to be parameters specified in aes() rather than any additional interface unique to facet_*().

hadley commented 3 years ago

Just to be clear, moving facet specification into aesthetics is definitely out of scope for this discussion.

arencambre commented 3 years ago

moving facet specification into aesthetics is definitely out of scope for this discussion

Is that to keep this discussion focused or because it's not considered a good idea? Is there a separate discussion for alternatives to vars()?

vars() has been deprecated in the rest of the tidyverse

Can you share a link to that discussion? I am curious about the rationale. That may help inform new ideas. I've done some searching and came across https://github.com/tidyverse/dplyr/issues/4432 from 2019, but I'm not sure if that is an example of why it's deprecated.