tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Changed stacking order #1748

Closed: aphalo closed this issue 8 years ago

aphalo commented 8 years ago

It seems that the code in 2.1.0.9000 (2016-09-14) stacks areas and bars with the first level of the factor at the top, instead of following the previous convention of putting it at the base. (Or am I missing something?)

Compare the figure on page 52 of the new ggplot2 book with the output of the current code on GitHub:

ggplot(mpg, aes(class, fill = drv)) +
  geom_bar()

The stacking order is reversed.

[Image: rplot-stacking]

Reversing the order in the legend to match the stacking order would be nice, but reversing the stacking order to achieve this will cause too many undesirable effects with existing code, including packages that extend ggplot2.

thomasp85 commented 8 years ago

This is intentional, as described in the NEWS. Only the default ordering has changed, and it can still be changed using factors. Reversing the legend is not an option, as this would affect situations other than position = "stack".
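A minimal sketch of what "changed using factors" could look like for the example above. The name `mpg_rev` is made up for illustration. Which level ends up at the top depends on the installed ggplot2 version, so the sketch only shows how to flip the order, not which order is the default.

```r
library(ggplot2)

# Copy the example data and reverse the level order of the fill variable.
# `mpg_rev` is a hypothetical name used only in this sketch.
mpg_rev <- mpg
mpg_rev$drv <- factor(mpg_rev$drv, levels = rev(levels(factor(mpg$drv))))

# Same plot as in the report, drawn from the original and the reversed factor.
p_default  <- ggplot(mpg, aes(class, fill = drv)) + geom_bar()
p_reversed <- ggplot(mpg_rev, aes(class, fill = drv)) + geom_bar()

# Alternative that leaves the data untouched, assuming a ggplot2 release
# (2.2.0 or later) in which position_stack() has a `reverse` argument:
# ggplot(mpg, aes(class, fill = drv)) +
#   geom_bar(position = position_stack(reverse = TRUE))
```

Printing `p_default` and `p_reversed` side by side should show the slices in opposite order. Note that reversing the levels also reverses the colour assignment and the legend order.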

aphalo commented 8 years ago

@thomasp85 Maybe intentional, but the same ordering of levels in a factor will lead to opposite stacking orders depending on the version of ggplot2. This will change the output of all scripts and code written before version 2.1.0. For my package I will need to test the ggplot2 version and use different code depending on the version, or update my package concurrently with ggplot2 and require ggplot2 > 2.1.0.

Almost all plotting functions in R use the convention of the scale starting at the bottom of the plot. Why go against this expectation? Reversing the stacking order could be optional, but changing the default like this will require revising many old scripts where the stacking order matters, and will need different code for pre/post 2.1.0 versions of ggplot2. In my view this is too high a cost for a small improvement.

Please let me know if the decision is already firm. I have my package 'ggspectra' ready to submit to CRAN. If the decision is made, I'll need to add code to make the ordering of levels of factors dependent on the version of ggplot2. I can get around the earlier changes by making the data ordering and the order of levels in factors consistent, but I cannot think of a way of achieving this without using conditional code to reverse the order of the levels.
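The version-dependent code mentioned above might be sketched roughly as follows. `stacking_levels()` is a hypothetical helper, not part of ggplot2 or 'ggspectra'. It assumes the behaviour described in this thread: first level at the base up to 2.1.0, first level at the top in later versions.

```r
# Hypothetical helper: given the desired bottom-to-top stacking order, return
# the factor levels to use for the installed (or supplied) ggplot2 version.
# Assumption taken from this thread: versions after 2.1.0 put the first level
# at the top of the stack, earlier versions put it at the base.
stacking_levels <- function(bottom_to_top,
                            ggplot2_version = utils::packageVersion("ggplot2")) {
  ggplot2_version <- as.package_version(ggplot2_version)
  if (ggplot2_version > "2.1.0") {
    rev(bottom_to_top)
  } else {
    bottom_to_top
  }
}

# Usage with an explicit version, so the result does not depend on what is
# installed:
stacking_levels(c("roots", "stem", "leaves"), "2.1.0.9000")
```

A package would then build its factor with `factor(x, levels = stacking_levels(...))`. Whether this matches a given release still needs checking against that release.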

thomasp85 commented 8 years ago

We'll have to get @hadley to weigh in on this...

thomasp85 commented 8 years ago

I'm unsure what you are referring to with:

"Almost all plotting functions in R use the convention of the scale starting at the bottom of the plot."

Can you provide some examples? I personally believe the new approach leads to more easily readable plots. Further, as I mentioned, it is only the default that has changed - the stacking order could still be messed with by the user through factor manipulation in the old implementation, so if a certain order was needed there should already be checks and bounds in place in the relevant extensions...

aphalo commented 8 years ago

As I see it, it is not just the default that is being changed, but the interpretation of the ordering of the factor levels. What I mean is that even when user code explicitly defines the ordering of levels in a factor, the stacking order is reversed. If the change affected only situations where user code does not set the order of the levels, I would not be so worried, but as far as I can see, there is no way to distinguish these two situations. I think that, in any case, if an update is going to change the stacking order of all figures produced by existing code (bar, area, etc.), this cannot be a point update, and the change needs to be explained not in terms of a change in defaults, but as a change in the mapping of factor levels to stacking. I hope I am not misinterpreting the effect of the change made to the code...

As for the convention: for a continuous scale on the y axis, zero is by default at the bottom. For a factor mapped to the y aesthetic, the first level is at the bottom of the plot rather than at the top.

I fully agree that keeping the order in the key and stacking the same is best. I regularly use breaks in the scale to reverse the order of the levels in the key to achieve this.
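A minimal sketch of the breaks-based workaround mentioned above, using the example plot from the start of the thread. `drv_levels` is a name introduced here for illustration. Setting `breaks` only reorders the key; it does not change the stacking itself.

```r
library(ggplot2)

# Levels of the fill variable in their default (sorted) order.
drv_levels <- levels(factor(mpg$drv))

# Reverse the order of the entries in the key by passing reversed breaks to
# the fill scale; the bars themselves are stacked as before.
p <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar() +
  scale_fill_discrete(breaks = rev(drv_levels))
```

An alternative with a similar effect is `guides(fill = guide_legend(reverse = TRUE))`, which reverses the key without listing the levels explicitly.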

thomasp85 commented 8 years ago

Yes, you are right that it is more than a default change in terms of orientation. I was too quick to reply.

ggplot2 does not adhere to semantic versioning and a change as small as this (breaking or not) is not likely to warrant a bigger version bump. This feature/change will get emphasised in the release announcement due to its "breaking" nature...

Let's wait for @hadley to chime in

hadley commented 8 years ago

I think you need to provide some evidence that this is going to create major problems. What are some concrete examples of plots that look ok before the change and bad afterwards?

hadley commented 8 years ago

Making the next release ggplot2 3.0.0 certainly isn't out of the question

aphalo commented 8 years ago

Examples from my own field: stacking bars for dry weight of different plant parts. In this case it is natural to have roots at the bottom, stem in the middle, leaves at the top. Another example: when plotting spectral reflectance, absorptance and transmittance in a stacked area plot for objects that normally receive light from above, it is natural to have the reflectance at the top, absorptance in the middle and transmittance at the bottom of the stack. For this reason, such plots are almost always plotted in a given way by tacit convention within a given field of research.

These two examples come off the top of my head. I can imagine that other people may have other reasons to prefer/need a certain order for stacking. Although having the same order for the key and the "slices" adds clarity, in many situations the interpretation of the plot is facilitated by the stacking order itself, especially when a certain order is expected by viewers.

This is a tricky issue, and neither keeping the old behaviour nor the new one is really satisfactory. Could adding an option to revert to the old behaviour ease the transition? Then the old scripts/reports/book manuscripts could be updated with a single line of code. The option would be silently ignored by earlier versions of ggplot2 and the figures would remain the "same" independently of the version of ggplot2 used. I just feel that having to go through whole manuscripts checking where figures have changed and updating the code is too error prone. Recently I have been using knitr a lot for writing book manuscripts, reports and many of the overheads I use in teaching.

hadley commented 8 years ago

Concrete examples of actual plots would be most helpful.

aphalo commented 8 years ago

Here is one example. This one is after updating the code in 'ggspectra' to work as expected with 2.1.0 (not yet on CRAN). This is real data, and the reversed stacking may not be that easy to spot.

This is with 2.1.0.9000, which is wrong: test-ggplot.pdf

This is with 2.1.0 and the updated plot() method in 'ggspectra', which is correct: test-ggplot-210.pdf

This is with 2.1.0 before I updated the plot() method in 'ggspectra', which is wrong because the order of levels is ignored: Pages from ecophys_04.pdf

Before 2.1.0, the figure had been consistently correct for at least a year or two.

I will produce some other examples tomorrow. It is quite late over here.

aphalo commented 8 years ago

For stacking of area plots I can reproduce the change in behaviour. For stacked bars, I cannot get a correct plot with 2.1.0, 2.0.0, or 1.1.0, so apparently the example I had in mind has not worked for quite some time, so it is a non-issue. I can produce a correct plot only with 2.1.0.9000, so probably you are right, in that this change will not affect many people. test-ggplot-bar.pdf

It seems that the stacking order in bar plots has not obeyed factor levels for quite some time... at least I could not get this to work even with version 1.1.0. Changing the order of the factor levels only affected the mapping of the scale values to the levels (e.g. colors), but not the stacking itself.

I only noticed this when struggling to get a stacked bar plot as I wanted some days ago.

I guess area plots are not popular enough to make this change a big issue... but it still may be an issue for some people.

aphalo commented 8 years ago

So, is this issue definitively closed? I have updated my package to generate exactly the same stacked area plot with versions 1.1.0, 2.0.0, 2.1.0, and 2.1.0.9000 of ggplot2. Unless I see activity in this issue, I will submit it to CRAN during the next weekend.

hadley commented 8 years ago

The current behaviour is unlikely to change unless there is widespread outcry during the release period (which seems unlikely)

aphalo commented 8 years ago

@hadley Thanks for the information. Yes, I agree that widespread outcry is unlikely. Clear documentation, as promised, should avoid user frustration. I was probably more upset than was reasonable, as I had been caught by surprise three times in a row within days: by bugs in lubridate and readr (also reported), and then by the change in behaviour of ggplot2. @hadley I really use several of your packages on an almost daily basis as they are incredibly well designed, and I appreciate all your work. I have been using ggplot2 from the early days, so maybe my expectations of its behaviour are stronger/more biased/more fixed than those of other users.

kornl commented 8 years ago

I'm for this change, but just want to remark that it also breaks many scripts out there in which labels or numbers for the slices were added manually (i.e. they now end up in the wrong positions).
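One way to make such labels less sensitive to the stacking order is to let the text layer use the same stacking position as the bars, instead of precomputing cumulative sums by hand. The sketch below assumes a ggplot2 release that provides `geom_col()` and the `vjust` argument of `position_stack()` (2.2.0 or later). `counts` is a name introduced here for illustration.

```r
library(ggplot2)

# Tabulate the example data so each slice has an explicit height to label.
counts <- as.data.frame(table(class = mpg$class, drv = mpg$drv))
counts <- counts[counts$Freq > 0, ]  # drop empty class/drv combinations

# Bars and labels share the same grouping (via fill) and the same stacking
# position, so each label is centred in its slice whatever the stacking order.
p <- ggplot(counts, aes(class, Freq, fill = drv)) +
  geom_col() +
  geom_text(aes(label = Freq), position = position_stack(vjust = 0.5))
```

Scripts that must also run on older releases would still need manually computed positions, and those depend on the stacking order of the version in use.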