tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Should the diamonds dataset factor variables be unordered? #5910

Closed by davidhodge931 1 month ago

davidhodge931 commented 1 month ago

The diamonds dataset is full of ordered factor variables:

head(ggplot2::diamonds, 1)
#> # A tibble: 1 × 10
#>   carat cut   color clarity depth table price     x     y     z
#>   <dbl> <ord> <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1  0.23 Ideal E     SI2      61.5    55   326  3.95  3.98  2.43

Created on 2024-05-28 with reprex v2.1.0

R for Data Science doesn't recommend using these: https://r4ds.hadley.nz/factors

Given that, should the factor variables in the diamonds dataset be unordered?

teunbrand commented 1 month ago

Is there a common case where the orderedness of these factors is causing issues?

thomasp85 commented 1 month ago

They are ordered values and should be treated as such. ggplot2 itself has different default scales for ordered vs unordered data.

I don't think you can take the recommendation in R4DS to such an extreme conclusion as to strip ordered factors from all available data.
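
The point about default scales can be checked directly. The following is a minimal sketch, not something posted in the thread: it assumes ggplot2 is installed, and the names `d`, `d_unordered`, `p_ordered` and `p_unordered` are made up for illustration. It maps `cut` to colour twice, once as the ordered factor shipped with the dataset and once with the ordering stripped, then compares the colours that end up on the points. In the ggplot2 versions I am aware of, the ordered version should pick up a sequential (viridis-style) palette and the unordered one a hue-based palette, but treat that as an expectation to verify rather than a guarantee.

```r
library(ggplot2)

# A small subset keeps the example quick to run.
d <- head(diamonds, 200)

# Same data, but with the ordering removed from `cut`
# (level order is kept; only the "ordered" class is dropped).
d_unordered <- d
d_unordered$cut <- factor(d$cut, ordered = FALSE)

p_ordered   <- ggplot(d, aes(carat, price, colour = cut)) + geom_point()
p_unordered <- ggplot(d_unordered, aes(carat, price, colour = cut)) + geom_point()

# Colours actually assigned to the points by each plot's default scale.
cols_ordered   <- unique(layer_data(p_ordered)$colour)
cols_unordered <- unique(layer_data(p_unordered)$colour)

is.ordered(d$cut)            # ordered factor, as shipped
is.ordered(d_unordered$cut)  # plain factor after conversion
setequal(cols_ordered, cols_unordered)  # do both plots use the same colours?
```

Printing `p_ordered` and `p_unordered` side by side shows the difference visually, which is one practical reason to keep the variables ordered in the dataset.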

davidhodge931 commented 1 month ago

Thanks!