tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org
Other
6.39k stars 2k forks source link

stat_density() error and bw.nrd() man file are in conflict on valid inputs #5941

Open mfoos opened 3 weeks ago

mfoos commented 3 weeks ago

Hello! This is relatively minor (in that it's easy for a novice to fix, though it's breaking to old code) but I think I am correct that it's in error. I was rerunning some old code that uses stat_density(..., bw = "SJ") and I got the error

Warning: Computation failed in stat_density(). Caused by error in precompute_bw(): ! bw must be one of "nrd0", "nrd", "ucv", "bcv", "sj", "sj-ste", or "sj-dpi", not "SJ". ℹ Did you mean "sj"?

And so I read the stat_density man page, which says bw should be set to:

The smoothing bandwidth to be used. If numeric, the standard deviation of the smoothing kernel. If character, a rule to choose the bandwidth, as listed in stats::bw.nrd(). Note that automatic calculation of the bandwidth does not take weights into account.

When you click into the stats manpage (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/bandwidth.html) it shows several SJ options that one can use (bw.SJ, SJ-ste, SJ-dpi). But those are different from the ones in the ggplot2 error message, and indeed when you try them, they fail due to ggplot input checking (reprex below).

I'm not a bleeding-edge updater, but this is happening with R4.3.2 and ggplot2 3.5.1. I'm not in a position to update today, but the stats manpage linked above shows that in R4.5.0 their documentation has not changed. I also did not find any mention of this in the ggplot2 changelog.

library(ggplot2)

# Equivalent to legacy code that brought this to my attention
ggplot() +
  stat_density(data = diamonds, aes(x = x, fill = cut, alpha = .01), bw = "SJ")
#> Warning: Computation failed in `stat_density()`.
#> Caused by error in `precompute_bw()`:
#> ! `bw` must be one of "nrd0", "nrd", "ucv", "bcv", "sj", "sj-ste", or
#>   "sj-dpi", not "SJ".
#> ℹ Did you mean "sj"?


# Indicates "SJ" is still a valid bw value outside of stat_density/ggplot2
density(diamonds$x, bw = "SJ")
#> 
#> Call:
#>  density.default(x = diamonds$x, bw = "SJ")
#> 
#> Data: diamonds$x (53940 obs.);   Bandwidth 'bw' = 0.02009
#> 
#>        x                  y           
#>  Min.   :-0.06026   Min.   :0.000000  
#>  1st Qu.: 2.65487   1st Qu.:0.000000  
#>  Median : 5.37000   Median :0.001519  
#>  Mean   : 5.37000   Mean   :0.091983  
#>  3rd Qu.: 8.08513   3rd Qu.:0.141838  
#>  Max.   :10.80026   Max.   :0.763688

# Suggested by bandwidth manpage, but does not work
ggplot() +
  stat_density(data = diamonds, aes(x = x, fill = cut, alpha = .01), bw = "SJ-ste")
#> Warning: Computation failed in `stat_density()`.
#> Caused by error in `precompute_bw()`:
#> ! `bw` must be one of "nrd0", "nrd", "ucv", "bcv", "sj", "sj-ste", or
#>   "sj-dpi", not "SJ-ste".
#> ℹ Did you mean "sj-ste"?


# Suggested by bandwidth manpage, but does not work
ggplot() +
  stat_density(data = diamonds, aes(x = x, fill = cut, alpha = .01), bw = "bw.SJ")
#> Warning: Computation failed in `stat_density()`.
#> Caused by error in `precompute_bw()`:
#> ! `bw` must be one of "nrd0", "nrd", "ucv", "bcv", "sj", "sj-ste", or
#>   "sj-dpi", not "bw.SJ".


# ggplot error message answer does indeed work
ggplot() +
  stat_density(data = diamonds, aes(x = x, fill = cut, alpha = .01), bw = "sj-ste")

Created on 2024-06-12 with reprex v2.1.0

teunbrand commented 3 weeks ago

The error message gives the valid options. Is the ask here to also allow uppercase, or is the ask here to change the documentation to list the options in full?

mfoos commented 3 weeks ago

Since the implementation still uses the bw.nrd (etc) code under the hood, and the desire is probably to continue to use that existing/precise/citation'd manpage for reference, I think allowing the uppercase options would be the more unifying solution.

teunbrand commented 3 days ago

The solution would be to use to_lower_ascii() before using arg_match0() in the piece of code below:

https://github.com/tidyverse/ggplot2/blob/2610840f7478af027294db0793df839199b8cb6b/R/stat-density.R#L245-L247