Closed grantmcdermott closed 1 year ago
@zeileis All tests are passing so feel free to skip over this quickly if you don't have time to review. The main thing you may (or not) want to comment on are lines 583-605. In short, to get the individual group density plots I extract the model call and then run update
on each data subset. This seemed the most sensible way forward from my perspective (and appears to be working well enough). But let me know if you see potential problems, or have other suggestions.
Disclaimer: I'm not particularly fond of plotting the density f(x | by) against x. In most situations I'm more interested in the probability f(by | x). So this is what the "conditional density plot" cdplot()
(which has mostly been written by me) plots against x. And the trick is to compute f(by | x) = f(x | by) * f(by)/f(x). So I often feel that plotting f(x | by) against x is the wrong way around.
Concerns: Having said that, I see two potential problems. The first is easier to address than the second, I think.
cdplot()
. Not sure what other "ridge lines" implementations are doing. (Note: If you decide to use the same bandwidth you could call density()
directly rather than via update()
.update(object, ...)
method and your eval(str2lang(object$data), ...)
will only work if the data is in the parent frame (or visible from there). (Note: Your evaluation should probably also use envir = parent.frame()
for consistency.). This may not be true and unfortunately density()
does not store the enclosing environment (as it would be done in a formula
). That's why densities computed from auxiliary functions will not be compatible with plot2()
. See the example below. Thus, the tradeoff is between the simple convenience of plot2(density(x), by)
vs. a more elaborate formula interface like plot2(x ~ by, data = ..., type = "density")
or something along those lines.Example:
## convenience function returning a density
log_density <- function(x) density(log(x))
## example data
y <- 1:10
by <- rep(1:2, each = 5)
## set up density and try to plot
d <- log_density(y)
plot2(d, by = by) ## Error in log(x) : non-numeric argument to mathematical function
## or even worse: this would be completely misleading if you had eval(..., envir = parent.frame())
x <- by
plot2(d, by = x)
These are all great comments @zeileis. Some quick responses:
cdplot
. This strikes me as an obvious candidate for faceting once the internals of #12 are resolved.update
instead of density
to capture the other arguments like "kernel" and ...
, though right? That, or some manual implementation of what update
does internally.)density
itself (as you mention). Given your third example, I'm reluctant to go all in on the envir = parent.frame()
passing. I might just add a tryCatch error message in there. P.S. I'm not sure about the plo2t(x ~ by, type = "density")
syntax because it breaks the | by
API the we have elsewhere. But I might be persuaded by a one-side formula version, i.e. plot2(~ x | by, type = "density")
. Will think on this some more.
Thanks for the follow-up!
"density"
object but it isn't. So update()
is the best we can do. Of course, the kernel
argument might also come from somewhere where update()
cannot find it without the right environment.envir = parent.frame()
is the only consistent choice because from my reading this is what update()
does. Otherwise you might search for the x
argument in a different environment than for the other variables (bandwidth
, kernel
, etc.). I can try to cook up an example if you need more convincing ;-)type
argument there to choose between histogram and density.by
groups (e.g., by median) from highest to lowest, which would yield a kind of ridge lines plot.I'm convinced ;-)
I believe that my latest changes address the main shortcomings, so feel free to "Squash and merge" if you agree.
Quickly on 4 and 5, which I'll leave for a future set of PRs.
type
arguments is a cool a idea. We should look at supporting type = "hist"
too.EDIT: forgot to include an example of the updated grouped density plot (i.e., with the joint bandwidth calc).
library(plot2)
plot2(density(iris$Sepal.Width), by = iris$Species)
Created on 2023-04-13 with reprex v2.0.2
Looks good, thanks. I'll post a wishlist item for 4/5.
Fixes #17
Examples:
Created on 2023-04-11 with reprex v2.0.2