Open dkahle opened 3 years ago
I agree that implementing `boundary_x` and `boundary_y`, mimicking `boundary` from `geom_histogram()`, is a good idea -- there are definitely use cases where the end user wants fine control over where the bin breaks occur. However, I do not think these arguments would yield a restricted support. At least, that does not agree with my understanding of the `boundary` argument from `geom_histogram()`, which seems to parameterize the location of an arbitrary bin break.
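To make that reading of `boundary` concrete, here is a minimal base R sketch. It assumes fixed-width bins with one break placed exactly at `boundary`; the helper `breaks_with_boundary()` is hypothetical and is not part of ggplot2 or ggdensity.

```r
# Hypothetical helper (not from ggplot2 or ggdensity): compute fixed-width
# bin breaks that cover the data, with one break falling exactly on `boundary`.
breaks_with_boundary <- function(x, binwidth, boundary = 0) {
  # Indices of the first and last break needed to cover range(x)
  k_lo <- floor((min(x) - boundary) / binwidth)
  k_hi <- ceiling((max(x) - boundary) / binwidth)
  boundary + binwidth * seq(k_lo, k_hi)
}

x <- c(0.2, 0.7, 1.4, 2.9, 3.3)  # toy data, all positive

breaks_a <- breaks_with_boundary(x, binwidth = 1, boundary = 0)
breaks_b <- breaks_with_boundary(x, binwidth = 1, boundary = 0.5)

print(breaks_a)
print(breaks_b)
```

With `boundary = 0.5` the first bin starts below zero even though every observation is positive. Under this assumption, `boundary` only shifts the grid of breaks and does not restrict the support of the estimate.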
Also, I do not see how they would be implemented for anything besides `method = "histogram"` (and the forthcoming `method = "freqpoly"`), as the other estimators do not perform any binning except for the discretization involved in the Riemann sum. In that case, the bins are so small that I have to think the location of the breaks is irrelevant.
It's possible that `xlim` and `ylim` are close to what you have in mind already, especially if we implemented some kind of `expand = FALSE` argument. `geom_hdr()` only uses data/draws within the rectangle defined by `rangex` and `rangey`. I didn't think very hard about how I implemented them, and maybe with some changes they could provide a naive way to indicate bounded support. Also, in a way, they already parameterize what `boundary_x` and `boundary_y` would -- something to think about.
I haven't done a very in-depth search, but I have come across various methods for dealing with density estimation in the context of restricted supports. The R package `bde` implements several estimators; however, it only deals with 1-dimensional data. In fact, I haven't come across anything that deals with bivariate density estimation with a restricted support (it's certainly possible I just haven't looked hard enough). I imagine we could extend some of the methods implemented in `bde` (e.g. Müller, Chen), but I haven't read through/understood them yet. If this hasn't been done, I imagine it could be an interesting topic for another paper!
It'd be neat to have `boundary_x` and `boundary_y` arguments that you could pass into `geom_hdr()` and `geom_hdr_lines()` when `method = "histogram"`; see my examples in the documentation to remember how it works for `geom_histogram()`. It'd be nice to have those work for all the methods, in fact. Have you come across any theory that addresses correcting density estimators for restricted support? The naive way would simply be to set the estimate to 0 outside the support and multiplicatively redistribute the removed mass over the rest of the density (a la the truncated normal distribution), but I'd imagine others have thought about it more.
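For reference, the naive truncate-and-rescale idea can be sketched in one dimension with base R's `density()`. This is only an illustration under stated assumptions: the data are simulated, the support is taken to be `[0, Inf)`, and the variable names are made up for the example. It is not how ggdensity computes anything.

```r
set.seed(1)
x <- rexp(500)  # simulated data; true support is [0, Inf)

# Unrestricted Gaussian KDE; by default its grid extends below min(x),
# so some estimated mass falls below 0
kde <- density(x, n = 1024)
dx <- diff(kde$x[1:2])  # spacing of the evenly spaced evaluation grid

# Mass the unrestricted estimate places outside the support (Riemann sum)
mass_below_zero <- sum(kde$y[kde$x < 0] * dx)

# Naive correction: zero the estimate outside the support, then rescale
# multiplicatively so the Riemann sum over the grid is 1 again
f_trunc <- ifelse(kde$x >= 0, kde$y, 0)
f_trunc <- f_trunc / sum(f_trunc * dx)
```

The rescaling multiplies every remaining value by the same constant, so the shape inside the support is unchanged. The estimate near the boundary remains biased, which is the problem dedicated boundary-corrected estimators aim to address.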