wilkelab / ggridges

Ridgeline plots in ggplot2
https://wilkelab.org/ggridges
GNU General Public License v2.0

Weighted densities #5

Closed bgall closed 8 months ago

bgall commented 7 years ago

It appears that the geom_density_ridges() geom cannot take weights in the calculation of the densities. However, it is quite common that we are interested in weighting each observation in the density function.

Since weights are built into the default density function in ggplot, it should be feasible to incorporate weights into the ggridges density function.

Looking at the density function in ggridges, it looks like one would need to add in the weights argument to the aesthetic mappings (this is done in the standard ggplot density function), perhaps setting the default value to 1, then use that argument when the density is calculated in ggridges. In other words, convert the below:

d <- density(data$x, bw = bandwidth[panel_id], from = from[panel_id], to = to[panel_id], na.rm = TRUE)

into

d <- density(data$x, bw = bandwidth[panel_id], from = from[panel_id], to = to[panel_id], na.rm = TRUE, weights = weights)
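
To illustrate what the extra argument does, here is a minimal base-R sketch. It is not the ggridges source; the data and the weight vector `w` are made up for the example. It assumes the weights should be normalized to sum to 1, which is what `stats::density()` expects, and it passes an explicit numeric bandwidth because automatic bandwidth selection does not take the weights into account.

```r
set.seed(42)

# Hypothetical data: two clusters, with the first one weighted three times as heavily.
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 4))
w <- c(rep(3, 100), rep(1, 100))

# density() expects non-negative weights that sum to 1.
w_norm <- w / sum(w)

# Choose the bandwidth explicitly; the default selector ignores the weights.
bw <- bw.nrd0(x)

d_unweighted <- density(x, bw = bw)
d_weighted   <- density(x, bw = bw, weights = w_norm)

# Location of the highest peak of the weighted estimate.
peak_weighted <- d_weighted$x[which.max(d_weighted$y)]
```

With these weights the weighted estimate puts most of its mass on the cluster around 0, while the unweighted estimate treats both clusters equally.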

bgall commented 6 years ago

Incorporating all of the arguments of stat_density into the density function in geom_density_ridges() should be straightforward since currently one can specify an alternative density function.

If one wanted to complicate things somewhat, this could simply be a check for weights, passing the weights aesthetic to an alternative density calculation function (e.g. stat_density()), then scaling the x-axis to weighted density rather than the observed density.

Currently, a workaround is to calculate the density without the default geom_density_ridges density calculator:

geom_density_ridges(
  aes(height = ..density.., weight = pweight),
  scale = 0.95,
  stat = "density"
)

modche commented 6 years ago

I do not fully understand the workaround @bgall suggested here. Does this actually mean that other density calculations are possible right now? What is pweight in this example? Actually, I would be happy to have a solution for better density calculation near 0 (without negative values), like here... Would be a really nice feature. Btw: I like the package, nice work!

bgall commented 6 years ago

The above solution is for cases where you have some observed data but you want to weight each observation in the calculation of the density using a vector of weights. For example, you might have survey data where each respondent is assigned a weight representing the inverse of the probability of selection into the sample (a sampling weight) or a weight representing the number of cases each observation is supposed to represent (a frequency weight). In the example, pweight is simply the name of the column that holds those weights.

Depending upon what you want, the above implementation may suit your needs. It calculates the frequency-weighted density by overriding the default density function in ggridges and instead using the density arguments native to ggplot (which underlies the ggridges syntax). You should be able to define an arbitrary density function, subject to ggplot's constraints.
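
A fuller version of the workaround might look like the sketch below. The data frame `survey` and its columns `value`, `group`, and `pweight` are invented for illustration. The sketch uses `after_stat(density)`, the newer spelling of `..density..`, which assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)
library(ggridges)

set.seed(123)

# Hypothetical survey-style data: one value, one group, and one weight per row.
survey <- data.frame(
  value   = c(rnorm(200, mean = 0), rnorm(200, mean = 2)),
  group   = rep(c("A", "B"), each = 200),
  pweight = runif(400, min = 0.5, max = 3)
)

p <- ggplot(survey, aes(x = value, y = group)) +
  geom_density_ridges(
    aes(height = after_stat(density), weight = pweight),
    stat  = "density",  # use ggplot2's stat_density(), which understands `weight`
    scale = 0.95
  )
```

Calling `print(p)` draws one weighted ridgeline per group. Because the density comes from stat_density() rather than the ggridges stat, options such as quantile lines and jittered points are not available with this approach.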

Performance near zero is due to the nature of the non-parametric problem. You could, for example, take the density of the logarithm of your observed values. Of course, this changes the quantity you're plotting!
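
As a rough base-R illustration of the log idea (the data are simulated, and this is only one possible approach), the sketch below estimates the density of log(x) and, optionally, maps it back to the original scale using the change-of-variables rule: if Y = log(X), then f_X(x) = f_Y(log x) / x.

```r
set.seed(1)

# Hypothetical strictly positive data with a lot of mass near zero.
x <- rexp(500, rate = 2)

d_raw <- density(x)       # estimate on the original scale
d_log <- density(log(x))  # estimate on the log scale

# Optional back-transformation to the original scale.
x_grid <- exp(d_log$x)        # always positive
f_back <- d_log$y / x_grid    # f_X(x) = f_Y(log x) / x

# Trapezoidal check that the back-transformed curve still integrates to about 1.
area_back <- sum(diff(x_grid) * (head(f_back, -1) + tail(f_back, -1)) / 2)

# The raw estimate is evaluated below zero; the back-transformed one is not.
raw_goes_negative <- min(d_raw$x) < 0
```

Plotting `d_log` directly shows the density of log(x), which is a different quantity from the density of x; plotting `f_back` against `x_grid` keeps the original scale while avoiding any mass below zero.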

modche commented 4 years ago

I have seen that geom_boxplot offers a weighting functionality. Is it possible to use a comparable approach in geom_density_ridges? I tried to use the workaround above, but it doesn't work if I want to add quantile_lines=TRUE to the ridges.

mkoohafkan commented 4 years ago

Looks like this issue has been around for a while... it would be great if geom_density_ridges could support weight as an aesthetic in the same way that ggplot2::geom_density does. Frankly, I'm surprised that the density calculation of geom_density_ridges doesn't exactly match ggplot2::geom_density.

If implementing the weight aesthetic is not possible, it would be great to at least incorporate the workaround of @bgall into the documentation.

clauswilke commented 4 years ago

I have never considered this high priority, since it's possible to use the regular stat_density(), but I'd be happy to review a pull request that implements this feature.

mkoohafkan commented 4 years ago

@clauswilke can you clarify how to use the regular stat_density() in this case?

clauswilke commented 4 years ago

Weighted densities are possible as described here: https://github.com/wilkelab/ggridges/issues/5#issuecomment-338444835. However, then you're using stat_density(), which doesn't support quantile lines or jittered points.
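
One possible way to recover summary lines under this limitation is to compute the weighted statistics yourself and draw them as a separate layer. The helper `weighted_quantile()` below is hypothetical and written for this example; it is not part of ggridges or ggplot2. It uses a simple step-function definition, returning the smallest value at which the cumulative normalized weight reaches the requested probability.

```r
# Hypothetical helper: weighted quantile via the cumulative normalized weights.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x_sorted <- as.numeric(x[ord])
  cum_w <- cumsum(w[ord]) / sum(w)
  # For each probability, take the first sorted value whose cumulative weight reaches it.
  vapply(probs, function(p) x_sorted[which(cum_w >= p)[1]], numeric(1))
}

x <- c(1, 2, 3, 4)

# Equal weights: the weighted median matches the lower median.
q_equal <- weighted_quantile(x, w = c(1, 1, 1, 1), probs = 0.5)

# A heavy weight on the largest value pulls the median upward.
q_heavy <- weighted_quantile(x, w = c(1, 1, 1, 5), probs = 0.5)

# The weighted mean is available in base R.
m_heavy <- weighted.mean(x, w = c(1, 1, 1, 5))
```

Values computed this way per group could then be added to the plot as vertical segments, at the cost of maintaining that layer by hand.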

mkoohafkan commented 4 years ago

@clauswilke thanks for the clarification, I thought you were referring to a geom_ridgeline() + stat_density() combo.

matanhakim commented 1 year ago

For what it's worth, I too would be really happy to see this implemented - I have a use case for a weighted ridgeline plot with quantile (well, mean, to be precise) lines.

clauswilke commented 1 year ago

There was a PR (#59) that started tackling this but then got abandoned. If somebody wants to pick this up and finalize it, I'm happy to review and integrate it into the code base.

clauswilke commented 8 months ago

Addressed by #90.