add interpretation of quantile/linear pool average

sbfnk commented 2 months ago

I tried to give a bit more context to Vincent vs. LOP/mixture averaging and how they can be interpreted, so as to motivate these two choices a bit more.

elray1 commented 2 months ago

I like the idea of adding text like this! I wonder if we can refine the description of the quantile averaging method a bit more. Here's the current proposal:

> Statistically this amounts to a convolution of the probability distributions represented by the quantile levels, and the resulting distribution can be interpreted as the distribution of the mean of random variables that are represented by those distributions.

If we work with independent random variables $X_i$, $i = 1, \ldots, n$, with corresponding CDFs $F_i$, it makes sense to me that the random variable $Y = \sum_i X_i$ has a distribution that is the convolution of the distributions $F_i$.

However (the following might be one point stated differently),

If we set $Z = \frac{1}{n} \sum_i X_i$, its distribution will be a scaled version of the convolution of the distributions $F_i$, not the convolution itself.
The quantiles of $Z$ are not the average of the quantiles of the $X_i$: $F_Z^{-1}(\theta) \neq \frac{1}{n} \sum_i F_{X_i}^{-1}(\theta)$
- intuitively, computing the mean of the independent random variables yields something with lower variance, but computing the quantile average does not
- as an example, suppose each $X_i \sim N(0, 1)$. Then the 0.975 quantile of each $X_i$ is $1.96$, so the average of those quantiles is also $1.96$, but the variance of $Z$ is $1/n$ and the 0.975 quantile of $Z$ is $1.96 / \sqrt{n}$.
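
The standard-normal example above can be checked numerically. This is only a minimal sketch using Python's standard-library `statistics.NormalDist`; the choice of `n = 4` and the variable names are illustrative assumptions, not anything taken from the manuscript or from hubEnsembles.

```python
from statistics import NormalDist

n = 4            # number of component distributions (illustrative choice)
level = 0.975    # quantile level used in the example above

# n identical N(0, 1) components.
components = [NormalDist(0.0, 1.0) for _ in range(n)]

# Quantile (Vincent) average: mean of the component quantiles at this level.
quantile_average = sum(d.inv_cdf(level) for d in components) / n

# Mean of n independent N(0, 1) variables: Z ~ N(0, 1/n), i.e. sd = n ** -0.5.
z = NormalDist(0.0, n ** -0.5)
quantile_of_mean = z.inv_cdf(level)

print(round(quantile_average, 2))  # 1.96
print(round(quantile_of_mean, 2))  # 0.98
```

With `n = 4`, the quantile average stays at about 1.96 while the 0.975 quantile of the mean shrinks to about 1.96 / 2, so the two constructions give different distributions.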

eahowerton commented 2 months ago

I also think it's great to add this point in! I wonder if it'd be possible to convey the same point in a slightly less technical way (and perhaps in doing so, we avoid some of these concerns/confusions).

An intuitive illustration of the more general property you are describing is that, when combining a set of location-scale distributions, the location and scale parameters of the resulting Vincent ensemble are the means of the individual distributions' location and scale parameters (e.g., for a set of distributions $F_i \sim N(\mu_i, \sigma_i)$, with $\sigma_i$ the standard deviation, the simple Vincent average is $V \sim N(\frac{1}{n} \sum_i \mu_i, \frac{1}{n} \sum_i \sigma_i)$). Referencing Figure 1B (especially the inset) might also help visualize the idea.
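
The normal-distribution case can be sketched in a few lines to see the property directly. The component parameters below are made up for illustration, and the second argument of `NormalDist` is a standard deviation, which is the parameterization under which the averaging identity holds.

```python
from statistics import NormalDist, fmean

# Hypothetical component forecasts as (mean, standard deviation) pairs.
params = [(0.0, 1.0), (2.0, 0.5), (4.0, 3.0)]
components = [NormalDist(mu, sigma) for mu, sigma in params]

# Normal distribution with the averaged mean and averaged standard deviation.
averaged = NormalDist(
    fmean([mu for mu, _ in params]),
    fmean([sigma for _, sigma in params]),
)

levels = [0.025, 0.25, 0.5, 0.75, 0.975]
max_gap = 0.0
for level in levels:
    # Vincent average: mean of the component quantiles at this level.
    vincent = fmean([d.inv_cdf(level) for d in components])
    max_gap = max(max_gap, abs(vincent - averaged.inv_cdf(level)))

print(max_gap < 1e-9)  # True
```

The quantile averages match the quantiles of the averaged-parameter normal up to floating-point error, because each normal quantile is $\mu_i + \sigma_i z_\theta$ and averaging over $i$ averages $\mu_i$ and $\sigma_i$ separately.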

sbfnk commented 2 months ago

> I like the idea of adding text like this! I wonder if we can refine the description of the quantile averaging method a bit more. Here's the current proposal:

Thanks - clearly what I wrote was wrong.

> I also think it's great to add this point in! I wonder if it'd be possible to convey the same point in a slightly less technical way (and perhaps in doing so, we avoid some of these concerns/confusions).

That's a nice idea. Do you want me to have a go at rephrasing or just work something in yourself?

eahowerton commented 2 months ago

If you have suggestions, feel free to drop them here. Otherwise, happy to work it in on our side. Thanks!

sbfnk commented 2 months ago

Nothing beyond what's already written here so please do go ahead.

hubverse-org / hubEnsemblesManuscript

add interpretation of quantile/linear pool average #74