pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

Log-probability derivation for arbitrary order statistics (for i.i.d. [univariate] random variables) #7121

Open larryshamalama opened 6 months ago

larryshamalama commented 6 months ago

Description

Given an i.i.d. sample of univariate random variables $X_1, \dots, X_n$ with probability density function $f_X(x)$ and cumulative distribution function $F_X(x)$, the $j$th order statistic is denoted by $X_{(j)}$ and its probability density function is the following:

$$f_{X_{(j)}}(x) = \frac{n!}{(j - 1)!(n - j)!} \, f_X(x) \, \left[ F_X(x) \right]^{j - 1} \left[ 1 - F_X(x) \right]^{n - j} .$$
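As a numerical sketch of this formula (plain SciPy, not the PyTensor representation the issue asks for), the log-density can be assembled from `logpdf`, `logcdf`, and `logsf`. For Uniform(0, 1) the $j$th order statistic is known to be Beta($j$, $n - j + 1$), which gives an independent check:

```python
import math

from scipy import stats


def order_stat_logpdf(x, j, n, dist):
    """Log-density of the j-th order statistic of n i.i.d. draws from `dist`.

    Computes log of n! / ((j-1)! (n-j)!) * f(x) * F(x)^(j-1) * (1-F(x))^(n-j),
    with the factorials handled via lgamma for numerical stability.
    """
    log_coef = math.lgamma(n + 1) - math.lgamma(j) - math.lgamma(n - j + 1)
    return (
        log_coef
        + dist.logpdf(x)
        + (j - 1) * dist.logcdf(x)
        + (n - j) * dist.logsf(x)  # logsf(x) = log(1 - F(x))
    )


# Check: for Uniform(0, 1), X_(j) ~ Beta(j, n - j + 1)
n, j, x = 5, 2, 0.3
lp = order_stat_logpdf(x, j, n, stats.uniform())
assert math.isclose(lp, stats.beta(j, n - j + 1).logpdf(x))
```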

With the minimum and maximum statistics represented by $X_{(1)}$ and $X_{(n)}$ respectively, PyMC is already capable of deriving their log-probability densities (#6790, #6846), and this issue directly extends that line of work to arbitrary $1 \leq j \leq n$.

Wikipedia reference: https://en.wikipedia.org/wiki/Order_statistic

CC @ricardoV94 @Dhruvanshu-Joshi

ricardoV94 commented 5 months ago

The challenge is to represent this with PyTensor. Max and min are easy, because there are dedicated Ops for them.

Then one could do sort(x)[idx], with idx == 0 and idx == -1 corresponding to min and max, but the intermediate orders would depend on the length of x. We need a pytensor.quantile anyway, and that would be a good candidate for how to represent order statistics in PyTensor: https://github.com/pymc-devs/pytensor/issues/53
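As a rough illustration of the sort(x)[idx] idea (in plain NumPy rather than PyTensor): sorting each i.i.d. sample and indexing column $j - 1$ yields draws of $X_{(j)}$, which can be checked by Monte Carlo against the known Beta($j$, $n - j + 1$) distribution of uniform order statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
n, j = 5, 2  # sample size and order (1-indexed)

# Draw many i.i.d. Uniform(0, 1) samples of size n and sort each row;
# column j - 1 of the sorted array is then the j-th order statistic X_(j).
samples = rng.uniform(size=(200_000, n))
x_j = np.sort(samples, axis=1)[:, j - 1]

# X_(j) ~ Beta(j, n - j + 1), so E[X_(j)] = j / (n + 1)
assert abs(x_j.mean() - j / (n + 1)) < 1e-2
```

The symbolic difficulty the comment points at is exactly that `j - 1` here is a concrete integer known at graph-construction time, whereas a general PyTensor representation must handle it relative to the (possibly symbolic) length of `x`.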