tensorflow / probability

Probabilistic reasoning and statistical analysis in TensorFlow
https://www.tensorflow.org/probability/
Apache License 2.0

Implement quantile for `Mixture` and `MixtureSameFamily`. #659

Open moonman925 opened 4 years ago

moonman925 commented 4 years ago

When will the inverse CDF (quantile) be implemented for these kinds of distributions? I have the nightly version installed, but found that quantile is not implemented.

srvasude commented 4 years ago

Hi, for mixture distributions there generally isn't a closed form for the quantile function. You could use something like Newton's method on the CDF to approximate the quantile. Given a mixture of distributions with known CDFs, the mixture CDF is a weighted sum of the component CDFs (which I believe the mixture distributions already implement).
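
A minimal sketch of that idea (not part of the TFP API; bisection swapped in for Newton's method, since a raw Newton step can blow up where the mixture pdf underflows in the tails):

from tensorflow_probability import distributions as tfd

def approx_quantile(dist, p, lo=-100.0, hi=100.0, n_iter=60):
    """Approximate dist.quantile(p) by bisecting dist.cdf on [lo, hi]."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if dist.cdf(mid).numpy() < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy mixture; lo/hi must bracket the quantile you are after.
mix = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.3, 0.7]),
    components_distribution=tfd.Normal(loc=[-1., 1.], scale=[0.5, 0.5]))
x = approx_quantile(mix, 0.9)
print(x, mix.cdf(x).numpy())  # cdf(x) should come back as ~0.9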

Feel free to reopen this if you want to discuss more.

brianwa84 commented 4 years ago

Actually MixtureSameFamily doesn't implement quantile. We could do that the way you suggest. PRs welcome.

srvasude commented 4 years ago

@brianwa84 Yep, that's what I was mentioning :P. I'll change the title of this issue to reflect that.

brianwa84 commented 4 years ago

Oh never mind, I was thinking backward. The CDF you can compute as a weighted sum, and that's already implemented. Not the quantile.
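
Concretely, the weighted-sum identity is F(x) = sum_k pi_k F_k(x). A quick check with toy numbers (my own example, not from the thread):

import numpy as np
from tensorflow_probability import distributions as tfd

pi = np.float32([0.3, 0.7])
comps = tfd.Normal(loc=np.float32([-1., 1.]), scale=np.float32([0.5, 0.5]))
mix = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=pi),
    components_distribution=comps)
x = np.float32(0.25)
# mixture CDF vs. hand-computed weighted sum of component CDFs -- should match
print(mix.cdf(x).numpy(), np.sum(pi * comps.cdf(x).numpy()))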

Demetrio92 commented 4 years ago

For the fellow strugglers -- just use pynverse. I have a model that predicts a mixture of Gaussians with 4 components. Here is how I get quantiles for one observation.

from pynverse import inversefunc
from tensorflow_probability import distributions as tfd

# alpha_pred, mu_pred, sigma_pred are the model's predicted mixture weights,
# means, and scales; y_train holds the training targets used to bound the search.
mixture_distribution = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=alpha_pred[0, :]),
    components_distribution=tfd.Normal(
        loc=mu_pred[0, :],
        scale=sigma_pred[0, :]))
cdf_fit = lambda x: mixture_distribution.cdf(x).numpy()
%timeit quantile_func_fit = inversefunc(cdf_fit, domain=[y_train.min(), y_train.max()], accuracy=6)

7.79 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

quantile_func_fit([0.5, 0.9, 0.95, 0.99])
>>> array([15.26007136, 37.84270684, 39.90694571, 42.79034905])
cdf_fit([15.26007136, 37.84270684, 39.90694571, 42.79034905])
>>> array([0.5       , 0.90000004, 0.95      , 0.99      ], dtype=float32)

Good enough for any practical application.


This is a single-core computation using bare-bones NumPy. I'm sure a proper TensorFlow implementation would be even faster. @brianwa84 why is this issue closed?

Demetrio92 commented 4 years ago

In the meantime I hacked together a rather efficient quantile approximation. USE WITH CAUTION! READ THE CODE

import numpy as np

def mixture_distribution_quantiles(mixture_distribution, probs, N_grid_points=int(1e3)):
    # base_grid = get_base_grid(mixture_distribution)
    # TODO: need a smarter grid (or you can customize it to your dataset)
    base_grid = np.linspace(-5, 20, num=N_grid_points)  # hacked in order to fit the simulation below
    full_grid = np.transpose(np.tile(base_grid, (mixture_distribution.batch_shape[0], 1)))
    cdf_grid = mixture_distribution.cdf(full_grid).numpy()  # this is fully parallelized and even uses GPU
    grid_check = (cdf_grid.min(axis=0).max() <= min(probs)) & (max(probs) <= cdf_grid.max(axis=0).min())
    if not grid_check:
        raise RuntimeError('Grid does not span full CDF range needed for interpolation!')

    probs_row_grid = np.transpose(np.tile(probs, (cdf_grid.shape[0], 1)))
    def get_quantiles_for_one_observation(cdf_grid_one_obs):
        # pick the first grid point whose CDF exceeds each target probability
        return base_grid[np.argmax(np.greater(cdf_grid_one_obs, probs_row_grid), axis=1)]

    # TODO: this is the main performance bottleneck. uses only one CPU core
    quantiles_grid = np.apply_along_axis(
        func1d=get_quantiles_for_one_observation,
        axis=0,
        arr=cdf_grid,
    )
    return quantiles_grid

if __name__ == "__main__":
    from tensorflow_probability import distributions as tfd
    quantiles_to_compute = np.array([0.005, 0.1, 0.25, 0.5, 0.75, 0.9, 0.995])
    N_predictions = int(4*1e3)
    N_components = 5
    N_grid_points = int(1e4)
    pi_sim = np.random.uniform(size=(N_predictions, N_components)).astype(np.float32)
    pi_sim = pi_sim/np.transpose(np.tile(pi_sim.sum(axis=1), (N_components, 1)))
    mean_sim = np.random.uniform(low=5, high=10, size=(N_predictions, N_components)).astype(np.float32)
    sd_sim = np.random.uniform(low=0.1, high=1, size=(N_predictions, N_components)).astype(np.float32)
    mixture_distribution = tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(probs=pi_sim),
        components_distribution=tfd.Normal(
            loc=mean_sim,
            scale=sd_sim,
            validate_args=True,
            allow_nan_stats=False,
        )
    )

    quantiles_for_each_prediction = mixture_distribution_quantiles(
        mixture_distribution=mixture_distribution,
        probs=quantiles_to_compute,
        N_grid_points=N_grid_points,
    )
    print(quantiles_for_each_prediction.shape)

    # test
    one_obs = 100
    quantiles_one_obs = quantiles_for_each_prediction[:, one_obs]
    probs_validation = [round(mixture_distribution.cdf(quant_val).numpy()[one_obs], 3) for quant_val in quantiles_one_obs]
    print(f'''
        probs  input: {quantiles_to_compute}
        probs approx: {probs_validation}
    ''')

Result:

        probs  input: [0.005 0.1   0.25  0.5   0.75  0.9   0.995]
        probs approx: [0.005, 0.1, 0.251, 0.501, 0.751, 0.9, 0.995]

good enough!


Performance

Setup: 16 Xeon cores, lots of RAM, no GPU. This simulation eats roughly 10 GB (peak).

%%timeit
  ...: quantiles_for_each_prediction = mixture_distribution_quantiles(
  ...:     mixture_distribution=mixture_distribution,
  ...:     probs=quantiles_to_compute,
  ...:     N_grid_points=N_grid_points,
  ...: )
  ...: 
11.7 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Demetrio92 commented 4 years ago

@brianwa84 please re-open the issue!

axch commented 4 years ago

Reopening on request.

gitlabspy commented 4 years ago

> In the meantime I hacked together a rather efficient quantile approximation. USE WITH CAUTION! READ THE CODE
> [full code and benchmark quoted verbatim from the comment above]

I am using it to invert a logistic mixture:

mixture_distribution = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=pi_sim),
    components_distribution=tfd.Logistic(
        loc=mean_sim,
        scale=sd_sim,
    )
)

It seems that the quantiles in quantiles_for_each_prediction differ a lot across observations. The mean error (probs_validation - quantiles_to_compute) is bigger than 1e-5. Any suggestions to optimize?

Demetrio92 commented 4 years ago

> I am using it to invert a logistic mixture [...] The mean error (probs_validation - quantiles_to_compute) is bigger than 1e-5. Any suggestions to optimize?

Increase N_grid_points until sufficient precision is reached. If you run out of memory, you can just batch the grid.

If performance is not an issue, use pynverse as in my other example above.

Also: the logistic distribution is similar to the normal, so the example above works, but in general you want an appropriate grid for the linear interpolation. It should cover the full domain of your mixture while not being too broad, so that you have enough points where the CDF actually changes. pynverse solves this problem in a very general fashion; however, it's not vectorised, so you will quickly run into scalability issues.
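
For instance, a hypothetical grid builder (my suggestion, not from the code above) could span the batch-wide range of mixture means plus or minus a few standard deviations; widen k if the grid_check above fails:

import numpy as np

def base_grid_from_moments(dist, n_grid_points=int(1e4), k=6.0):
    """Grid over [min(mean - k*sd), max(mean + k*sd)] across the batch."""
    mean = dist.mean().numpy()    # shape: (batch,)
    sd = dist.stddev().numpy()    # shape: (batch,)
    return np.linspace((mean - k * sd).min(), (mean + k * sd).max(),
                       num=n_grid_points)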

beat-247 commented 3 years ago

Are there any plans to implement an approximation to the quantile function of MixtureSameFamily, perhaps using information from the components_distribution for efficiency?

Strateus commented 1 year ago

Any updates on this? I was surprised to find out that even the Student t distribution does not have quantile in TFP, while SciPy does.

ColCarroll commented 1 year ago

PRs are welcome, but we're not pursuing a quantile function right now for the Student T. We could use a solver along with the CDF function as was discussed, but that might have surprising runtime characteristics.
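
For reference, a sketch of that solver-based approach, assuming your TFP version ships the bracketing root finder tfp.math.find_root_chandrupatla (present in recent releases; treat the exact signature and result fields as an assumption and check the docs for your version):

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

def quantile_via_root_search(dist, probs, low, high):
    # low/high must bracket the quantiles; shapes should match probs.
    return tfp.math.find_root_chandrupatla(
        lambda x: dist.cdf(x) - probs, low=low, high=high).estimated_root

dist = tfd.StudentT(df=5., loc=0., scale=1.)
probs = tf.constant([0.5, 0.9, 0.99])
q = quantile_via_root_search(dist, probs,
                             low=tf.fill(probs.shape, -50.),
                             high=tf.fill(probs.shape, 50.))
print(q.numpy(), dist.cdf(q).numpy())  # cdf(q) ~= [0.5, 0.9, 0.99]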

rmmaf commented 10 months ago

Any plans to implement quantile?