Closed acampove closed 3 years ago
Hi @acampove,
If you are using SciPy >= 1.7.0, you can use the new fast numerical inversion method NumericalInverseHermite
to draw samples from your custom distribution.
It approximates the PPF using cubic Hermite splines and provides a very fast sampling method. All you need to do is implement the CDF and optionally the PDF method.
import numpy as np
from scipy.stats import NumericalInverseHermite, rv_continuous
from scipy import stats
class MyDistribution(rv_continuous):
def _pdf(self, x, shape_parameters):
# implement the pdf here
return pdf
def _cdf(self, x, shape_parameters):
# implement the cdf here
return cdf
distribution = MyDistribution(name="distribution")
dist = distribution(shape_parameters) # freeze the distribution
# calculate the PPF
fni = NumericalInverseHermite(dist)
# sample from the distribution
rvs = fni.rvs(100000)
# we can also calculate the approximated PPF
ppf = fni.ppf(np.linspace(0.01, 0.99, num=100000))
# you can also sample quasi monte carlo samples from the given distribution
qrvs = fni.qrvs(100_000, qmc_engine=stats.qmc.Halton(2))
I'm looking forward to using NumericalInverseHermite
.
To elaborate on your original post:
To generate random numbers from an rv_continuous
you should at least be defining _pdf
and _cdf
, you didn't define _pdf
above.
In order to generate random variables from any rv_continuous distribution a random number is generated in the interval [0, 1)
. This is then given to rv_continuous.ppf
(inverse of _cdf
) to obtain the random variate. You don't need to override the _ppf
method yourself. However, if you don't override it then the inverse CDF has to be calculated in rv_continuous
using a numerical root finding algorithm, which can take a long time. This is just the way of the world, there's no way around it. Or you can use @tirthasheshpatel approach.
NumericalInverseHermite
currently needs a ppf
method and isf
method. Each is evaluated at exactly one point, just to find practical boundaries of the support. To get around this, you may be interested in:
https://github.com/scipy/scipy/issues/14568#issuecomment-906602863
The alternative seems to either invert in an approximate way, which brings numerical precision problems or do the integral numerically, which will be terribly inneficient and slow.
Yes, those are your main options if you don't define a _ppf
or rvs
method yourself. The approximate approach is what NumericslInverseHermite
does (but this can often be accurate almost to machine precision), and the slow, numerical inversion approach is what rv_continuous
does by default. If you just need random variates, you can also look at rvs_ratio_uniforms
.
@tirthasheshpatel has been working on a series of PRs that will add several more options here in 1.8.0.
Is your feature request related to a problem? Please describe.
When implementing a custom PDF in order to draw random samples like in:
I can calculate myself the integral and implement
_cdf
. However the generation of samples is extremely slow. Apparently I need_ppf
also implemented, which I cannot do easily given that the integral is not invertible:Describe the solution you'd like.
The integral should be all that is needed in the form of a CDF. Why do I have to invert the CDF myself? At the end the only thing you need is the normalization.
Describe alternatives you've considered.
The alternative seems to either invert in an approximate way, which brings numerical precision problems or do the integral numerically, which will be terribly inneficient and slow.
Additional context (e.g. screenshots, GIFs)
No response