Open mbaudin47 opened 4 years ago
The computePDF() method is not well-defined for distributions which are neither absolutely continuous nor discrete.
As we never planned to implement univariate continuous distributions which are not absolutely continuous (e.g. the distribution whose CDF is the Devil's staircase function), the only way for a distribution in OT to be neither discrete nor continuous is to be a mixture of both. But for an absolutely continuous distribution PDF stands for Probability Density Function, and for a discrete distribution it stands for Probability Distribution Function. The only way to see a Probability Distribution Function as a Probability Density Function is to see it as a mixture of Dirac pseudo-functions, i.e. zero everywhere except at the at most countably many points of the support of the distribution, where it takes an infinite value. This is hard to draw as a function.

A possible way could be to superpose the graph of the PDF of the continuous component (scaled by its cumulative weight, so it would not integrate to one) and to add a set of arrows in the spirit of the Dirac comb representation.

The getContinuousComponent() and getDiscreteComponent() methods could be added to Distribution, allowing both components to be separated from any distribution. By convention, getContinuousComponent() would return a Uniform() distribution with zero weight for purely discrete distributions, and getDiscreteComponent() would return a Dirac() distribution with zero weight for purely continuous distributions; alternatively, they could throw an exception.
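To make the proposed split concrete, here is a minimal pure-Python sketch; it does not use the OpenTURNS API. The class `MixedDistribution` and its methods `continuous_part()`, `discrete_part()` and `cdf()` are hypothetical names chosen for illustration, and the continuous component is assumed to be a standard normal. The point is that the CDF is well-defined (it simply jumps at each atom), while the "density" is better described as a pair: a scaled continuous density plus a list of weighted atoms.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


class MixedDistribution:
    """Hypothetical helper: continuous_weight * Normal(0, 1) + weighted atoms."""

    def __init__(self, continuous_weight, atoms):
        # atoms maps each location to its probability mass.
        total = continuous_weight + sum(atoms.values())
        if abs(total - 1.0) > 1e-12:
            raise ValueError("weights must sum to one")
        self.continuous_weight = continuous_weight
        self.atoms = dict(atoms)

    def continuous_part(self, x):
        """Scaled density: integrates to continuous_weight, not to one."""
        return self.continuous_weight * normal_pdf(x)

    def discrete_part(self):
        """Sorted (location, mass) pairs, to be drawn as arrows."""
        return sorted(self.atoms.items())

    def cdf(self, x):
        """CDF of the mixture: smooth part plus a jump at each atom."""
        jump = sum(mass for loc, mass in self.atoms.items() if loc <= x)
        return self.continuous_weight * normal_cdf(x) + jump


# Half standard normal, half Dirac mass at 2.
mixed = MixedDistribution(0.5, {2.0: 0.5})
```

A plotting routine could then draw `continuous_part` as a curve and each entry of `discrete_part()` as an arrow, without ever having to assign a finite value to the density at an atom.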
It is much harder to do that for multivariate distributions, as such a distribution can be singular without being discrete, e.g. the min copula. In this case, we could add a getSingularComponent(), but what should be returned in the case of, e.g., the Marshall-Olkin copula?
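As an illustration of "singular without being discrete", the following sketch evaluates the probability of rectangles under the min copula C(u, v) = min(u, v), using the standard inclusion-exclusion formula on the CDF. The helper name `rectangle_mass` is an assumption made for this example. A rectangle away from the diagonal receives no mass, while a square on the diagonal receives mass equal to its side length, although no single point carries any mass.

```python
def min_copula(u, v):
    """CDF of the min copula on the unit square."""
    return min(u, v)


def rectangle_mass(copula, u1, u2, v1, v2):
    """Probability of [u1, u2] x [v1, v2] from a bivariate CDF."""
    return copula(u2, v2) - copula(u1, v2) - copula(u2, v1) + copula(u1, v1)


# Rectangle that does not meet the diagonal u = v.
off_diagonal = rectangle_mass(min_copula, 0.1, 0.2, 0.5, 0.6)

# Square centered on the diagonal, with side 0.1.
on_diagonal = rectangle_mass(min_copula, 0.3, 0.4, 0.3, 0.4)
```

Here `off_diagonal` is zero and `on_diagonal` is 0.1 up to rounding: all the mass sits on a set of zero area, so there is neither a density nor a countable set of atoms to return.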
Indeed, the workaround I suggest is not serious, because the current height of the Dirac is set to 1, which is rather arbitrary. Graphically, an arrow would make more sense, with the length of the arrow equal to the density at the highest mode of the distribution (so that it remains visible).
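One possible reading of this arrow convention, sketched in pure Python: the tallest arrow is given the height of the peak of the scaled continuous density, and the other arrows are scaled in proportion to their masses. The proportional scaling and the function name `arrow_heights` are assumptions added for illustration; the peak is simply estimated on a grid.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def arrow_heights(scaled_density, atoms, grid):
    """Map each atom to a drawing height (hypothetical convention).

    The atom with the largest mass gets the peak value of the scaled
    continuous density on the grid; the others are proportional to it.
    """
    peak = max(scaled_density(x) for x in grid)
    largest_mass = max(atoms.values())
    return {loc: peak * mass / largest_mass for loc, mass in atoms.items()}


# Continuous part: 0.6 * Normal(0, 1); atoms at -1 and 2.
continuous_weight = 0.6
atoms = {-1.0: 0.1, 2.0: 0.3}
grid = [i / 100.0 - 5.0 for i in range(1001)]  # -5.00, -4.99, ..., 5.00

heights = arrow_heights(lambda x: continuous_weight * normal_pdf(x), atoms, grid)
```

With this choice the arrows stay visible whatever the scale of the continuous part, at the cost of the vertical axis no longer having a single meaning for both components.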
The context of this ticket is an attempt at understanding what happens to KDE when repeated values appear in the sample - see the following message on Stack:
However, this is not directly related to a study, so this might wait until we see more clearly the consequences and motivations for these changes in terms of use-cases.
We can draw the density of a Mixture which contains a Dirac:
but it produces:
Hence, the Dirac part of the PDF is not shown, which is a bug. We can use the following workaround:
which produces:
but this seems like a bug to me.