openturns / openturns

Uncertainty treatment library
http://openturns.github.io/openturns/latest/index.html

A Mixture with a Dirac does not draw its PDF correctly #1489

Open mbaudin47 opened 4 years ago

mbaudin47 commented 4 years ago

We can draw the density of a Mixture which contains a Dirac:

import openturns as ot
distribution = ot.Mixture([ot.Dirac(-3.0), ot.Normal()], [0.5, 0.5])
distribution.drawPDF()

but it produces:

[image: PDF plot of the mixture, in which the Dirac component at -3 is not drawn]

Hence, the Dirac part of the PDF is not shown, which is a bug.

We can use the following workaround:

graph = distribution.drawPDF()
graph.add(ot.Dirac(-3.0).drawPDF())

which produces:

[image: PDF plot of the mixture with the Dirac component at -3 added]

but this seems like a bug to me.

regislebrun commented 4 years ago

The computePDF() method is not well-defined for distributions which are neither absolutely continuous nor discrete.

As we never planned to implement univariate continuous distributions which are not absolutely continuous (e.g. the distribution whose CDF is the Devil's staircase function), the only way for a distribution in OT to be neither discrete nor continuous is to be a mixture of both. But for an absolutely continuous distribution, PDF stands for Probability Density Function, and for a discrete distribution it stands for Probability Distribution Function. The only way to see a Probability Distribution Function as a Probability Density Function is to see it as a mixture of Dirac pseudo-functions, i.e. zero everywhere except at the at most countably many points in the support of the distribution, where it takes an infinite value. This is hard to draw as a function.

A possible way could be to superpose the graph of the PDF of the continuous component (scaled by its cumulative weight, so that it would not integrate to one) with a set of arrows, in the spirit of the Dirac comb representation.

The getContinuousComponent() and getDiscreteComponent() methods could be added to Distribution, allowing both components to be separated from any distribution. As a convention, getContinuousComponent() would return a Uniform() distribution with zero weight for purely discrete distributions, and getDiscreteComponent() would return a Dirac() distribution with zero weight for purely continuous distributions; alternatively, they could throw an exception.
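To make the proposed decomposition concrete, here is a minimal sketch in plain Python (standard library only, no OpenTURNS calls) for the mixture from the first comment, 0.5 * Dirac(-3.0) + 0.5 * Normal(0, 1). All names below (ATOMS, CONTINUOUS_WEIGHT, scaled_continuous_pdf, mixture_cdf, integrate) are hypothetical and only illustrate the idea; they are not part of the OpenTURNS API.

```python
import math

# Hypothetical description of the mixture from the issue:
# 0.5 * Dirac(-3.0) + 0.5 * Normal(0, 1).
ATOMS = [(-3.0, 0.5)]      # (location, probability mass) of the discrete part
CONTINUOUS_WEIGHT = 0.5    # total weight of the absolutely continuous part


def standard_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def scaled_continuous_pdf(x):
    """Continuous part scaled by its weight: integrates to 0.5, not to 1."""
    return CONTINUOUS_WEIGHT * standard_normal_pdf(x)


def mixture_cdf(x):
    """CDF of the mixture: jumps at the atoms plus the scaled normal CDF."""
    jump = sum(mass for location, mass in ATOMS if location <= x)
    normal_cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return jump + CONTINUOUS_WEIGHT * normal_cdf


def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h


# The curve to draw carries only the continuous mass;
# the remaining mass sits in the atoms, which would be drawn as arrows.
continuous_mass = integrate(scaled_continuous_pdf, -10.0, 10.0)
discrete_mass = sum(mass for _, mass in ATOMS)
```

The CDF is well-defined everywhere and simply jumps by 0.5 at x = -3, whereas the drawable curve only accounts for half of the total probability. This is why the atoms need a separate graphical convention.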

It is much harder to do that for multivariate distributions, as such a distribution could be singular without being discrete, e.g. the min copula. In this case, we could add a getSingularComponent(), but what should be returned in the case of, e.g., the Marshall-Olkin copula?

mbaudin47 commented 4 years ago

Indeed, the workaround I suggest is not serious, because the height of the Dirac is currently set to 1, which is rather arbitrary. Graphically, an arrow would indeed make more sense, with the arrow length equal to the density at the highest mode of the distribution (so that it remains visible).
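A minimal sketch of that arrow-height convention, in plain Python and under the assumption that "the highest mode" refers to the maximum of the weighted continuous density over the plotting grid. The names scaled_normal_pdf, grid, arrow_height and arrows are hypothetical and do not correspond to any OpenTURNS API.

```python
import math


def scaled_normal_pdf(x, weight=0.5):
    """Standard normal density scaled by the weight of the continuous part."""
    return weight * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Plotting grid on [-6, 6] with 401 points (it contains x = 0 exactly).
grid = [-6.0 + 12.0 * i / 400 for i in range(401)]

# Arrow length: the largest value taken by the drawn continuous curve,
# so that the arrow is neither invisible nor dominating the plot.
arrow_height = max(scaled_normal_pdf(x) for x in grid)

# One arrow per atom: (location, base, tip). Here a single atom at -3.
arrows = [(-3.0, 0.0, arrow_height)]
```

With this choice the arrow length is purely a display convention: it carries no information about the mass of the atom, which would have to be shown separately (for example in a legend).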

The context of this ticket is an attempt at understanding what happens to KDE when repeated values appear in the sample; see the following message on Stack Overflow:

https://stackoverflow.com/questions/61797760/seaborn-kdeplot-not-enough-variation-in-data/61853081#61853081
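As a small illustration of how repeated values can break a kernel density estimate, the sketch below applies one common rule of thumb (Silverman's, h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)) to a hypothetical sample dominated by a single repeated value. It uses only the Python standard library and is not a claim about which bandwidth rule seaborn or OpenTURNS actually uses.

```python
import statistics

# Hypothetical sample: 18 copies of the same value plus two other points.
sample = [1.0] * 18 + [2.0, 3.0]
n = len(sample)

# Sample standard deviation: positive, since not all values are equal.
sigma = statistics.stdev(sample)

# Quartiles: both fall on the repeated value, so the IQR is zero.
q1, _, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1

# Silverman's rule of thumb for the bandwidth of a Gaussian kernel.
bandwidth = 0.9 * min(sigma, iqr / 1.34) * n ** (-0.2)
```

Here the bandwidth collapses to zero even though the sample is not constant, so the kernel estimate degenerates towards point masses at the observations, which is the same kind of object as the Dirac component discussed in this ticket.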

However, this is not directly related to a study, so this might wait until we see more clearly the consequences and motivations for these changes in terms of use-cases.