The TruncatedNormalFactory sometimes fail

openturns / openturns

Uncertainty treatment library

Other

237 stars 93 forks source link

The following script creates a TruncatedNormalFactory from a sample:

import openturns as ot
ot.RandomGenerator.SetSeed(0)
a = -1
b=2.5
mu = 2.
sigma = 3.
distribution = ot.TruncatedNormal(mu,sigma,a,b)
distFactory = ot.TruncatedNormalFactory()    
sample = distribution.getSample(11)
newDistribution = distFactory.build(sample)
print("Built-in Fitted=",newDistribution)

This prints

Built-in Fitted= TruncatedNormal(mu = -16060.1, sigma = 561.146, a = -0.696535, b = 2.33702)

The bounds are correctly estimated, given that the sample min is -0.5701 and the sample max is 2.211. The mu and sigma parameters are completely wrong, with a very negative mu and a large sigma. In this particular case, the PDF cannot even be drawn:

>>> newDistribution.drawPDF()
Traceback (most recent call last):
  File "<ipython-input-67-bf36ab6df2ab>", line 1, in <module>
    newDistribution.drawPDF()
  File "/my/path/openturns/build/install/lib/python3.7/site-packages/openturns/model_copula.py", line 6265, in drawPDF
    return _model_copula.Distribution_drawPDF(self, *args)
TypeError: InvalidArgumentException : Error: cannot draw a PDF with xMax <= xMin, here xmin=-11500.3 and xmax=-11500.3

In this case the optimizer (TNC) uses a large number of function evaluations and only stops when the difference between two consecutive likelihood function values is close to zero.

The current implementation uses a clever scaling of the data and disables the estimation of the bounds of the TruncatedNormal. This is why I tested the basic MaximumLikelihoodFactory class:

MLfactory = ot.MaximumLikelihoodFactory(ot.TruncatedNormal())
newDistribution = MLfactory.build(sample)
print("ML Fitted      =",newDistribution)

this prints:

ML Fitted      = TruncatedNormal(mu = 0.00949447, sigma = 1.00589, a = -0.934979, b = 0.934979)

This cannot be good: the TruncatedNormal max is 0.9350 while the sample max is 2.211.

I assume that estimating the parameters of a TruncatedNormal is instrinsically difficult. Using the method of moments would be an option, but this is not so easy given that this requires to solve nonlinear equations (there are references on this topic).

This failure explains why the TruncatedNormal distribution performs so badly in #1061: each failed TruncatedNormal distribution fitting has a huge cost, which turns out to make the Kolmogorov-Smirnov test slow.

import openturns as ot import time # Generate a sample ot.RandomGenerator.SetSeed(0) N = ot.Normal() n = 100 sample = N.getSample(n) factory = ot.TruncatedNormalFactory() benchSize = 1000 print("buildMethodOfLikelihoodMaximization") start = time.time() for i in range(benchSize): distribution = factory.buildMethodOfLikelihoodMaximization(sample) end = time.time() elapsed1 = end - start print("Elapsed time = %s (s)" % (elapsed1)) print("buildMethodOfMoments") start = time.time() for i in range(benchSize): distribution = factory.buildMethodOfMoments(sample) end = time.time() elapsed2 = end - start print("Elapsed time = %s (s)" % (elapsed2)) print("buildMethodOfScaledLikelihoodMaximization") start = time.time() for i in range(benchSize): distribution = factory.buildMethodOfScaledLikelihoodMaximization(sample) end = time.time() elapsed3 = end - start print("Elapsed time = %s (s)" % (elapsed3)) def CreateBarPlot(origin, elapsed, label, color): data = [[1.0, elapsed]] myGraph = ot.Graph('Performance of TruncatedNormalFactory', 'Method', 'Time (s)', True, 'topright') myBarPlot = ot.BarPlot(data, origin, color, 'solid', 'dashed', label) myGraph.add(myBarPlot) return myGraph # Graph import openturns.viewer as otv graph = CreateBarPlot(1.0, elapsed1, "LikelihoodMax.", "red3") graph2 = CreateBarPlot(2.0, elapsed2, "Moments", "cornflowerblue") graph.add(graph2) graph3 = CreateBarPlot(3.0, elapsed3, "ScaledLMax.", "orange") graph.add(graph3) otv.View(graph)

openturns / openturns

The TruncatedNormalFactory sometimes fail #1275