mbaudin47 opened 5 years ago
In order to improve the performance of the Kolmogorov test, I implemented several new build methods in:
https://github.com/mbaudin47/openturns/commits/FixChaos
The branch provides 3 build methods: `buildMethodOfLikelihoodMaximization`, `buildMethodOfMoments` and `buildMethodOfScaledLikelihoodMaximization`.
Below is a benchmark of their performance.
```python
import openturns as ot
import time

# Generate a sample
ot.RandomGenerator.SetSeed(0)
N = ot.Normal()
n = 100
sample = N.getSample(n)
factory = ot.TruncatedNormalFactory()
benchSize = 1000

print("buildMethodOfLikelihoodMaximization")
start = time.time()
for i in range(benchSize):
    distribution = factory.buildMethodOfLikelihoodMaximization(sample)
end = time.time()
elapsed1 = end - start
print("Elapsed time = %s (s)" % (elapsed1))

print("buildMethodOfMoments")
start = time.time()
for i in range(benchSize):
    distribution = factory.buildMethodOfMoments(sample)
end = time.time()
elapsed2 = end - start
print("Elapsed time = %s (s)" % (elapsed2))

print("buildMethodOfScaledLikelihoodMaximization")
start = time.time()
for i in range(benchSize):
    distribution = factory.buildMethodOfScaledLikelihoodMaximization(sample)
end = time.time()
elapsed3 = end - start
print("Elapsed time = %s (s)" % (elapsed3))

def CreateBarPlot(origin, elapsed, label, color):
    data = [[1.0, elapsed]]
    myGraph = ot.Graph('Performance of TruncatedNormalFactory',
                       'Method',
                       'Time (s)', True, 'topright')
    myBarPlot = ot.BarPlot(data, origin, color,
                           'solid', 'dashed', label)
    myGraph.add(myBarPlot)
    return myGraph

# Graph
import openturns.viewer as otv
graph = CreateBarPlot(1.0, elapsed1,
                      "LikelihoodMax.", "red3")
graph2 = CreateBarPlot(2.0, elapsed2,
                       "Moments", "cornflowerblue")
graph.add(graph2)
graph3 = CreateBarPlot(3.0, elapsed3,
                       "ScaledLMax.", "orange")
graph.add(graph3)
otv.View(graph)
```
This shows that the 2 new build methods are not faster.
The following script creates a TruncatedNormalFactory from a sample:
This prints
The bounds are correctly estimated, given that the sample min is -0.5701 and the sample max is 2.211. The mu and sigma parameters, however, are completely wrong: mu is very negative and sigma is very large. In this particular case, the PDF cannot even be drawn.
In this case the optimizer (TNC) uses a large number of function evaluations and only stops when the difference between two consecutive likelihood function values is close to zero.
The current implementation uses a clever scaling of the data and disables the estimation of the bounds of the TruncatedNormal. This is why I tested the basic MaximumLikelihoodFactory class:
This prints:
This cannot be right: the estimated TruncatedNormal upper bound is 0.9350 while the sample max is 2.211.
I assume that estimating the parameters of a TruncatedNormal is intrinsically difficult. Using the method of moments would be an option, but this is not so easy given that it requires solving nonlinear equations (there are references on this topic).
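To illustrate the nonlinear equations involved, here is a sketch of a moment-matching fit written with plain SciPy rather than OpenTURNS (the helpers below, `truncated_normal_moments` and `fit_moments`, are made up for this example); the bounds are held fixed and only mu and sigma are solved for:

```python
import numpy as np
from scipy.stats import norm, truncnorm
from scipy.optimize import fsolve

def truncated_normal_moments(mu, sigma, a, b):
    """Mean and variance of a Normal(mu, sigma) truncated to [a, b]."""
    alpha = (a - mu) / sigma
    beta = (b - mu) / sigma
    Z = norm.cdf(beta) - norm.cdf(alpha)
    phi_a, phi_b = norm.pdf(alpha), norm.pdf(beta)
    mean = mu + sigma * (phi_a - phi_b) / Z
    var = sigma ** 2 * (
        1.0 + (alpha * phi_a - beta * phi_b) / Z - ((phi_a - phi_b) / Z) ** 2
    )
    return mean, var

def fit_moments(sample, a, b):
    """Estimate (mu, sigma) with bounds fixed at (a, b) by matching the
    sample mean and variance: two nonlinear equations in two unknowns."""
    m, v = np.mean(sample), np.var(sample)

    def equations(params):
        mu, sigma = params
        tm, tv = truncated_normal_moments(mu, abs(sigma), a, b)
        return [tm - m, tv - v]

    mu, sigma = fsolve(equations, [m, np.sqrt(v)])
    return mu, abs(sigma)

# Demo on a synthetic truncated normal sample with known parameters
a, b = -1.0, 2.0
sample = truncnorm.rvs(
    (a - 0.3) / 0.8, (b - 0.3) / 0.8, loc=0.3, scale=0.8,
    size=5000, random_state=0,
)
mu_hat, sigma_hat = fit_moments(sample, a, b)
print("mu = %.4f, sigma = %.4f" % (mu_hat, sigma_hat))
```

Even with the bounds fixed, the system must be solved numerically; freeing the bounds as well would make the problem harder still.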
This failure explains why the TruncatedNormal distribution performs so badly in #1061: each failed TruncatedNormal fitting has a huge cost, which makes the Kolmogorov-Smirnov test slow.