When running some tests on simple models (2 dimensions, 3 or 4 components), I find that the untruncated fit (i.e. `sel_callback=None`) performs better on these models than the fit where I give pygmmis the selection function (i.e. `sel_callback=selection_function`). I've created a gist with my script here.
The likelihood seems to decrease immediately when moving away from the k-means estimate.
However, if I disable convergence detection and let the fits run indefinitely, some split-and-merge runs do converge. In fact, `log_L` jumps around quite a bit before finally converging (sometimes).
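For reference, this is roughly how I am running the two fits. The keyword names (`sel_callback`, `init_method`) are my reading of the pygmmis README and may not match the exact signature; `selection_function`, `data`, and the file name are placeholders from my test setup, and `init_method='kmeans'` is what I believe gives the k-means starting point mentioned above.

```python
import numpy as np
import pygmmis

def selection_function(coords):
    # Toy selection cut: only samples with x > 0 are "observed".
    # As I understand the pygmmis API, sel_callback maps an (N, D)
    # array of positions to a boolean mask of retained samples.
    return coords[:, 0] > 0

D, K = 2, 3                       # 2 dimensions, 3 components
data = np.load("observed.npy")    # placeholder for the truncated sample

# Untruncated fit: ignore the selection entirely.
gmm_plain = pygmmis.GMM(K=K, D=D)
logL_plain, _ = pygmmis.fit(gmm_plain, data, init_method='kmeans')

# Truncated fit: hand pygmmis the selection function.
gmm_sel = pygmmis.GMM(K=K, D=D)
logL_sel, _ = pygmmis.fit(gmm_sel, data, init_method='kmeans',
                          sel_callback=selection_function)
```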
The figures below show my problem. The ellipses are 1-sigma; observed data is in green, unobserved data in red.
- Red ellipses are the k-means estimate
- Black ellipses are the pygmmis estimate
- Blue ellipses are the truth
[Figure: fit with tolerance]
[Figure: fit without tolerance]
[Figure: log-likelihood trace]
We need to detect convergence better, otherwise the fit gets stuck in a local optimum. Maybe a t-test for flat lines with tolerances?
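In case it is useful, here is a rough sketch of what I mean by that: fit a straight line to the last few `log_L` values and only accept convergence when the slope is both tiny and statistically indistinguishable from zero. The window size, tolerance, and function name are made up for illustration; this is not pygmmis code.

```python
import numpy as np
from scipy import stats

def is_converged(logL_trace, window=20, slope_tol=1e-4, alpha=0.05):
    """Declare convergence only if the recent log_L trace is 'flat'.

    Fits a straight line to the last `window` values and requires
    (a) the slope to be smaller than `slope_tol` in absolute value, and
    (b) the slope to be statistically indistinguishable from zero
        (two-sided t-test p-value from the regression > alpha).
    All thresholds are illustrative, not tuned.
    """
    if len(logL_trace) < window:
        return False
    y = np.asarray(logL_trace[-window:])
    x = np.arange(window)
    res = stats.linregress(x, y)
    flat_enough = abs(res.slope) < slope_tol
    not_significant = res.pvalue > alpha   # cannot reject slope == 0
    return flat_enough and not_significant

# usage: append log_L after each EM iteration and stop once flat, e.g.
#   logL_trace.append(log_L)
#   if is_converged(logL_trace):
#       break
```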