philastrophist / pygmmis

Gaussian mixture model for incomplete (missing or truncated) and noisy data
MIT License

Early termination with truncated samples #1

Open philastrophist opened 5 years ago

philastrophist commented 5 years ago

When running some tests on simple models (2 dimensions and 3/4 components), I find that the untruncated fit (i.e. sel_callback=None) performs better on these models than the fit that is given the selection function (i.e. sel_callback=selection_function).
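For context, the selection function here just flags which samples survive the truncation. The sketch below is my assumption of what such a callback looks like for a 2-D half-plane cut; the `pygmmis.GMM`/`pygmmis.fit` calls in the comments follow the wording of this issue and may not match the library's exact signatures:

```python
import numpy as np

def selection_function(coords):
    """Hypothetical truncation: a sample is observed only if x > 0.
    Returns one boolean per sample (some selection callbacks instead
    return a detection probability in [0, 1])."""
    return coords[:, 0] > 0

# Sketch of the two runs being compared in this issue:
#   gmm = pygmmis.GMM(K=3, D=2)
#   logL, U = pygmmis.fit(gmm, data, sel_callback=selection_function)
# versus the untruncated baseline:
#   logL, U = pygmmis.fit(gmm, data, sel_callback=None)
```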

I've created a gist with my script here.

The likelihood seems to decrease immediately when moving away from the k-means estimate. However, if I disable convergence detection and let the runs continue indefinitely, some split-merge runs do converge. In fact, the log_L jumps around quite a bit before finally converging (sometimes).

The figures below show my problem. The ellipses are 1-sigma; observed data is in green, unobserved data in red. Red ellipses are the k-means estimate, black ellipses are the pygmmis estimate, and blue ellipses are the truth.

[Figure: with tolerance (pygmmis)]

[Figure: without tolerance (pygmmis-converged)]

[Figure: log-likelihood trace (log_like)]

We need to detect convergence better, otherwise it'll get stuck in a local optimum. Maybe a t-test for flat lines with tolerances?
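A convergence test along those lines could look like the following. This is a minimal stdlib sketch, not pygmmis API: `log_likelihood_is_flat`, the `t_max` threshold, and the window size are all hypothetical and would need tuning. It regresses a trailing window of log_L values against iteration number and only declares convergence when the slope is statistically indistinguishable from zero, which tolerates the noisy jumps in the trace better than a per-step tolerance on delta log_L:

```python
import math

def log_likelihood_is_flat(logL_window, t_max=2.0):
    """Hypothetical convergence check: fit a line through the trailing
    log-likelihood values and report "flat" when the slope's
    t-statistic is below t_max."""
    n = len(logL_window)
    if n < 3:
        return False  # too few points to estimate a slope and its error
    xs = range(n)
    mean_x = (n - 1) / 2.0
    mean_y = sum(logL_window) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, logL_window))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(xs, logL_window))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    if se == 0.0:
        return abs(slope) < 1e-12  # exactly linear: flat only if level
    return abs(slope / se) < t_max
```

In the EM loop one would keep, say, the last 10 values of log_L in a deque and stop iterating (or stop split-merge proposals) once the test passes.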

philastrophist commented 5 years ago

https://github.com/philastrophist/pygmmis/tree/feature/convergence