pmelchior / pygmmis

Gaussian mixture model for incomplete (missing or truncated) and noisy data
MIT License

Arbitrary termination conceals unstable EM steps? #10

Closed: philastrophist closed this issue 4 years ago

philastrophist commented 5 years ago

I have found that, for some truncated models, the log-likelihood decreases immediately from the starting point, and the EM steps taken by GMM.fit can make things worse. If I remove the convergence conditions for these models and just run until maxiter is reached, the EM steps always result in a lower logL. I was under the impression that EM is supposed to guarantee that the likelihood never decreases. For the model I use here (where green is the ground truth):

[figure: fit under aggressive truncation (green: ground truth)]

So EM appears not to work for this model: the likelihood always decreases, even though I initialised it with the k-means estimate based on all of the data, not just the observed points.
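For reference, the guarantee being invoked here does hold for EM on complete data: each E/M update cannot decrease the observed-data likelihood. A minimal self-contained numpy sketch (independent of pygmmis, with made-up two-cluster data) that checks the monotone logL trace:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, weights, means, covs):
    """One EM iteration for a plain (untruncated) GMM."""
    # E-step: per-component weighted densities, shape (N, K)
    dens = np.stack([w * multivariate_normal(m, c).pdf(X)
                     for w, m, c in zip(weights, means, covs)], axis=1)
    logL = np.log(dens.sum(axis=1)).sum()       # logL at the incoming params
    r = dens / dens.sum(axis=1, keepdims=True)  # responsibilities
    # M-step: closed-form updates
    Nk = r.sum(axis=0)
    weights = Nk / len(X)
    means = (r.T @ X) / Nk[:, None]
    covs = np.stack([(r[:, k, None] * (X - means[k])).T @ (X - means[k]) / Nk[k]
                     for k in range(len(Nk))])
    return weights, means, covs, logL

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
w = np.full(2, 0.5)
mu = X[rng.choice(len(X), 2, replace=False)]
cov = np.stack([np.eye(2)] * 2)
history = []
for _ in range(50):
    w, mu, cov, logL = em_step(X, w, mu, cov)
    history.append(logL)
# complete-data EM never decreases the likelihood
assert all(b >= a - 1e-9 for a, b in zip(history, history[1:]))
```

With truncation the M-step works on imputed samples drawn outside the observed region, so this proof no longer applies directly, which is presumably where the behaviour above comes from.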

This doesn't happen for all models. If I use a less aggressive truncation, then the model stabilises as it should:

[figure: fit under less aggressive truncation]

My model is here and produces the plots above.
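For readers trying to reproduce this, a hedged sketch of how a truncation like the one above is typically passed to pygmmis through its sel_callback hook (as described in the README); the data file, component count, and cut value here are placeholders, not the reporter's actual model:

```python
import numpy as np
import pygmmis

def sel_callback(coords):
    # Selection function: True where a sample would survive the truncation.
    # The hard cut on the first coordinate is purely illustrative.
    return coords[:, 0] > 0.5

gmm = pygmmis.GMM(K=3, D=2)             # K components in D dimensions
data = np.load("observed_samples.npy")  # hypothetical truncated data set
logL, U = pygmmis.fit(gmm, data, sel_callback=sel_callback)
```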

================

I discovered this effect after implementing my own convergence detection technique to deal with problems such as the one I've seen in #11. The log-likelihood decreases but then climbs back above its previous value, so the original convergence checks terminate early.

[figure: log-likelihood trace over EM steps, dipping and then recovering above its previous value]
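One generic safeguard against terminating on such a transient dip is to compare against the best logL over a window of recent steps rather than only the immediately preceding one. A hypothetical patience-style check (illustrative, not the package's code):

```python
def converged(logL_history, tol=1e-4, patience=5):
    """Declare convergence only when the last `patience` steps have not
    improved on the earlier best by more than tol, so a dip followed by
    a recovery (as in the trace above) does not trigger termination."""
    if len(logL_history) <= patience:
        return False
    best_recent = max(logL_history[-patience:])
    best_before = max(logL_history[:-patience])
    return best_recent - best_before < tol
```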

My plan was to fix this with a gradient test in my feature/convergence branch (#12), which also includes tools to visualise what's happening and a backend to store the EMSteps. That method has fixed the early-termination problem, but it can't solve this one.
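For concreteness, a gradient test of the sort described might fit a straight line to the recent logL trace and stop once the trend flattens; this is only a sketch of the idea under that reading, not the code in that branch:

```python
import numpy as np

def slope_converged(logL_history, window=10, slope_tol=1e-5):
    """Least-squares slope of the last `window` logL values; convergence
    is declared when the trend, not a single step, has flattened out."""
    if len(logL_history) < window:
        return False
    y = np.asarray(logL_history[-window:])
    x = np.arange(window)
    slope = np.polyfit(x, y, 1)[0]
    return abs(slope) < slope_tol
```

A trend test like this tolerates the non-monotone trace in #11, but it cannot help when the likelihood decreases from the very first step, as in the aggressive-truncation model above.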

Comments and suggestions would be really helpful!

Thanks

pmelchior commented 4 years ago

duplicate of #11