Closed by rmcgibbo 11 years ago
To maintain the same error rate, I had to increase the convergence criterion when I changed from my iterative MLE to Lutz's minimization MLE. Perhaps we were better off keeping my code...
On 09/03/2012 11:14 AM, Robert McGibbon wrote:
Anyone seeing this?
Log-Likelihood after 12145 function evaluations: -2623598.26493
Log-Likelihood after 12178 function evaluations: -2623598.26493
Log-Likelihood after 12211 function evaluations: -2623598.26493
Log-Likelihood after 12244 function evaluations: -2623598.26493
Log-Likelihood after 12277 function evaluations: -2623598.26493
Log-Likelihood after 12310 function evaluations: -2623598.26493
Log-Likelihood after 12343 function evaluations: -2623598.26493
Log-Likelihood after 12376 function evaluations: -2623598.26493
Log-Likelihood after 12409 function evaluations: -2623598.26493
Log-Likelihood after 12442 function evaluations: -2623598.26493
Log-Likelihood after 12475 function evaluations: -2623598.26493
I'm getting this type of printout a lot. Maybe the convergence criterion is set too high?
Maybe we can bring Lutz in on this?
I haven't looked at the code closely, but I can instrument it a bit more to see where this is coming from. The first step is just printing more digits of that log-likelihood number, to see what kind of fluctuations we're looking at.
For instance, if the optimizer is trapped on some kind of cyclic thing -- which I think can happen in these gradient optimizers -- we can add some damping.
Another possibility, though more drastic, is to try L-BFGS-B instead of the truncated Newton conjugate-gradient method.
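For reference, scipy exposes both of these optimizers through the same interface, so swapping them is cheap to try. A toy sketch (the quadratic here is just a hypothetical stand-in, not the actual MSM likelihood):

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in negative log-likelihood with a known optimum at x = 1.
def neg_log_likelihood(x):
    return 0.5 * np.sum((x - 1.0) ** 2)

def gradient(x):
    return x - 1.0

x0 = np.zeros(5)

# Truncated Newton (TNC), roughly what the current code uses per this thread.
res_tnc = minimize(neg_log_likelihood, x0, jac=gradient, method='TNC')

# The L-BFGS-B alternative suggested above.
res_lbfgs = minimize(neg_log_likelihood, x0, jac=gradient, method='L-BFGS-B')

print(res_tnc.x)
print(res_lbfgs.x)
```

Both should land on the same optimum for a well-behaved objective; the interesting comparison is on the stalling cases below.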
We might want to consider switching back to my code. It was half as many lines of code and didn't have these kinds of issues...
Start by doing it in a branch? Then I can test them side by side.
I've seen this in some lagtimes, but not all of them.
So I recently wrote some entropy-maximization code. I played around with several different implementations of the objective function and constraints (e.g. normalization) and found a formulation that gave fast and robust convergence.
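The original snippet isn't preserved in this thread, so here is only a minimal illustrative sketch of an entropy-maximization setup with an explicit normalization constraint. The energies, the moment constraint, and all names are hypothetical, not the original code:

```python
import numpy as np
from scipy.optimize import minimize

# Maximize entropy of a population vector p subject to normalization
# and one (illustrative) moment constraint <E> = e_target.
energies = np.array([0.0, 1.0, 2.0, 3.0])
e_target = 1.0
n = len(energies)

def neg_entropy(p):
    # -S(p); the small epsilon keeps the log finite at the boundary
    return np.sum(p * np.log(p + 1e-12))

constraints = [
    {'type': 'eq', 'fun': lambda p: p.sum() - 1.0},               # normalization
    {'type': 'eq', 'fun': lambda p: p.dot(energies) - e_target},  # moment constraint
]

p0 = np.ones(n) / n  # start from the uniform distribution
res = minimize(neg_entropy, p0, method='SLSQP',
               bounds=[(0.0, 1.0)] * n, constraints=constraints)
print(res.x)
```

Handling normalization as an explicit equality constraint (rather than folding it into the objective) is the kind of design choice the paragraph above is about.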
I think this might be the way to go for the MLE estimator, but I haven't actually implemented it.
I do not think that using L-BFGS-B will work with the constrained minimization problem with raw populations / Tij. The issue is that the scaling of the likelihood function always seems to cause issues with the line search. This was my experience with the maxent code, and I recall similar behaviour with the MLE stuff as well.
Note that the maxent code is not for MSM stuff, but for another project. It was written for the case of a population vector, not a normalized transition matrix. However, I think the lessons learned still apply.
If people want this one fixed, could they please upload an example of the slow convergence?
I might take another stab at fixing the MLE code.
What are you looking for? I think I can find you a transition matrix with fewer than 100 states -- will that work?
I just need any example of a count matrix that gets "stuck".
So I converted the MLE problem to minimize in "log space" -- that is, X_ij = (1/Z) exp(u_ij). This makes it an unbounded problem, which could have advantages.
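A sketch of that change of variables, written here with per-row normalization (one natural reading of the formula; the actual code may define Z differently). The parameters u are unbounded, and positivity and normalization of the transition matrix come for free:

```python
import numpy as np

def transition_matrix_from_logspace(u):
    """Map unbounded parameters u (n x n) to a row-stochastic matrix
    T_ij = exp(u_ij) / Z_i. Illustrative name, not the msmbuilder API."""
    # subtract the row max before exponentiating for numerical stability
    w = np.exp(u - u.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

u = np.random.randn(3, 3)   # any real values are a valid parametrization
T = transition_matrix_from_logspace(u)
print(T.sum(axis=1))        # each row sums to 1 by construction
```

Because the feasible set is all of R^(n x n) in the u variables, an unconstrained optimizer never has to fight the boundary, which is plausibly why the restarting machinery becomes unnecessary.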
I've got code that is object oriented, correct, and almost as fast as the current MLE. The question is whether it is "more robust".
One argument for using my new code is that I don't do any of this "restarting" nonsense. I think that working in logspace makes that procedure unnecessary, but I can't say for sure.
I also have a LaTeX document that describes the calculation of the log likelihood and its gradient. If we switch to this, I think we should include a "Notes" directory in the Docs tree and reference that PDF in our docstrings. In general, I think any calculation that is non-obvious should probably have some sort of LaTeX write-up, either in the published literature or in a PDF in MSMBuilder.
See https://gist.github.com/3991712 for example test code.
Any update on test cases that are failing with the current code?
I found a test case that seems to converge extremely slowly. Alanine dipeptide, 1,000 ns, frames every 500 fs, 80 microstates with hybrid clustering.
It seems that the large number of counts may have something to do with the convergence issues--I think this is consistent with what others have seen.
So my rewrite of the MLE code does not stall on the previous fail case. My new code is also much cleaner, IMHO.
I'm going to take that as evidence that I should prepare my new code for an eventual merge.
I think this issue was fixed by #119.
Reopen if someone sees slow convergence again.