Closed by rmcgibbo 11 years ago
To maintain the same error rate, I had to increase the convergence criterion when I changed from my iterative MLE to Lutz's minimization MLE. Perhaps we were better off keeping my code...
On 09/03/2012 11:14 AM, Robert McGibbon wrote:
Anyone seeing this?
Log-Likelihood after 12145 function evaluations: -2623598.26493
Log-Likelihood after 12178 function evaluations: -2623598.26493
Log-Likelihood after 12211 function evaluations: -2623598.26493
Log-Likelihood after 12244 function evaluations: -2623598.26493
Log-Likelihood after 12277 function evaluations: -2623598.26493
Log-Likelihood after 12310 function evaluations: -2623598.26493
Log-Likelihood after 12343 function evaluations: -2623598.26493
Log-Likelihood after 12376 function evaluations: -2623598.26493
Log-Likelihood after 12409 function evaluations: -2623598.26493
Log-Likelihood after 12442 function evaluations: -2623598.26493
Log-Likelihood after 12475 function evaluations: -2623598.26493
I'm getting this type of printout a lot. Maybe the convergence criterion is set too high?
Maybe we can bring Lutz in on this?
I haven't looked at the code closely, but I can instrument it a bit more to see where this is coming from. The first step is just printing more digits of that log-likelihood number, to see what kind of fluctuations we're looking at.
For instance, if the optimizer is trapped on some kind of cyclic thing -- which I think can happen in these gradient optimizers -- we can add some damping.
Another possibility, though more drastic, is to try L-BFGS-B instead of the truncated Newton conjugate-gradient method.
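For reference, scipy exposes both of these optimizers through the same interface, so swapping them is cheap to try. A toy sketch (the quadratic here is just a hypothetical stand-in, not the actual MSM likelihood):

```python
import numpy as np
from scipy.optimize import minimize

# Stand-in negative log-likelihood with a known optimum at x = 1.
def neg_log_likelihood(x):
    return 0.5 * np.sum((x - 1.0) ** 2)

def gradient(x):
    return x - 1.0

x0 = np.zeros(5)

# Truncated Newton (TNC), roughly what the current code uses per this thread.
res_tnc = minimize(neg_log_likelihood, x0, jac=gradient, method='TNC')

# The L-BFGS-B alternative suggested above.
res_lbfgs = minimize(neg_log_likelihood, x0, jac=gradient, method='L-BFGS-B')

print(res_tnc.x)
print(res_lbfgs.x)
```

Both should land on the same optimum for a well-behaved objective; the interesting comparison is on the stalling cases below.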
We might want to consider switching back to my code. It was half as many lines of code and didn't have these kinds of issues...
Start by doing it in a branch? Then I can test them side by side.
I've seen this in some lagtimes, but not all of them.
So I recently wrote some entropy-maximization code. I played around with several different implementations of the objective function and constraints (e.g. normalization) and found a formulation that gave fast and robust convergence.
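The original snippet isn't preserved in this thread, so here is only a minimal illustrative sketch of an entropy-maximization setup with an explicit normalization constraint. The energies, the moment constraint, and all names are hypothetical, not the original code:

```python
import numpy as np
from scipy.optimize import minimize

# Maximize entropy of a population vector p subject to normalization
# and one (illustrative) moment constraint <E> = e_target.
energies = np.array([0.0, 1.0, 2.0, 3.0])
e_target = 1.0
n = len(energies)

def neg_entropy(p):
    # -S(p); the small epsilon keeps the log finite at the boundary
    return np.sum(p * np.log(p + 1e-12))

constraints = [
    {'type': 'eq', 'fun': lambda p: p.sum() - 1.0},               # normalization
    {'type': 'eq', 'fun': lambda p: p.dot(energies) - e_target},  # moment constraint
]

p0 = np.ones(n) / n  # start from the uniform distribution
res = minimize(neg_entropy, p0, method='SLSQP',
               bounds=[(0.0, 1.0)] * n, constraints=constraints)
print(res.x)
```

Handling normalization as an explicit equality constraint (rather than folding it into the objective) is the kind of design choice the paragraph above is about.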
I think this might be the way to go for the MLE estimator, but I haven't actually implemented it.
I do not think that using L-BFGS-B will work with the constrained minimization problem with raw populations / Tij. The issue is that the scaling of the likelihood function always seems to cause issues with the line search. This was my experience with the maxent code, and I recall similar behaviour with the MLE stuff as well.
Note that the maxent code is not for MSM stuff, but for another project. It was written for the case of a population vector, not a normalized transition matrix. However, I think the lessons learned still apply.
If people want this one fixed, could they please upload an example of the slow convergence?
I might take another stab at fixing the MLE code.
What are you looking for? I think I can find you a transition matrix with fewer than 100 states -- will that work?
I just need any example of a count matrix that gets "stuck".
So I converted the MLE problem to minimize in "log space" -- that is, X_ij = (1/Z) exp(u_ij). This makes it an unbounded problem, which could have advantages.
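A sketch of that change of variables, written here with per-row normalization (one natural reading of the formula; the actual code may define Z differently). The parameters u are unbounded, and positivity and normalization of the transition matrix come for free:

```python
import numpy as np

def transition_matrix_from_logspace(u):
    """Map unbounded parameters u (n x n) to a row-stochastic matrix
    T_ij = exp(u_ij) / Z_i. Illustrative name, not the msmbuilder API."""
    # subtract the row max before exponentiating for numerical stability
    w = np.exp(u - u.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

u = np.random.randn(3, 3)   # any real values are a valid parametrization
T = transition_matrix_from_logspace(u)
print(T.sum(axis=1))        # each row sums to 1 by construction
```

Because the feasible set is all of R^(n x n) in the u variables, an unconstrained optimizer never has to fight the boundary, which is plausibly why the restarting machinery becomes unnecessary.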
I've got code that is object oriented, correct, and almost as fast as the current MLE. The question is whether it is "more robust".
One argument for using my new code is that I don't do any of this "restarting" nonsense. I think that working in logspace makes that procedure unnecessary, but I can't say for sure.
I also have a LaTeX document that describes the calculation of the log likelihood and its gradient. If we switch to this, I think we should include a "Notes" directory in the Docs tree and reference that PDF in our docstrings. In general, I think any calculation that is non-obvious should probably have some sort of LaTeX write-up, either in the published literature or in a PDF in MSMBuilder.
See https://gist.github.com/3991712 for example test code.
Any update on test cases that are failing with the current code?
I found a test case that seems to converge extremely slowly. Alanine dipeptide, 1,000 ns, frames every 500 fs, 80 microstates with hybrid clustering.
It seems that the large number of counts may have something to do with the convergence issues--I think this is consistent with what others have seen.
So my rewrite of the MLE code does not stall on the previous fail case. My new code is also much cleaner, IMHO.
I'm going to take that as evidence that I should prepare my new code for an eventual merge.
I think this issue was fixed by #119.
Reopen if someone sees slow convergence again.