popgenmethods / momi2

Infer demographic history with the Moran model

Wrong KL value returned by optimize #45

Open yss1 opened 3 years ago

yss1 commented 3 years ago

Hi Jack. I noticed that sometimes the KL value returned by `optimize` is different from the KL reported for the final iteration of `find_mle`. Perhaps replacing `res["kl_divergence"] = res.fun` with `res["kl_divergence"] = self.kl_div()` in `optimize` would fix this problem?
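For reference, the change being proposed (the two lines below are the before and after; surrounding context omitted):

```python
# Current behavior in optimize(): store the objective value that scipy reports
res["kl_divergence"] = res.fun

# Proposed replacement: recompute the KL divergence at the returned parameters
res["kl_divergence"] = self.kl_div()
```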

jackkamm commented 3 years ago

In principle those two should be the same, so it would be good to understand why this happens.

Do you have any more information about when or how often this occurs, and how large the discrepancy is? The more specific and reproducible the better, but general information about the settings in which you observe this issue would still be useful. When you do run into the problem again, please save the logged output as well as the value of `kl_div()`.
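For anyone hitting this, a minimal sketch of how to capture both pieces of information, assuming a momi `DemographicModel` named `model` that has already been set up (the model name and log filename are illustrative, not from this thread):

```python
import logging

# Send momi's per-iteration output (including the KL values that find_mle logs)
# to a file so it can be attached to this issue
logging.basicConfig(level=logging.INFO, filename="momi_optimize.log")

res = model.optimize()  # "model" assumed: a momi DemographicModel built elsewhere

# Save both values so the size of the discrepancy can be reported precisely
print("KL returned by optimize:", res["kl_divergence"])
print("kl_div() at the result: ", model.kl_div())
```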

Glancing through the code, my best guess about where the error may occur is here: https://github.com/popgenmethods/momi2/blob/7d056c6953500fdd22700ae1ba742202be731f20/momi/likelihood.py#L251

In particular, internally `kl_div()` is wrapped by the autograd function `value_and_grad` during optimization, so numerical errors in the autograd computation could possibly propagate. This is just a wild guess, though.
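To illustrate the wrapping being described, a toy example of autograd's `value_and_grad` (this is standalone, not momi code):

```python
import autograd.numpy as np  # autograd's numpy wrapper, needed for tracing
from autograd import value_and_grad

# Toy objective standing in for kl_div()
def objective(x):
    return np.sum(x ** 2)

vg = value_and_grad(objective)

x = np.array([1.0, 2.0])
val, grad = vg(x)

# In a well-behaved case the two values agree; the question in this issue is
# whether they can drift apart inside momi's optimization loop
print(val, objective(x))   # 5.0 5.0
print(grad)                # [2. 4.]
```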

yss1 commented 3 years ago

I have seen this happen just a few times when analyzing rather complex models.

KL for the final iteration of `find_mle` reported in the log file: 0.031423908091051084
KL returned by `optimize`: 0.037972434327673873

jackkamm commented 3 years ago

Another possible reason -- maybe this callback is not called after the last step of the scipy optimizer:

https://github.com/popgenmethods/momi2/blob/7d056c6953500fdd22700ae1ba742202be731f20/momi/likelihood.py#L233

One way to test whether this is the cause is to check if the final `res.x` is the same as the final x value that was output in the log.
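A quick sketch of that check, with placeholder numbers standing in for values copied by hand from the log:

```python
import numpy as np

# Placeholder: paste the parameter vector from the last find_mle log line here
logged_x = np.array([0.1, 0.2, 0.3])

# res is the object returned by optimize(); compare against the logged iterate
print(np.allclose(res.x, logged_x))
```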

If this is the case, then I think your solution would indeed be the correct thing to do.

jackkamm commented 3 years ago

> KL for the final iteration of `find_mle` reported in the log file: 0.031423908091051084
> KL returned by `optimize`: 0.037972434327673873

If you're easily able to reproduce this, please also check what `kl_div()` gives when called at the final result, to see which of the two values it matches.

I'm also wondering if, for whatever reason, the optimizer returns its final value rather than the best value it previously encountered, i.e. it "overshoots" the optimum and then reports the last iterate instead of the best one.

If this is the cause, then we should change `res.x` as well as `res.fun` to be the best values encountered along the trajectory of results.
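A sketch of that fix, using a bare `scipy.optimize.minimize` call to stand in for momi's internals (the objective and settings are made up for illustration):

```python
import numpy as np
import scipy.optimize

def objective(x):
    # Toy objective; stands in for the wrapped kl_div computation
    return np.sum((x - 1.0) ** 2)

# Track the best point seen at any evaluation, since the optimizer's final
# iterate is not guaranteed to be the best point it visited
best = {"fun": np.inf, "x": None}

def tracked_objective(x):
    fun = objective(x)
    if fun < best["fun"]:
        best["fun"], best["x"] = fun, np.array(x)
    return fun

res = scipy.optimize.minimize(tracked_objective, x0=np.zeros(2), method="L-BFGS-B")

# Overwrite the returned result with the best value actually encountered,
# as suggested above for res.x and res.fun
res.x, res.fun = best["x"], best["fun"]
print(res.fun, res.x)
```

Tracking the best point inside the objective, rather than in the per-iteration callback, also captures line-search evaluations that the callback never sees.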

yss1 commented 3 years ago

The parameter values in the log file corresponding to the final iteration were exactly the same as the parameter values for the model returned by the `optimize` function. Only the KL divergence values differed.