matsengrp / gapml


Penalize diagonal of transition matrix to be closer to 1/2 #131

Closed jjfeng closed 6 years ago

jjfeng commented 6 years ago

The performance of our estimator in small samples without this penalization is quite poor: by BHV distance, random trees or the zero tree often beat our estimate. Even though we've already added some penalization for the target rates, it is not enough. The reason we were estimating extremely long leaf branches and extremely short internal branches is that the MLE was choosing between (1) long internal branches, where many targets are still active, and (2) long leaf branches, where fewer targets are still active. To maximize the likelihood, (2) is the preferred choice, which produces very long leaf branches and very short internal branches. In reality, at least in our simulation settings, the true internal branches are not near zero.

In addition, one can think of a simplified version of the problem. Suppose there is a single leaf node and a single target (but possibly many independent barcodes). Then our estimate of the branch length (if total time is unknown but the rate is known), or of the rate (if total time is known), is based only on the number of times we observe an event. This is essentially a binary-data problem, and the MLE can be calculated via simple math: it satisfies something like exp(-lambda * t) = (# barcodes with no observed event) / (# barcodes).

When there aren't many possible events, or we observe very few events, this runs into the rare-event/separation problem for binary data, a known issue in statistics: the MLE is biased away from the probability 1/2 (and can even diverge when all or none of the outcomes are events). Standard remedies pull the estimate in a little towards the middle, e.g. Firth regression, adding 1/2 to each entry of the contingency table, or ridge regression.

Firth regression is hard to compute (the objective function is the log likelihood plus half the log-determinant of the Fisher information matrix), and it's not clear that ridge regression on our current parameterization, which shrinks the parameters towards zero, would pull the probabilities towards 1/2. So we try out a simpler approach: we penalize the diagonal of the transition probability matrices along each branch (exp(Mt)) to be close to 1/2. We need to tune a penalty parameter to decide how much to pull the diagonal towards 1/2.
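To make the simplified single-target problem concrete, here is a minimal sketch (not gapml code; the function names are illustrative). It computes the binary-data MLE described above and shows how it diverges under separation, along with the Haldane-style "add 1/2" correction:

```python
import numpy as np

def mle_rate(n_unedited, n_total, t):
    """MLE of the cut rate lambda from edited/unedited barcode counts.

    Solves exp(-lambda * t) = n_unedited / n_total for lambda. When every
    barcode is edited (n_unedited == 0) the MLE diverges to infinity --
    the rare-event/separation problem described above.
    """
    frac = n_unedited / n_total
    if frac == 0.0:
        return np.inf
    return -np.log(frac) / t

def corrected_mle_rate(n_unedited, n_total, t):
    """Haldane-style correction: add 1/2 to each count, keeping the
    estimate finite and pulling it towards the middle."""
    frac = (n_unedited + 0.5) / (n_total + 1.0)
    return -np.log(frac) / t

print(mle_rate(3, 10, t=1.0))            # ~1.20, finite
print(mle_rate(0, 10, t=1.0))            # inf: separation
print(corrected_mle_rate(0, 10, t=1.0))  # ~3.09, finite
```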
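And here is one way to write down the penalized objective. This is an illustrative sketch, not the actual gapml implementation: the function names, `pen_param`, and the squared-error form of the penalty are assumptions; only the idea of pulling the diagonal of exp(Mt) towards 1/2 comes from the discussion above.

```python
import numpy as np
from scipy.linalg import expm

def diagonal_penalty(rate_matrix, branch_lengths, pen_param):
    """pen_param * sum over branches b of ||diag(exp(M * t_b)) - 1/2||^2.

    rate_matrix: instantaneous transition rate matrix M (square ndarray).
    branch_lengths: iterable of branch lengths t_b.
    pen_param: tuning parameter controlling the pull towards 1/2.
    """
    total = 0.0
    for t_b in branch_lengths:
        trans_probs = expm(rate_matrix * t_b)  # transition matrix along branch
        total += np.sum((np.diag(trans_probs) - 0.5) ** 2)
    return pen_param * total

def penalized_objective(neg_log_lik, rate_matrix, branch_lengths, pen_param):
    # Minimize the negative log likelihood plus the diagonal penalty.
    return neg_log_lik + diagonal_penalty(rate_matrix, branch_lengths, pen_param)
```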

This has drastically improved the performance of our estimated trees. In BHV distance, our estimates are now much closer to the truth than the naive baselines (random branch lengths, zero branch lengths, etc.), and the leaf branches are no longer absurdly long.