xavierdidelot / TransPhylo

Reconstruction of transmission trees using genomic data
http://xavierdidelot.github.io/TransPhylo/
GNU General Public License v2.0
60 stars 22 forks source link

underflow on large trees #4

Closed gtonkinhill closed 6 years ago

gtonkinhill commented 6 years ago

Hi Xavier,

I've been running into some underflow issues when running TransPhylo on largish trees. I'm attaching an example along with a go I had at fixing it.

It seems that the alpha functions in the c++ code cause some issues with very low probabilities. Adding a few lines to keep things in log space seemed to fix the issue.

I've also noticed that the program spends a very large amount of time optimising the MCMC start point for some trees. Is it appropriate to set the optiStart option to FALSE for these?

Thanks for making a very useful package!

Best

Gerry

test_data.zip

xavierdidelot commented 6 years ago

Hi Gerry,

Thanks very much for reporting and solving this problem! There was only a little problem with your code in the case where the r parameter of the NegBin offspring distribution is close to zero, in which case there is an exact solution (cf lines 73 and 106 of src/probTTree.cpp). I've corrected this and committed the changes.

Yes, you're right that the procedure to optimise the MCMC starting point can sometimes try a bit too hard. You can of course start from a non-optimised point, but you might then find that it takes a long time for the MCMC to converge. Alternatively you could reduce the number of attempts at optimising on line 53 of R/makeCTreeFromPTree.R from 100 to 10 or even 2 or 1.

Best wishes, Xavier