revbayes / revbayes

Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language
http://revbayes.com
GNU General Public License v3.0
56 stars 25 forks source link

[BUG] Problem with mvCollapseExpandFossilBranch #423

Closed davidcerny closed 5 months ago

davidcerny commented 8 months ago

Describe the bug This is a problem I first mentioned on Slack in November 2023, but I wanted to create a GitHub issue for it as well to make sure it doesn't slip through the cracks. I also followed @bredelings's advice to run the example dataset below with multiple seeds to find the one that crashes first.

Recently, I tried to run a tip-dating analysis on a small morphological matrix under the FBD model and a white-noise relaxed clock, and noticed problematic behavior from the mvCollapseExpandFossilBranch() move. When the DEBUG_MCMC preprocessor macro is enabled, the analysis crashes with the following error after ~380 iterations (~96,000 moves):

   Error:       Issue in 'CollapseExpandFossilBranch' on 'fbd_tree' after move because posterior of -1889.28 and -inf/-inf. The move was rejected.

Without the macro, the analysis will seemingly run just fine, but the move will never be accepted, and no ancestors will ever be sampled:

            Name             | Param         |  Weight  |  Tried  | Accepted | Acc. Ratio | Parameters
======================================================================================================
CollapseExpandFossilBranch     fbd_tree         6.0000    2399416   0          0.0000

As far as I can tell, this problem is separate from issue #401, another recently described bug having to do with sampled ancestors under the FBD model.

To reproduce This zipped directory contains (1) a character matrix, (2) a taxon data file with tip age ranges, (3) a starting tree, and (4) a script to run the analysis. One absolute path has to be changed in the script.

Expected behavior The mvCollapseExpandFossilBranch() move should be accepted at least some of the time.

Computer info Red Hat Enterprise Linux 8.4 (Ootpa), RevBayes 1.2.2, development branch, commit rapture-2364-ga6a40c (two PRs behind the tip of the branch).

willgearty commented 7 months ago

I noticed this in an FBD analysis I recently ran, not a single move was ever accepted.

bjoelle commented 5 months ago

So after investigating, I think everything is behaving correctly, but the default removal probability in BDSTP is set to 1 so unless you set it explicitly r=0 it will not allow SAs if I set r to 0 in your script I do get a non-zero amount of SAs I'm guessing the default used to be 0 and then changed somehow ? funnily enough the tutorial (https://revbayes.github.io/tutorials/fbd/fbd_specimen.html) is flat out wrong because it tells you to set r=1 for an FBD analysis which is 1) not doing anything because it's the default 2) the exact opposite of what it should be

@davidcerny can you check that I have it right ? if so we need to decide whether to change the default back and either way fix the doc

willgearty commented 5 months ago

I can confirm that using r=1 for BDSTP results in no SAs for my own analyses, whereas r=0 results in a non-zero number of SAs. It looks like using FBDP always has SAs (no r option).

bjoelle commented 5 months ago

thank you for testing this ! thinking about this more I wonder if there is a way to detect that you have an incompatibility in your setup like that and give a warning to the user - the issue is that it's two different components but maybe the mvCollapseExpandFossilBranch could be in charge of checking that the associated tree does allow SAs

davidcerny commented 5 months ago

Sorry for the delay @bjoelle – I can confirm that setting r=0 inside dnBDSTP indeed fixes the problem for the dataset and RevBayes version I originally reported in this issue: a variable (and generally nonzero) number of ancestors is sampled throughout the analysis.

That's a really interesting observation about the differences between dnBDSTP and dnFBDP. In the latter, the removal probability is indeed hardcoded to 0 at the Rev level: https://github.com/revbayes/revbayes/blob/21ab167ad3046b7d049dac152089732e0fda08fc/src/revlanguage/distributions/phylogenetics/tree/Dist_FBDP.cpp#L137 This makes perfect sense given the original paper by Gavryushkina et al. (2014), which explicitly states that "the fossilized birth-death process [is ]the process with $\rho$-sampling and zero removal probability". It also makes sense to use dnBDSTP for epidemiology and dnFBDP for paleontology – the "T" stands for "treatment" and the "F" stands for "fossilized", after all. But in that case, we need to stop calling them "aliases" (as in the documentation) or "synonyms" (as in NEWS.md) – clearly they are not that if choosing one or the other makes a difference!

As for the idea of mvCollapseExpandFossilBranch checking whether sampled ancestors are actually enabled, I like it a lot; I'm just not sure how to implement it. There is no way for a move to access the values of the arguments passed to the distribution, right? Would this require a new flag in TimeTree?

bjoelle commented 5 months ago

so there are two levels of issue here the first is that we have actually 3 different versions of the thing, dnFBDP, dnBDSTP and dnPhylodynamicBDP that all appear to do exactly the same thing but with different removal probabilities, but they don't share code which is super inefficient (and I'm pretty sure a c/p from dnPhylodynamicBDP is actually what led to the change in behaviour in dnBDSTP which led to this whole mess) so ideally I would change the architecture so dnBDSTP is the master class and doesn't have a default, and then both others can derive from it but run with defaults r=0 and r=1 respectively second issue is that move checking idea, which I think would need to call the distribution not the tree, but I haven't looked into it plus as you noted fixing the documentation and tutorial

I am planning on working on this further next week hopefully

davidcerny commented 5 months ago

That all makes sense to me. Thank you so much for figuring this out Joëlle!