Open richelbilderbeek opened 6 years ago
P.S. I volunteer to add it, test it and port the documentation from LaTeX to roxygen2.
I've tested the reasoning below and found it to hold, and checked it with @RaphSch. Rmd and PDFs are here: pbd_sampling.zip. Note that nowadays I suggest the names `MRS` (Most Recent Sister) and `MDS` (Most Distant Sister).
This is an incipient phylogeny in which everything works as expected:
P1
         +---+---+---+ 1-4
 +---+---+
 |       +---+---+===+ 3-3
-+
 |
 |   +---+---+===+===+ 2-2
 +===+
     +===+===+===+===+ 1-1
Using `youngest`, that is, picking taxon `1-4` to represent species 1, results in the shorter branch length distribution:
P1: YOUNGEST (should have shorter branches)
         +===+===+===+ 1-4 (YOUNGEST)
 +===+===+
 |       +===+===+===+ 3-3
-+
 |
 +===+===+===+===+===+ 2-2
P1: OLDEST
 +===+===+===+===+===+ 3-3
-+
 |
 |   +===+===+===+===+ 2-2
 +===+
     +===+===+===+===+ 1-1 (OLDEST)
Now we reverse the times at which species `2-2` and `3-3` started speciation, that is, when they started being incipient species (note that they finish speciation in the same order):
P2
     +---+---+---+---+ 1-4
 +---+
 |   +---+---+---+===+ 3-3
-+
 |
 |       +---+===+===+ 2-2
 +===+===+
         +===+===+===+ 1-1
Now we see that `oldest` has the shorter branches:
P2: YOUNGEST (should have shorter branches)
     +===+===+===+===+ 1-4 (YOUNGEST)
 +===+
 |   +===+===+===+===+ 3-3
-+
 |
 +===+===+===+===+===+ 2-2
P2: OLDEST
 +===+===+===+===+===+ 3-3
-+
 |
 |       +===+===+===+ 2-2
 +===+===+
         +===+===+===+ 1-1 (OLDEST)
This is caused, more or less, by the fact that `PBD::sampletree` orders taxa by their speciation-initiation time (3rd column in the L table).
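To make the ordering effect concrete, here is a minimal Python sketch of choosing one representative per species by initiation time alone. It is not PBD's code: the function `pick_representatives`, the row layout and the time values are all hypothetical, chosen only to mirror the taxon labels used in the diagrams.

```python
# Hypothetical rows in the spirit of the L table: taxon label, species
# label, and speciation-initiation time (time since the crown, so a
# larger value means the lineage started later). Values are made up.
taxa = [
    {"taxon": "1-1", "species": 1, "initiated": 0.0},
    {"taxon": "2-2", "species": 2, "initiated": 1.0},
    {"taxon": "3-3", "species": 3, "initiated": 2.0},
    {"taxon": "1-4", "species": 1, "initiated": 2.0},
]


def pick_representatives(taxa, method):
    """Pick one taxon per species, looking only at initiation time."""
    choose = {"oldest": min, "youngest": max}[method]
    by_species = {}
    for row in taxa:
        by_species.setdefault(row["species"], []).append(row)
    return {
        species: choose(rows, key=lambda r: r["initiated"])["taxon"]
        for species, rows in by_species.items()
    }


oldest = pick_representatives(taxa, "oldest")
youngest = pick_representatives(taxa, "youngest")
print(oldest[1])    # 1-1
print(youngest[1])  # 1-4
```

The choice never looks at where a taxon sits relative to the other species, which is why the same method can give shorter branches in one phylogeny and longer ones in another.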
To get a consistently shorter branch length distribution, I suggest adding `mrca` (Most Recent Common Ancestor) and `mdca` (Most Distant Common Ancestor) as sampling methods to the PBD package:
P1
         +---+---+---+ 1-4
 +---+---+
 |       +---+---+===+ 3-3
-+
 |
 |   +---+---+===+===+ 2-2
 +===+
     +===+===+===+===+ 1-1
P1: MRCA (shorter branch length distribution)
         +===+===+===+ 1-4
 +===+===+
 |       +===+===+===+ 3-3
-+
 +===+===+===+===+===+ 2-2
P1: MDCA
 +===+===+===+===+===+ 3-3
-+
 |   +===+===+===+===+ 2-2
 +===+
     +===+===+===+===+ 1-1
And for the other phylogeny:
P2
     +---+---+---+---+ 1-4
 +---+
 |   +---+---+---+===+ 3-3
-+
 |
 |       +---+===+===+ 2-2
 +===+===+
         +===+===+===+ 1-1
P2: MRCA (shorter branch length distribution)
 +===+===+===+===+===+ 3-3
-+
 |       +===+===+===+ 2-2
 +===+===+
         +===+===+===+ 1-1
P2: MDCA
     +===+===+===+===+ 1-4
 +===+
 |   +===+===+===+===+ 3-3
-+
 +===+===+===+===+===+ 2-2
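One possible reading of the proposed rule, sketched in Python rather than R: for each candidate taxon of a species, find the age of the node that joins it to the nearest taxon of another species, then keep the candidate with the most recent (`mrca`) or most distant (`mdca`) such node. The tree encoding and the helper names are hypothetical. The node ages are read off the P1 and P2 diagrams, counting one time unit per segment.

```python
# Internal node: (age, left, right); tip: "species-taxon" label.
# Ages are read from the ASCII diagrams, one unit per segment.
P1 = (5, (3, "1-4", "3-3"), (4, "2-2", "1-1"))
P2 = (5, (4, "1-4", "3-3"), (3, "2-2", "1-1"))


def tips(node):
    """All tip labels below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return tips(left) + tips(right)


def species_of(tip):
    return tip.split("-")[0]


def nearest_other_species_age(tree, tip):
    """Age of the youngest node joining `tip` to a tip of another species."""
    best = None
    node = tree
    # Walk from the root towards `tip`; ages decrease along the way,
    # so the last node that still contains another species wins.
    while not isinstance(node, str):
        age, left, right = node
        if any(species_of(t) != species_of(tip) for t in tips(node)):
            best = age
        node = left if tip in tips(left) else right
    return best


def pick(tree, species, method):
    """'mrca' keeps the most recent split, 'mdca' the most distant one."""
    candidates = [t for t in tips(tree) if species_of(t) == species]
    choose = min if method == "mrca" else max
    return choose(candidates, key=lambda t: nearest_other_species_age(tree, t))


for name, tree in (("P1", P1), ("P2", P2)):
    print(name, pick(tree, "1", "mrca"), pick(tree, "1", "mdca"))
# P1 1-4 1-1
# P2 1-1 1-4
```

Under this reading, the taxon kept for species 1 matches the MRCA and MDCA diagrams above for both phylogenies.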
Currently, `sampletree` supports the sampling methods `random`, `oldest` and `youngest`. I suggest adding `shortest` (sampling the shortest branch lengths) and `longest` (sampling the longest branch lengths), as this would give consistently shorter and longer branch lengths, respectively.

Problem
Imagine being interested in the effect of sampling on branch length distributions. The two simplest, equally likely scenarios in which sampling has an effect are `P1 = (A1, (B, A2));` and `P2 = ((A1, B), A2);`. For P1, sampling `youngest` results in shorter branch lengths, whereas for P2 it results in longer branch lengths. See below for a detailed example.
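The asymmetry can be checked with a few lines of Python. This is a sketch under stated assumptions: P2 is read as `((A1, B), A2);`, the node ages (root at 2, inner node at 1) are made up, and `A2` is taken to be the subspecies that `youngest` would pick. After sampling, each tree has two tips (the representative of A, and B), so each tip branch is as long as the divergence time between them.

```python
# Divergence time between B and each subspecies of A.
# Hypothetical node ages: root = 2.0, inner node = 1.0.
divergence_from_B = {
    "P1": {"A1": 2.0, "A2": 1.0},  # P1 = (A1, (B, A2));
    "P2": {"A1": 1.0, "A2": 2.0},  # P2 = ((A1, B), A2);
}


def sampled_tip_length(tree, representative):
    """Tip branch length of the two-taxon tree left after sampling."""
    return divergence_from_B[tree][representative]


# Assumption: `youngest` picks A2 and `oldest` picks A1.
for tree in ("P1", "P2"):
    print(
        tree,
        "youngest:", sampled_tip_length(tree, "A2"),
        "oldest:", sampled_tip_length(tree, "A1"),
    )
# P1 youngest: 1.0 oldest: 2.0
# P2 youngest: 2.0 oldest: 1.0
```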