Open benmarwick opened 3 years ago
Dear @benmarwick, thank you so much for sharing your code! At first glance it looks very cryptic to me, but I'm sure your annotations will help me figure it out eventually! Inferring phylogenies using continuous traits is something that I had already given up on, so this is a huge boost in motivation. Thank you!
You're warmly welcome! I have just now updated my first post with additional inline comments as I learn more about the RevBayes functions. Here's a summary of the most important parts that I understand so far, largely taken word-for-word from Wright and Warnock 2020. As we see in the image below, for Bayesian inference of phylogenies, in addition to our data, we need to make decisions about three important inputs: (1) a substitution model to describe the evolution of characters (by what process did our measurement variables change in the past?), (2) a clock model to describe the distribution of evolutionary rates across the tree (how fast did things change?), and (3) a tree model to describe the distribution of speciation events across the tree.
The substitution model describes how phylogenetic characters in the dataset evolve: how character change accumulates over time, leading to the observed phylogenetic data. Molecular sequence data has a big advantage here because there seem to be many well-understood models. One I see often in the literature is the general time reversible model (GTR), but it's for nucleotides only. For discrete morphological characters, the Mk model is popular, standing for 'Markov k'. This means that the probability of changing from one state to another depends only on the current state, and not on what has come before, and the model assumes that every state is equally likely to change to any other state. For continuous character data such as we are looking at here, we have three options: Brownian motion, Ornstein–Uhlenbeck, or Lévy processes. In the code above we are using Brownian motion (cf. the RevBayes function dnPhyloBrownianREML), and in the reading I've done so far, this seems very common (e.g. 1, 2, and see Wright (2019) for an encouraging discussion of Bayesian inference of phylogenies using continuous characters and Brownian motion). Here's the RevBayes tutorial on Brownian motion models.
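To make the Brownian motion idea concrete, here is a minimal sketch in plain Python (not RevBayes, and not the code from the first post). It assumes a single trait, a made-up rate `sigma2`, and a hypothetical two-tip tree in which both tips share one internal branch. Under Brownian motion the change along a branch is Normal with mean 0 and variance `sigma2 * branch_length`, so sister tips should end up with a variance proportional to their total path length and a covariance proportional to the branch they share.

```python
import random


def bm_step(start_value, branch_length, sigma2, rng):
    """Trait value at the end of one branch under Brownian motion."""
    sd = (sigma2 * branch_length) ** 0.5
    return start_value + rng.gauss(0.0, sd)


def simulate_sister_tips(root_value, shared_length, tip_length, sigma2, rng):
    """Evolve a trait down a shared branch, then down two separate tip branches."""
    ancestor = bm_step(root_value, shared_length, sigma2, rng)
    tip_a = bm_step(ancestor, tip_length, sigma2, rng)
    tip_b = bm_step(ancestor, tip_length, sigma2, rng)
    return tip_a, tip_b


def mean(values):
    return sum(values) / len(values)


# Illustrative values only (assumptions, not estimates from any dataset).
rng = random.Random(1)
sigma2, shared_length, tip_length = 0.5, 2.0, 1.0

pairs = [simulate_sister_tips(0.0, shared_length, tip_length, sigma2, rng)
         for _ in range(20000)]
a = [p[0] for p in pairs]
b = [p[1] for p in pairs]
mean_a, mean_b = mean(a), mean(b)

var_a = mean([(x - mean_a) ** 2 for x in a])
cov_ab = mean([(x - mean_a) * (y - mean_b) for x, y in pairs])

print("variance of tip A:", var_a, "expected about", sigma2 * (shared_length + tip_length))
print("covariance A, B:  ", cov_ab, "expected about", sigma2 * shared_length)
```

The covariance between tips is what carries the phylogenetic signal: the longer two variants share history, the more similar their trait values are expected to be.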
The clock model describes the way the rate of character change varies, or does not vary, across the tree. In the code above this is specified by branchRates (and maybe also siteRates, I'm not sure). We have a strict (or global) clock model, where all rates are assumed constant, but RevBayes allows for rate variation across the tree in several interesting ways (this seems to be an example of branch-specific rates). Here's the RevBayes tutorial on implementing different clock models.
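The difference between a strict clock and branch-specific rates can be sketched in a few lines of plain Python (again, not RevBayes). The branch durations and rate values below are invented for illustration, and the lognormal draw is just one common relaxed-clock choice (an uncorrelated lognormal clock), not necessarily what any particular RevBayes script uses.

```python
import math
import random


def expected_change_strict(branch_times, rate):
    """Strict clock: one rate for every branch, so change = rate * time."""
    return [rate * t for t in branch_times]


def draw_relaxed_rates(n_branches, mean_rate, sd_log, rng):
    """Relaxed clock sketch: each branch gets its own lognormal rate.

    mu is chosen so that the lognormal distribution has mean `mean_rate`.
    """
    mu = math.log(mean_rate) - 0.5 * sd_log ** 2
    return [rng.lognormvariate(mu, sd_log) for _ in range(n_branches)]


# Hypothetical branch durations (arbitrary time units) and mean rate.
branch_times = [2.0, 4.0, 1.0]
mean_rate = 0.1

strict_change = expected_change_strict(branch_times, mean_rate)

rng = random.Random(7)
relaxed_rates = draw_relaxed_rates(len(branch_times), mean_rate, 0.5, rng)
relaxed_change = [r * t for r, t in zip(relaxed_rates, branch_times)]

print("strict clock change per branch: ", strict_change)
print("relaxed clock rates per branch: ", relaxed_rates)
print("relaxed clock change per branch:", relaxed_change)
```

Under the strict clock, branch length in units of change is just time rescaled; under the relaxed version, two branches of the same duration can accumulate different amounts of change.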
We are using a birth-death process to model the probability of each species (i.e. lithic variant) either giving birth (speciating) or dying (going extinct). This is our tree model. We denote the per-lineage birth rate as λ and the per-lineage death rate as μ. We need to specify a distribution to draw a speciation rate (λ) from, and a distribution to draw an extinction rate (μ) from; these are important priors for the MCMC process to search through. In the code above our birth-death model is specified in the function dnBDP.
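A birth-death process is easy to simulate forward in time, which can help build intuition for what λ and μ do. The sketch below is plain Python (not RevBayes) and only counts lineages rather than building a tree; the rate values are arbitrary assumptions, and both rates are assumed to be positive. Each lineage independently speciates at rate `birth_rate` and goes extinct at rate `death_rate`, so the waiting time to the next event is exponential with rate `n * (birth_rate + death_rate)`.

```python
import math
import random


def simulate_birth_death(birth_rate, death_rate, total_time, rng, start_lineages=1):
    """Return the number of lineages alive after `total_time`."""
    n = start_lineages
    t = 0.0
    while n > 0:
        # Waiting time until the next event anywhere in the clade.
        t += rng.expovariate(n * (birth_rate + death_rate))
        if t > total_time:
            break
        # Decide whether the event is a speciation or an extinction.
        if rng.random() < birth_rate / (birth_rate + death_rate):
            n += 1
        else:
            n -= 1
    return n


# Illustrative rates only (assumptions, not estimates).
birth_rate, death_rate, total_time = 1.0, 0.5, 2.0
rng = random.Random(11)

counts = [simulate_birth_death(birth_rate, death_rate, total_time, rng)
          for _ in range(20000)]
mean_count = sum(counts) / len(counts)
extinct_fraction = sum(1 for c in counts if c == 0) / len(counts)

print("mean lineages at end:", mean_count,
      "expected about", math.exp((birth_rate - death_rate) * total_time))
print("fraction of runs that went extinct:", extinct_fraction)
```

In a Bayesian analysis the rates are not fixed like this; they are drawn from prior distributions and updated by the MCMC, but the process being described is the same.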
Another interesting tree model to consider is the fossilized birth-death (FBD) process, which allows us to set dated specimens on tips or internal branches. This sounds like an exciting way to make rich use of lithics with well-constrained ages. Coalescent tree models also sound intriguing, especially the multi-species coalescent model, which explicitly models the evolution of genes in combination with the speciation process. I think this is promising if we consider certain morphological attributes akin to genes, and overall shapes as species. These could have different trees, just as genes and species sometimes have different trees. Another intriguing thing about the multi-species coalescent model is that it is also frequently mentioned in the literature on phylogenetic networks, hybridisation and reticulation, which I think must be part of a comprehensive modern archaeological phylogenetics.
The image below shows four different tree models, and how different speciation and extinction parameters can affect the tree for each:
The image below, from Wright (2019), summarises what MCMC is doing for us in a phylogenetic analysis (I guess you might already be familiar with MCMC from reading McElreath's book?)
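As a rough illustration of what MCMC does, here is a minimal Metropolis sampler in plain Python (not RevBayes). It is a toy under stated assumptions: it estimates only a Brownian motion rate `sigma2` from simulated trait changes along branches of known length, with an Exponential(1) prior chosen arbitrarily. A real phylogenetic analysis also proposes changes to the tree and the other model parameters, but the propose, compare, accept-or-reject loop is the same idea.

```python
import math
import random


def log_posterior(sigma2, changes, lengths):
    """Log prior plus log likelihood for the Brownian motion rate."""
    if sigma2 <= 0:
        return -math.inf
    log_prior = -sigma2  # Exponential(1) prior, up to a constant (an assumption)
    log_lik = 0.0
    for x, t in zip(changes, lengths):
        var = sigma2 * t  # Brownian motion variance along a branch of length t
        log_lik += -0.5 * math.log(2 * math.pi * var) - x * x / (2 * var)
    return log_prior + log_lik


def metropolis(changes, lengths, n_iter, step, rng, start=1.0):
    """Sample sigma2 with a symmetric sliding-window proposal."""
    current = start
    current_lp = log_posterior(current, changes, lengths)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.uniform(-step, step)
        proposal_lp = log_posterior(proposal, changes, lengths)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Simulated data with a known rate, so we can see whether the sampler recovers it.
rng = random.Random(42)
true_sigma2 = 0.5
lengths = [rng.uniform(0.5, 2.0) for _ in range(200)]
changes = [rng.gauss(0.0, math.sqrt(true_sigma2 * t)) for t in lengths]

samples = metropolis(changes, lengths, n_iter=6000, step=0.1, rng=rng)
kept = samples[1000:]  # discard the first 1000 iterations as burn-in
post_mean = sum(kept) / len(kept)

print("posterior mean of sigma2:", post_mean, "(simulated with", true_sigma2, ")")
```

The list of kept samples is the MCMC output: a histogram of it approximates the posterior distribution of the rate, in the same way that a set of sampled trees approximates the posterior distribution of trees.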
Hi Ben - lovely, those are some seriously useful references, too. I agree that Brownian motion is the most common model chosen in cultural phylo studies and, as far as I understand, the one making the fewest assumptions about process constraints. Regarding the tree models, it'd be amazing if we could implement the fossilized birth-death (FBD) process, but data requirements would be steep. Perhaps the Nicolas data would be good as a platform for exploring these, or else perhaps Gopher's (1994) sample of Neolithic points from the Levant. Any idea how that tree model deals with missing parameter information?
There's so much fascinating work in this repo, really amazing stuff! It got me thinking about options for Bayesian inference with these data, and here's my attempt at a workflow using the RevBayes program (it's very similar to R!). I've added some comments throughout so hopefully you can make it work on your machine. Most of it is directly from Parins-Fukuchi 2018 (I know you mentioned this paper already, perhaps you've already tried it also) and the RevBayes code in her repository.
This is the output from densiTree: