Phylogenetic Reconstruction

noahaus commented 4 years ago

June 9th: Opening this issue to figure out the best way to build the phylogenetic tree for my assembled isolates.

In "Patterns and Processes of Mycobacterium bovis Evolution Revealed by Phylogenomic Analyses" (Patane et al., 2017), the researchers did the following to create a phylogenetic tree:

Phylogenetic analysis was done by maximum likelihood (ML) within IQTree v1.3.12 (Nguyen et al. 2015), with support for each phylogeny calculated from 1,000 UFBoot pseudoreplicates (Minh et al. 2013). An important thing to consider when estimating character-based phylogenies (such as ML-based) is the amount of variation among data partitions, otherwise the resulting topology may be biased (Brown and Lemmon 2007). To test for best partitioning scheme for core-coding genes, we entered the three codon positions as data blocks (hence each block encompasses one-third of the alignment), and then estimated the best partitioning scheme among them and respective model for each partition under IQTree, using BIC as the criterion for data fit. To test for variability of rates along the genome, besides using the typical gamma distribution (+G) and a probability of invariable sites (+I) for each proposed partition, we also tested models with mixture distributions for each site, from two to five such mixtures per model (+MM), which in many cases have better fit (Venditti et al. 2008).

Further phylogenetic analyses aimed to test for (and possibly alleviate) systematic biases. As a proxy for phylogenetic reliability of each tree, the retention index (RI; Farris 1989) was used to assess the fit of the character to “sampling locality” (continents used as tip states) across the 38-genome data set, following previous conclusions showing that regionally M. bovis samples tend to cluster together (e.g., Hang’ombe et al. 2012; Allen et al. 2013; Hauer et al. 2015). We ascribed one among five states to each tip according to its respective continental region (South America, North America, Europe, Asia, and Africa). Subsequently, RI was calculated for each of the trees, with higher values indicating better agreement of that tree with geographic location. PAUP v4b10 (Swofford 2003) was used to calculate RI across the different topologies, using the command “describetrees”.
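To make the RI step in the quoted methods concrete, here is a minimal sketch of the retention index for a single unordered character (sampling continent as the tip state). It is not the PAUP implementation the paper used. It assumes a strictly bifurcating tree written as nested 2-tuples of tip names, and the function names `fitch_steps` and `retention_index` are made up for illustration. RI = (g - s) / (g - m), where s is the observed parsimony step count on the tree, m is the minimum possible (number of states minus one), and g is the maximum possible (number of tips minus the count of the most common state).

```python
from collections import Counter


def fitch_steps(tree, states):
    """Count Fitch parsimony steps for one unordered character.

    tree   : nested 2-tuples of tip names (assumed strictly bifurcating)
    states : dict mapping each tip name to its state (e.g. continent)
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):          # a tip: its state set is fixed
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:                         # children agree: no change needed
            return shared
        steps += 1                         # children disagree: one change
        return left | right

    visit(tree)
    return steps


def retention_index(tree, states):
    """Retention index (Farris 1989) of one unordered character on a tree."""
    counts = Counter(states.values())
    n_tips = sum(counts.values())
    min_steps = len(counts) - 1                 # m: best case on any tree
    max_steps = n_tips - max(counts.values())   # g: worst case on any tree
    if max_steps == min_steps:
        raise ValueError("RI is undefined for an uninformative character")
    observed = fitch_steps(tree, states)        # s: steps on this tree
    return (max_steps - observed) / (max_steps - min_steps)
```

With four hypothetical tips, two from Europe and two from Africa, a tree that groups the isolates by continent scores 1.0 and a tree that splits each continent across both clades scores 0.0, which is the sense in which a higher RI means better agreement with geography.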

1) How do I set up partitioning schemes that break up an alignment by codon position?
2) Given the best partitioning scheme, do I even need to calculate the retention index?
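On question 1, one possible approach is to describe the three codon positions as character sets in a NEXUS partition file, which IQ-TREE can read. The sketch below only writes that file's text. It assumes the alignment is a concatenation of in-frame coding genes whose total length is a multiple of 3 (this is not guaranteed for every aligner Roary can use, so it is worth checking), and the function name and charset names are hypothetical.

```python
def codon_position_partitions(alignment_length):
    """Return NEXUS 'sets' text with one charset per codon position.

    Assumes an in-frame coding alignment; alignment_length is in nucleotides.
    """
    if alignment_length < 3 or alignment_length % 3:
        raise ValueError("alignment length must be a positive multiple of 3")
    lines = ["#nexus", "begin sets;"]
    for pos in (1, 2, 3):
        # "start-end\3" selects every third site beginning at `start`
        lines.append(f"    charset codon_pos{pos} = {pos}-{alignment_length}\\3;")
    lines.append("end;")
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file and passing it to IQ-TREE as the partition file (`-p` in IQ-TREE 2, `-spp` in 1.x) together with `-m MFP+MERGE` should let ModelFinder test whether the three blocks are better merged, roughly matching what the paper describes. The exact options depend on the IQ-TREE version, so check its documentation first.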

noahaus commented 4 years ago

June 10th

I've given it some thought, and I will proceed using just the core gene alignment created by Roary for the homologous recombination (HR) and positive selection analyses.

Now I need to build a tree from this alignment to use in these two analyses. We can go super simple with no partitioning, and then also try a partitioned model. This will take some extra thinking on my end.

In the meantime, I'll build a quick tree to support these analyses and keep moving. We can then judge the results and decide what needs fixing later.

Jome0169 commented 4 years ago

Hey dude

noahaus commented 4 years ago

For now, I've decided to use the recombination-correction software Gubbins and proceed with the final ML tree it produces. Until further notice, this is the phylogeny I will be using.
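For reference, a rough sketch of how that Gubbins step could be scripted. It only builds the command and the expected tree path. It does not reproduce the exact call used here, and the file names in the usage note are placeholders. It assumes Gubbins is installed as `run_gubbins.py` and writes its final tree to `<prefix>.final_tree.tre`, which should be confirmed against the installed version.

```python
def gubbins_command(alignment, prefix):
    """Build a run_gubbins.py call and the path of the tree it should write.

    alignment : path to the whole-alignment FASTA (placeholder name in usage)
    prefix    : output prefix passed to Gubbins via --prefix
    """
    command = ["run_gubbins.py", "--prefix", prefix, alignment]
    # Assumption: Gubbins names its final ML tree "<prefix>.final_tree.tre"
    final_tree = f"{prefix}.final_tree.tre"
    return command, final_tree
```

Something like `command, tree = gubbins_command("core_gene_alignment.aln", "bovis")` followed by `subprocess.run(command, check=True)` would run it, after which `tree` is the phylogeny to feed into the downstream analyses.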

salvadorlab / Bovis-PangenomeOfReservoirs

Phylogenetic Reconstruction #3