veg / hyphy-analyses

HyPhy standalone analyses

Output of FitMG94 #17

Open mlosilla opened 3 years ago

mlosilla commented 3 years ago

Hi,

I am trying to understand the output of the FitMG94 model with per-branch calculation (--type local).

For example, the output for one node is:

"Node20":{
    "Confidence Intervals":{
        "LB":0.05288894421311353,
        "MLE":0.1201576050549771,
        "UB":0.2263100014412389
    },
    "Nucleotide GTR":0.04948062317408397,
    "Standard MG94":0.05264881492703954,
    "dN":0.01794621524493222,
    "dS":0.1586217234027807,
    "nonsynonymous":0.01351914523388944,
    "original name":"Node20",
    "synonymous":0.03912966969315013
}

Is the following correct:

  1. MLE is the estimate of dN/dS = w, and it has a confidence interval with lower bound (LB) and upper bound (UB) limits, estimated by "profile likelihood" (I found this phrase in another post)

  2. "synonymous" and "nonsynonymous" are the values used in the synonymous and non-synonymous trees. These are the number of synonymous and non-synonymous substitutions per codon, and also the branch lengths. Could the total branch length be computed as "synonymous" + "nonsynonymous"?

  3. dN and dS are the number of [non]-synonymous substitutions divided by the number of codons that display [non]-synonymous substitutions in the alignment ???

  4. Is w (MLE) calculated (or very closely approximated) by dN/dS?

Thanks Mau

spond commented 3 years ago

Dear @mlosilla,

  1. Correct. ω is estimated directly (i.e. not dS and dN separately; the ratio is estimated as a model parameter)
  2. Yes.
  3. No. dS is computed as "synonymous" substitutions divided by the expected number of synonymous sites (and likewise for dN with non-synonymous). More complete details are given on page 14 of http://www.hyphy.org/resources/hyphybook2007.pdf
  4. Approximated; dN/dS is not quite the same as ω. For your example, dN/dS = 0.113138445730805, while the direct estimate is ω = 0.1201576050549771. Close, but not the same. Spencer Muse had a really good paper on this close to 25 years ago; sadly, it is not well known: https://academic.oup.com/mbe/article/13/1/105/1055486
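
The arithmetic behind points 2 to 4 can be checked directly from the Node20 entry quoted above. The following is a minimal Python sketch, not part of HyPhy: the `summarize` helper and its output keys are hypothetical names chosen for illustration, and the two "site fraction" values assume the relationship described in point 3 (dS = "synonymous" / expected synonymous sites, and likewise for dN).

```python
import math

# Values copied from the "Node20" entry quoted in this thread.
node20 = {
    "Confidence Intervals": {
        "LB": 0.05288894421311353,
        "MLE": 0.1201576050549771,
        "UB": 0.2263100014412389,
    },
    "Standard MG94": 0.05264881492703954,
    "dN": 0.01794621524493222,
    "dS": 0.1586217234027807,
    "nonsynonymous": 0.01351914523388944,
    "synonymous": 0.03912966969315013,
}


def summarize(node):
    """Derive the quantities discussed in points 2-4 from one branch entry."""
    syn = node["synonymous"]
    nonsyn = node["nonsynonymous"]
    return {
        # Point 2: total branch length as the sum of the two components.
        "total_length": syn + nonsyn,
        # Point 4: the ratio of the reported dN and dS values.
        "ratio_dn_ds": node["dN"] / node["dS"],
        # Point 1: omega as estimated directly by the model.
        "omega_mle": node["Confidence Intervals"]["MLE"],
        # Point 3 (assumption): dS = synonymous / synonymous site fraction,
        # so the implied fractions can be backed out by division.
        "syn_site_fraction": syn / node["dS"],
        "nonsyn_site_fraction": nonsyn / node["dN"],
    }


summary = summarize(node20)
for key, value in summary.items():
    print(f"{key}: {value:.6f}")
```

For this node, "synonymous" + "nonsynonymous" agrees with the reported "Standard MG94" branch length to floating-point precision, while dN/dS (about 0.1131) comes out slightly below the directly estimated ω (about 0.1202).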

@SVMuse reads these boards once in a while, so maybe he can chime in.

Best, Sergei

mlosilla commented 3 years ago

Hi Sergei,

Thank you for your reply and the links; it is much clearer now. A couple of follow-ups:

1) My goal with these data is to make a figure of my phylogeny with a) branch lengths proportional to either "nonsynonymous" or "nonsynonymous" + "synonymous" (I haven't decided which), and b) branches color-coded with a heatmap of the dN/dS ratios (w).

For 1b), the correct value would be the MLE, right?

2) Some MLE estimates are very high, probably because synonymous substitutions are absent or nearly absent. How are those best interpreted?

3) More of a theoretical question: how does the taxonomic breadth of the phylogeny influence the w estimates? Does the inclusion of more distantly related clades usually tend to affect dN and dS differently?
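
For the figure described in 1), one possible way to prepare the per-branch values is sketched below in Python. This is only an illustration under stated assumptions: `branch_length` and `omega_to_unit_scale` are hypothetical helpers, the `floor` and `cap` limits are arbitrary display choices (not HyPhy defaults), and clamping very large estimates is a plotting convenience, not an interpretation of them.

```python
import math


def branch_length(node, include_synonymous=True):
    """Length used for drawing: "nonsynonymous" alone, or the sum of both."""
    length = node["nonsynonymous"]
    if include_synonymous:
        length += node["synonymous"]
    return length


def omega_to_unit_scale(omega, floor=0.01, cap=10.0):
    """Map omega onto [0, 1] on a log10 scale, clamping to [floor, cap].

    floor and cap are arbitrary display limits (assumptions); very large
    estimates are simply shown at the top of the color scale.
    """
    clamped = min(max(omega, floor), cap)
    span = math.log10(cap) - math.log10(floor)
    return (math.log10(clamped) - math.log10(floor)) / span


# Values copied from the Node20 entry quoted earlier in this thread.
node20 = {
    "MLE": 0.1201576050549771,
    "nonsynonymous": 0.01351914523388944,
    "synonymous": 0.03912966969315013,
}

print(branch_length(node20))                            # synonymous + nonsynonymous
print(branch_length(node20, include_synonymous=False))  # nonsynonymous only
print(omega_to_unit_scale(node20["MLE"]))               # position on the color scale
print(omega_to_unit_scale(1e4))                         # hypothetical very large estimate
```

The value returned by `omega_to_unit_scale` can be passed to any colormap that accepts a number between 0 and 1; the log scale keeps ω = 1 at a fixed position between the two limits.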

Thanks Mau