Closed ElizabethRobbins closed 1 year ago
Dear @ElizabethRobbins,
This question comes up periodically. A while ago we even wrote a little passage on it in a book chapter (see the box on pg 14 of http://hyphy.org/resources/hyphybook2007.pdf)
HyPhy does not estimate dS
and dN
separately but rather estimates one of two things
The two are related to dS
and dN
in the way described in the linked book chapter.
Now, if you want a gene-wide dS
and dN
estimate under a simple codon model (ω does not vary from site to site or branch-to-branch), then your best bet is FitMG94.bf
, which can be found here.
hyphy FitMG94.bf --alignment CD2.nex
....
### **Non-synonymous tree**
((((PIG:0.1428267548771765,COW:0.1837435716721197)Node5:0.07540645230241928,HORSE:0.1565335253856734,CAT:0.2029527757027807)Node4:0.04766707340051019,((RHMONKEY:0.002756088669461498,BABOON:0.001313839023437547)Node11:0.01921779052523892,(HUMAN:0,CHIMP:0.00135520410116047)Node14:0.01326705738724444)Node10:0.08128279147131144)Node3:0.2104717704029049,RAT:0.0496776467928151,MOUSE:0.08906244413250149)
**Combined tree**
((((PIG:0.1925706847447909,COW:0.2477380757182482)Node5:0.1016691317151305,HORSE:0.211051271135841,CAT:0.2736374919499469)Node4:0.06426863771014162,((RHMONKEY:0.003715983624720521,BABOON:0.001771424972900691)Node11:0.02591099324429585,(HUMAN:0,CHIMP:0.00182719674583269)Node14:0.01788772928298436)Node10:0.1095920916571899)Node3:0.2837752141101997,RAT:0.06697945680855745,MOUSE:0.1200812541487013)
If you look inside the JSON
file that this analysis generates, each branch will have a pair of dS
and dN
values reported
If you wanted to get a dS
and dN
for the entire gene, you would simply sum up the corresponding attributes over the entire tree. You can do this many ways (e.g. read the json
in using Python or R, etc), but here's a simple one liner using the excellent jq
utility.
...compute dS....
$jq '.["branch attributes"]["0"] | map(.dS) | add ' CD2.nex.FITTER.json
2.1082808416221726
...compute dN....
$jq '.["branch attributes"]["0"] | map(.dN) | add ' CD2.nex.FITTER.json
1.619274321250762
Note that dN/dS
will not be the same as the estimated ω ratio. For this specific case, the analysis reports
non-synonymous/synonymous rate ratio = 0.9959
but if you take the dN/dS
ratio as computed above, you will get 1.619/2.108 ~ 0.768
HTH, Sergei
Hi @spond,
Thank you for your reply, it was very helpful. I do have a couple of follow up question.
I have run the FitMG94.bf script on an alignment and provided a phylogenetic tree (attached below). Then, as you pointed out, to get the dN and dS of the whole gene I summed the corresponding attributes. However, I noticed that the number of branches it was summing was not what I expected. To my understanding there should be 2n-2 branches in a fully bifurcating tree, where n is the number of leaves. My phylogenetic tree has 773 species and thus there should be 1544 branches. However, only 1258 are being summed. Do you know why this is the case?
Files phylogenetic_tree.txt alignment.txt output.txt
Thank you for your help. All the best, Lizzie
Dear @ElizabethRobbins,
Because HyPhy
deals (mostly) with time-reversible models, the trees it considers are UNROOTED, which means that there should be 2N-3
branches in such trees.
Further, to increase computational efficiency, HyPhy
will first fit a nucleotide model to estimate initial branch lengths. At this stage, the default behavior is to remove all zero-length branches (i.e. create multi-furcations), because this does not change the likelihood or model estimates (there is nothing to be learned from zero-length branches since nothing happens on them).
You can determine whether or not this happened by looking for the following type of output ...
### Obtaining branch lengths and nucleotide substitution biases under the nucleotide GTR model
>kill-zero-lengths –> Yes
### Deleted 276 zero-length internal branches: `Node1001, Node1006, Node1020, Node1023, Node1024, Node1025, Node1026, Node1027, Node1028, Node1029, Node1030, Node1031, Node1033, Node1036, Node1037, Node1038, Node1041, Node1042, Node1044, Node1049, Node1052, Node1053, Node1054, Node1055, Node1058, Node1059, Node1060, Node1063, Node1064, Node1071, Node1072, Node1075, Node1079, Node1080, Node1086, Node1089, Node1090, Node1092, Node1096, Node1101, Node1103, Node1106, Node1111, Node1117, Node1141, Node1144, Node1146, Node1149, Node1152, Node116, Node1169, Node1170, Node1177, Node1178, Node1179, Node1180, Node1181, Node1183, Node1184, Node1185, Node1186, Node1192, Node1197, Node1201, Node1203, Node1209, Node1213, Node1215, Node1222, Node1223, Node1224, Node1225, Node1226, Node1233, Node1236, Node1242, Node1245, Node1247, Node1248, Node1249, Node1257, Node1259, Node1270, Node1281, Node1286, Node1291, Node1297, Node1309, Node1318, Node1327, Node1329, Node1336, Node1346, Node1348, Node1355, Node1357, Node1361, Node1366, Node1369, Node1383, Node1387, Node1388, Node1399, Node140, Node1419, Node1422, Node1433, Node1436, Node1437, Node1438, Node1439, Node1440, Node1441, Node1442, Node1443, Node1448, Node1454, Node1459, Node1460, Node1471, Node1481, Node1489, Node15, Node1511, Node1514, Node152, Node1521, Node1527, Node1529, Node153, Node1531, Node1534, Node1536, Node155, Node158, Node160, Node170, Node171, Node18, Node182, Node2, Node206, Node208, Node212, Node213, Node219, Node235, Node251, Node260, Node280, Node284, Node286, Node299, Node3, Node301, Node31, Node312, Node313, Node320, Node326, Node327, Node335, Node339, Node34, Node345, Node346, Node347, Node348, Node349, Node355, Node36, Node361, Node364, Node365, Node384, Node385, Node387, Node392, Node398, Node4, Node417, Node418, Node419, Node424, Node439, Node443, Node45, Node451, Node454, Node455, Node456, Node457, Node46, Node470, Node471, Node492, Node502, Node503, Node505, Node538, Node550, Node555, Node580, Node595, Node605, Node630, Node635, Node637, Node656, Node657, Node66, Node663, Node665, Node666, Node682, Node683, Node684, Node686, Node687, Node688, Node689, Node695, Node696, Node697, Node701, Node703, Node707, Node710, Node713, Node716, Node722, Node723, Node725, Node726, Node730, Node737, Node738, Node739, Node742, Node743, Node751, Node752, Node757, Node759, Node767, Node768, Node771, Node772, Node786, Node787, Node79, Node8, Node838, Node842, Node843, Node844, Node845, Node846, Node847, Node849, Node861, Node870, Node88, Node89, Node892, Node90, Node922, Node942, Node946, Node948, Node95, Node956, Node975, Node98, Node981, Node983`
Therefore, the total number of branches should be 2*773-3-276 = 1267
I am not sure where 1258
comes from because I 1267
branch records in the output file:
jq '.["branch attributes"]["0"] | length' /Users/sergei/Downloads/alignment.txt.FITTER.json
1267
You can override the default behavior for zero branch lengths using the command line argument
(keep zero-length branches and re-estimate them under the codon model)
--kill-zero-lengths No
(keep zero-length branches and fix them at 0)
--kill-zero-lengths Constrain
When you do this, the analyses will be slower (somewhat) but the result should be the same, and the number of branches -- higher - with the --kill-zero-lengths No
setting:
jq '.["branch attributes"]["0"] | length' /Users/sergei/Downloads/alignment.txt.FITTER.json
1543
Best, Sergei
Stale issue message
Hello,
Can HYPHY calculate gene wide dN and dS values separately?
I have used FUBAR previously, which gives site specific dN and dS values in the output. Would it be appropriate to average all site specific dN values to get a gene wide dN value? And similar for dS?
Thank you in advance, Lizzie