veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

BGM results showing constant sites are co-evolving #1527

ElizabethRobbins closed this issue 1 year ago

ElizabethRobbins commented 1 year ago

Hello,

I submitted a codon sequence alignment and phylogenetic tree to the HyPhy (2.5.40) BGM command-line tool, and it reported 33 pairs of co-evolving sites at a posterior probability threshold of 0.9. However, some of the sites identified as co-evolving are invariant at the amino-acid level. I am aware of codon islands, but they should not be causing the issue here. Is there another reason these invariant sites are identified as co-evolving?

For a specific example, I have attached the sequence file, tree file, and BGM JSON output file. BGM indicates that sites 9 and 398 are co-evolving. Both sites code for F (phenylalanine) and are encoded exclusively by either TTT or TTC, so I am unsure why they have been identified as co-evolving. Any help would be much appreciated.

seqs.txt tree.txt BGM_json.txt
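The pattern described above (a site that is constant at the amino-acid level but variable at the codon level) can be checked by tallying the codons in a column. What follows is a minimal Python sketch, assuming the standard genetic code. The helper name `site_variability` and the toy column are hypothetical and are not taken from the attached alignment.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def site_variability(codons):
    """Classify one codon column (hypothetical helper, for illustration).

    Codons not in the table (gaps, ambiguity codes) are ignored.
    """
    observed = {c.upper() for c in codons if c.upper() in CODON_TABLE}
    residues = {CODON_TABLE[c] for c in observed}
    if len(observed) <= 1:
        return "invariant"            # a single codon: no variation at all
    if len(residues) == 1:
        return "synonymous-only"      # several codons, one amino acid
    return "amino-acid-variable"      # more than one amino acid observed


# Toy column resembling the site described above: only TTT and TTC (both F).
column = ["TTT", "TTC", "TTT", "---", "TTC"]
print(site_variability(column))  # synonymous-only
```

A column classified as `synonymous-only` still carries substitutions at the nucleotide level, which a codon-based analysis can see even though the translated residue never changes.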

spond commented 1 year ago

Dear @ElizabethRobbins,

This is because BGM is working with codon data here, so it considers both synonymous and non-synonymous substitutions when determining patterns of co-evolution. In this particular case (sites 9 and 398), you have co-evolution via synonymous substitutions. If you wish to look ONLY at amino-acid substitutions, you can run translated data (i.e., protein sequences) through BGM:

$ hyphy conv Universal "Keep Deletions" seqs.txt seqs_prot.txt

$ hyphy bgm --type amino-acid --alignment seqs_prot.txt --tree tree.txt
...

### Inferring a BGM on 128 nodes [sites]

|   Site 1   |   Site 2   |P [Site 1 <-> Site 2]|Subs (1,2,shared)|
|:----------:|:----------:|:-------------------:|:---------------:|
|       6    |      39    |        0.550        |     1, 2, 1     |
|      24    |     258    |        0.524        |     5, 3, 2     |
|      26    |     530    |        0.549        |     2, 5, 2     |
|      26    |     548    |        0.513        |     2, 5, 2     |
|      42    |     449    |        0.613        |     1, 1, 1     |
|      46    |      89    |        0.616        |     1, 2, 1     |
|      82    |     336    |        0.649        |     7, 5, 3     |
|      82    |     530    |        0.655        |     7, 5, 3     |
|      87    |      93    |        0.541        |     8, 5, 3     |
|      87    |     367    |        0.513        |     8, 2, 2     |
|      87    |     486    |        0.823        |     8, 4, 3     |
|      90    |     548    |        0.812        |     7, 5, 3     |
|      91    |     349    |        0.543        |     8, 4, 3     |
|      93    |     349    |        0.948        |     5, 4, 3     |
|      93    |     507    |        0.953        |    5, 12, 4     |
|      95    |     439    |        0.754        |     4, 3, 2     |
|     138    |     222    |        0.654        |     6, 2, 2     |
|     138    |     486    |        0.943        |     6, 4, 3     |
|     221    |     460    |        0.961        |     2, 3, 2     |
|     247    |     505    |        0.685        |     7, 8, 3     |
|     266    |     484    |        0.742        |     1, 1, 1     |
|     316    |     439    |        0.584        |     4, 3, 2     |
|     337    |     359    |        0.941        |     3, 2, 2     |
|     435    |     445    |        0.995        |     5, 7, 4     |
|     439    |     475    |        0.640        |    3, 16, 3     |
|     439    |     530    |        0.553        |     3, 5, 2     |
|     439    |     541    |        0.818        |     3, 3, 2     |
|     445    |     464    |        0.960        |     7, 3, 3     |
|     445    |     488    |        0.859        |     7, 7, 4     |
|     459    |     546    |        0.851        |    17, 14, 6    |
|     461    |     487    |        0.695        |    7, 10, 4     |
|     461    |     496    |        0.508        |     7, 5, 3     |
|     461    |     497    |        0.928        |     7, 4, 3     |
|     462    |     513    |        0.585        |     2, 1, 1     |
|     471    |     503    |        0.795        |     1, 1, 1     |
|     475    |     546    |        0.979        |    16, 14, 7    |
|     477    |     537    |        0.787        |     4, 7, 3     |
|     482    |     535    |        0.959        |     8, 8, 4     |
|     483    |     530    |        0.670        |     8, 5, 3     |
|     486    |     530    |        0.505        |     4, 5, 2     |
|     486    |     535    |        0.850        |     4, 8, 3     |
|     487    |     488    |        0.981        |    10, 7, 5     |
|     487    |     537    |        0.722        |    10, 7, 4     |
|     488    |     549    |        0.542        |     7, 6, 3     |
|     489    |     528    |        0.662        |     1, 2, 1     |
|     490    |     531    |        0.825        |    7, 12, 4     |
|     496    |     537    |        0.513        |     5, 7, 3     |
|     498    |     546    |        0.993        |    6, 14, 5     |
|     499    |     513    |        0.598        |     2, 1, 1     |
|     531    |     548    |        0.940        |    12, 5, 4     |
|     538    |     541    |        0.757        |    13, 3, 3     |
|     541    |     547    |        0.638        |     3, 4, 2     |
----
## BGM analysis summary on 128 sites each with at least 1 substitutions. Evidence for conditional dependence was reported at posterior probability of 0.5
* 52 pairs  of conditionally dependent sites found
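
The run above reports pairs at a posterior probability of 0.5, whereas the original question used 0.9. To compare the two, the printed table can be filtered after the fact. What follows is a minimal Python sketch that parses the Markdown table as it appears in this thread. The function names `parse_bgm_table` and `filter_pairs` are hypothetical, and the sample contains only four of the rows shown above. The JSON output may be a better source, but its structure is not assumed here.

```python
def parse_bgm_table(text):
    """Extract (site1, site2, posterior) tuples from the printed BGM table.

    Header and separator rows are skipped because their cells do not
    parse as numbers.
    """
    pairs = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue
        try:
            site1, site2, prob = int(cells[0]), int(cells[1]), float(cells[2])
        except ValueError:
            continue  # not a data row
        pairs.append((site1, site2, prob))
    return pairs


def filter_pairs(pairs, threshold=0.9):
    """Keep pairs whose posterior probability is at least `threshold`."""
    return [pair for pair in pairs if pair[2] >= threshold]


# A few rows copied from the table above, for illustration only.
sample = """
|   Site 1   |   Site 2   |P [Site 1 <-> Site 2]|Subs (1,2,shared)|
|:----------:|:----------:|:-------------------:|:---------------:|
|       6    |      39    |        0.550        |     1, 2, 1     |
|      87    |     486    |        0.823        |     8, 4, 3     |
|      93    |     349    |        0.948        |     5, 4, 3     |
|     435    |     445    |        0.995        |     5, 7, 4     |
"""

for site1, site2, prob in filter_pairs(parse_bgm_table(sample)):
    print(site1, site2, prob)
```

With the four sample rows, only the pairs 93/349 and 435/445 pass the 0.9 cutoff.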

Best, Sergei
