Closed: ElizabethRobbins closed this issue 1 year ago
Dear @ElizabethRobbins,
This is because BGM works with codon data, so it considers both synonymous and non-synonymous substitutions when determining patterns of co-evolution. In this particular case (sites 9 and 398), you have co-evolution via synonymous substitutions. If you wish to look ONLY at amino-acid substitutions, you can run translated data (i.e. protein sequences) through BGM:
$ hyphy conv Universal "Keep Deletions" seqs.txt seqs_prot.txt
$ hyphy bgm --type amino-acid --alignment seqs_prot.txt --tree tree.txt
...
### Inferring a BGM on 128 nodes [sites]
| Site 1 | Site 2 |P [Site 1 <-> Site 2]|Subs (1,2,shared)|
|:----------:|:----------:|:-------------------:|:---------------:|
| 6 | 39 | 0.550 | 1, 2, 1 |
| 24 | 258 | 0.524 | 5, 3, 2 |
| 26 | 530 | 0.549 | 2, 5, 2 |
| 26 | 548 | 0.513 | 2, 5, 2 |
| 42 | 449 | 0.613 | 1, 1, 1 |
| 46 | 89 | 0.616 | 1, 2, 1 |
| 82 | 336 | 0.649 | 7, 5, 3 |
| 82 | 530 | 0.655 | 7, 5, 3 |
| 87 | 93 | 0.541 | 8, 5, 3 |
| 87 | 367 | 0.513 | 8, 2, 2 |
| 87 | 486 | 0.823 | 8, 4, 3 |
| 90 | 548 | 0.812 | 7, 5, 3 |
| 91 | 349 | 0.543 | 8, 4, 3 |
| 93 | 349 | 0.948 | 5, 4, 3 |
| 93 | 507 | 0.953 | 5, 12, 4 |
| 95 | 439 | 0.754 | 4, 3, 2 |
| 138 | 222 | 0.654 | 6, 2, 2 |
| 138 | 486 | 0.943 | 6, 4, 3 |
| 221 | 460 | 0.961 | 2, 3, 2 |
| 247 | 505 | 0.685 | 7, 8, 3 |
| 266 | 484 | 0.742 | 1, 1, 1 |
| 316 | 439 | 0.584 | 4, 3, 2 |
| 337 | 359 | 0.941 | 3, 2, 2 |
| 435 | 445 | 0.995 | 5, 7, 4 |
| 439 | 475 | 0.640 | 3, 16, 3 |
| 439 | 530 | 0.553 | 3, 5, 2 |
| 439 | 541 | 0.818 | 3, 3, 2 |
| 445 | 464 | 0.960 | 7, 3, 3 |
| 445 | 488 | 0.859 | 7, 7, 4 |
| 459 | 546 | 0.851 | 17, 14, 6 |
| 461 | 487 | 0.695 | 7, 10, 4 |
| 461 | 496 | 0.508 | 7, 5, 3 |
| 461 | 497 | 0.928 | 7, 4, 3 |
| 462 | 513 | 0.585 | 2, 1, 1 |
| 471 | 503 | 0.795 | 1, 1, 1 |
| 475 | 546 | 0.979 | 16, 14, 7 |
| 477 | 537 | 0.787 | 4, 7, 3 |
| 482 | 535 | 0.959 | 8, 8, 4 |
| 483 | 530 | 0.670 | 8, 5, 3 |
| 486 | 530 | 0.505 | 4, 5, 2 |
| 486 | 535 | 0.850 | 4, 8, 3 |
| 487 | 488 | 0.981 | 10, 7, 5 |
| 487 | 537 | 0.722 | 10, 7, 4 |
| 488 | 549 | 0.542 | 7, 6, 3 |
| 489 | 528 | 0.662 | 1, 2, 1 |
| 490 | 531 | 0.825 | 7, 12, 4 |
| 496 | 537 | 0.513 | 5, 7, 3 |
| 498 | 546 | 0.993 | 6, 14, 5 |
| 499 | 513 | 0.598 | 2, 1, 1 |
| 531 | 548 | 0.940 | 12, 5, 4 |
| 538 | 541 | 0.757 | 13, 3, 3 |
| 541 | 547 | 0.638 | 3, 4, 2 |
----
## BGM analysis summary on 128 sites each with at least 1 substitutions. Evidence for conditional dependence was reported at posterior probability of 0.5
* 52 pairs of conditionally dependent sites found
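
To illustrate the codon versus amino-acid distinction described above, here is a minimal Python sketch. It is not part of HyPhy and does not reproduce what BGM does internally; `CODON_TABLE` and `site_variability` are hypothetical names used only for this illustration, and the example codons are made up rather than taken from the attached alignment. It only shows that a site made up of TTT and TTC varies at the codon level while staying invariant (F) at the amino-acid level.

```python
from itertools import product

# Standard (universal) genetic code, listed in the usual TCAG order:
# first base varies slowest, third base varies fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(bases): aa
    for bases, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def site_variability(codons):
    """Return (varies_at_codon_level, varies_at_amino_acid_level) for one site.

    `codons` is the column of codons observed at a single alignment site,
    one entry per sequence (hypothetical input format for this sketch).
    """
    codon_states = {c.upper() for c in codons}
    aa_states = {CODON_TABLE[c] for c in codon_states}
    return len(codon_states) > 1, len(aa_states) > 1


# A site like the one described in the question: only TTT and TTC occur.
synonymous_only = ["TTT", "TTC", "TTT", "TTC"]
print(site_variability(synonymous_only))

# A site with a non-synonymous change (TTT = F, TTA = L) for comparison.
non_synonymous = ["TTT", "TTA", "TTT"]
print(site_variability(non_synonymous))
```

A site for which the first value is true and the second is false carries only synonymous variation, which a codon-level analysis can still count as substitutions but an analysis of translated sequences cannot.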
Best, Sergei
Hello,
I submitted a codon sequence alignment and phylogenetic tree to the HyPhy (2.5.40) BGM command-line tool, and it reported 33 pairs of co-evolving sites at a posterior probability of 0.9. However, some of the sites identified as co-evolving are invariant at the amino-acid level. I am aware of codon islands, but they should not be causing the issue here. Is there another reason these invariant sites are identified as co-evolving?
As a specific example, I have attached the sequence file, tree file, and BGM JSON output file. BGM indicates that sites 9 and 398 are co-evolving. Both sites code for F (phenylalanine) and are encoded exclusively by either TTT or TTC, so I am unsure why they have been identified as co-evolving. Any help would be much appreciated.
Attachments: seqs.txt, tree.txt, BGM_json.txt