veg / hyphy-site

HyPhy Markdown Site
MIT License

Add bgm to methods descriptions #58

Closed rdvelazquez closed 5 years ago

rdvelazquez commented 5 years ago

@artpoon do you have a write-up you would want to go in the methods description section of the hyphy site for bgm? Whatever you want to do is fine by me: submit a PR, send me some text, point me in the right direction, etc. Thanks!

ArtPoon commented 5 years ago

I can prepare something fairly easily - can you point me to some examples?

rdvelazquez commented 5 years ago

Sure. It will just be another section here: http://www.hyphy.org/methods/selection-methods/

ArtPoon commented 5 years ago

@rdvelazquez if you don't mind I'll just write some Markdown right here:

BGM

The Bayesian Graphical Model (BGM) method is a tool for detecting coevolutionary interactions between amino acid positions in a protein. This method is similar to the "correlated substitutions" method described by Shindyalov et al. 1994, in which amino acid substitution events are mapped to branches in the phylogenetic tree. BGM uses a method similar to SLAC, where amino acid substitution events are mapped to the tree from the ancestral reconstruction under joint maximum likelihood for a given model of codon substitution rates.

After amino acid substitutions have been mapped, the user is required to specify a filtering criterion to reduce the number of codon sites in the alignment to be analyzed. This is an important step because the number of graphical models (networks) increases faster than exponentially with the number of variables. You do not want to have many more codon sites than there are sequences (observations) in the alignment. Furthermore, since the BGM analysis is essentially driven by a series of tests on 2x2 contingency tables (comprising the presence/absence of substitutions on branches), you should generally avoid including codon sites where only a single amino acid substitution was mapped to the tree.
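To make the filtering step and the 2x2 tables concrete, here is a minimal Python sketch. It is not HyPhy code: the site names, the 0/1 branch indicators, and the `filter_sites` and `contingency_table` helpers are all hypothetical, and the threshold of two substitutions is just one possible filtering criterion.

```python
from itertools import combinations

def filter_sites(sub_map, min_subs=2):
    """Keep only sites with substitutions mapped to at least min_subs branches."""
    return {site: flags for site, flags in sub_map.items() if sum(flags) >= min_subs}

def contingency_table(x, y):
    """Count branches by presence/absence of a substitution at two sites.

    table[a][b] is the number of branches where site x has state a
    and site y has state b (1 = substitution mapped, 0 = none).
    """
    table = [[0, 0], [0, 0]]
    for a, b in zip(x, y):
        table[a][b] += 1
    return table

# Hypothetical data: one 0/1 flag per branch of the tree, per codon site.
sub_map = {
    "site_12": [1, 0, 1, 0, 0, 1, 0, 0],
    "site_45": [1, 0, 1, 0, 0, 0, 0, 0],
    "site_78": [0, 0, 0, 1, 0, 0, 0, 0],  # single substitution: filtered out
}

kept = filter_sites(sub_map, min_subs=2)
tables = {
    (a, b): contingency_table(kept[a], kept[b])
    for a, b in combinations(sorted(kept), 2)
}
```

A site with a single mapped substitution produces tables with a row or column total of one, which carries very little information about association; that is why such sites are dropped before the network analysis.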

A Bayesian graphical model (Bayesian network) is a probabilistic framework from the field of artificial intelligence that enables a machine to generate a representation of a complex system that is made up of an unknown number of conditional dependencies (statistical associations) among a large number of variables. These dependencies comprise the network structure. This approach is useful because these associations are evaluated in the full context of the joint probability distribution; there is no need to filter significant associations to adjust for multiple comparisons, for instance.

BGM uses a Markov chain Monte Carlo (MCMC) method to generate a random sample of network structures from the posterior distribution. Because the space of all possible network structures is too large to explore directly, we use an MCMC method described by Friedman and Koller, which collapses this enormous space by grouping structures into subsets defined by a node hierarchy. This results in a more compact space where the posterior distribution has nicer convergence properties.
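To give a sense of scale, the sketch below counts labeled directed acyclic graphs using Robinson's recurrence and compares that with the number of total orderings of the nodes. It assumes that the node hierarchy corresponds to a total ordering of the nodes, as in order-based MCMC; the function name `count_dags` is hypothetical and none of this is part of HyPhy.

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence).

    a(0) = 1
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k)
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

# Compare the size of the structure space with the size of the ordering space.
for n in range(1, 11):
    print(f"{n:2d} nodes: {count_dags(n)} DAGs, {factorial(n)} orderings")
```

With only 5 nodes there are already 29,281 possible structures but just 120 orderings, and the gap widens rapidly as nodes are added.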

If you use BGM in your analysis, please cite the following: Poon, AFY et al. "An Evolutionary-Network Model Reveals Stratified Interactions in the V3 Loop of the HIV-1 Envelope." PLOS Comput Biol 3, e231 (2007).

rdvelazquez commented 5 years ago

Perfect. Thank you!!

ArtPoon commented 5 years ago

Can I add this reference?

Avino M and Poon AFY. "Detecting Amino Acid Coevolution with Bayesian Graphical Models." Methods Mol Biol 1851: 105-122 (2019).

This book chapter provides extensive details about how to run a BGM analysis in HyPhy.