sriramlab / OrientAGraph

GNU General Public License v3.0

How to interpret and visualize the results? #3

Closed biozzq closed 2 years ago

biozzq commented 3 years ago

Dear @ekmolloy

Thank you for developing this tool. After an initial reading of the paper, I ran a test. Compared with the results generated by TreeMix, the output contains two additional files, migration_one.llik and migration_one.mltree. The command I used (orientagraph version 1.1) was: orientagraph -i treemix.input.gz -o migration_one -root outgroup -bootstrap -m 1 -mlno 2. Can the topology (including migration events) be visualized with the R scripts distributed with TreeMix?

Also, how should I determine the best topology after running orientagraph with -m from 0 to 10? Could I just rely on the ln(likelihood) recorded in the *.llik file?

Finally, how should the -mlno parameter be set? I could not fully understand what it means.

Thank you very much.

Best wishes, Zheng zhuqing

ekmolloy commented 2 years ago

Thanks for your message!

Yes, the output files from OrientAGraph can be visualized using the same R scripts distributed with TreeMix. The file with the extension .mltree stores the Newick string for the starting tree that TreeMix (and thus OrientAGraph) computes using randomized taxon addition (the Newick string is also written to standard output by TreeMix/OrientAGraph). TreeMix and OrientAGraph both output a file with the extension .llik; this file includes the likelihood scores of the graphs (these scores are also written to standard output by TreeMix/OrientAGraph).

OrientAGraph differs from TreeMix by extending the search for a maximum likelihood (ML) network. To run OrientAGraph, you should use the same TreeMix command that you were using before but include the "-allmigs" and "-mlno" options at the end. The "-allmigs" option expands the search for adding a migration edge to the current network (it tests adding a migration edge between ALL pairs of tree edges and selects the one that maximizes the likelihood). The "-mlno" option expands the search for a network topology (it tests each network orientation and selects the one that maximizes the likelihood).
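For anyone scripting a sweep over "-m", here is a minimal Python sketch that only assembles the command lines; it does not run anything. The flags -i, -o, -root, -m, -allmigs and -mlno are the ones mentioned in this thread. The function name build_commands, the "_m<k>" output prefix and the extra_flags parameter are hypothetical choices made for illustration. This thread does not spell out what arguments "-allmigs" and "-mlno" take, so the sketch passes them through exactly as the caller supplies them; check the OrientAGraph help output before running the printed commands.

```python
import shlex


def build_commands(input_file, prefix, outgroup, max_m, extra_flags):
    """Return one argument list per migration-edge count from 0 to max_m.

    extra_flags is appended unchanged to every command, so the caller is
    responsible for the exact syntax of "-allmigs" and "-mlno" (assumption:
    see the tool's own help text, this thread does not specify it).
    """
    commands = []
    for m in range(max_m + 1):
        cmd = [
            "orientagraph",
            "-i", input_file,
            "-o", f"{prefix}_m{m}",  # hypothetical naming: one output prefix per m
            "-root", outgroup,
            "-m", str(m),
        ]
        cmd.extend(extra_flags)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Flag names are taken from the reply above; any arguments they need
    # must be added by the user.
    for cmd in build_commands("treemix.input.gz", "migration", "outgroup",
                              10, ["-allmigs", "-mlno"]):
        print(shlex.join(cmd))
```

Keeping each command as a list of arguments means the same lists can later be handed to subprocess.run without any shell quoting issues.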

It's important to look at the TreeMix manual (https://bitbucket.org/nygcresearch/treemix/downloads/) for setting the other parameters. For example, "-root nameofoutgroup" tells TreeMix (and thus OrientAGraph) to root the starting tree at an outgroup population named "nameofoutgroup". I have found in practice that this option has a big impact on the results. There is also information on bootstrapping and accounting for linkage disequilibrium.

If you are increasing the number of migration edges with the "-m" option, you can look at the likelihood score. I also recommend visualizing the data, especially the residuals, using the R code distributed with TreeMix. This will show you how well the model tree explains the data. We plot the residuals in Figure S3 in the Supplementary Materials (https://doi.org/10.1093/bioinformatics/btab267).
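To compare likelihood scores across runs with different "-m" values, something like the following Python sketch may help. It assumes that each .llik file contains lines of the form "Exiting ln(likelihood) with 1 migration events: 95.25" and that the last such line holds the final score; verify this against your own files. The names final_llik and summarize are hypothetical.

```python
import re

# Assumed line shape: "... ln(likelihood) with K migration events: VALUE"
LLIK_LINE = re.compile(
    r"ln\(likelihood\).*?:\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def final_llik(text):
    """Return the last ln(likelihood) value found in text, or None."""
    values = [float(m.group(1)) for m in LLIK_LINE.finditer(text)]
    return values[-1] if values else None


def summarize(llik_text_by_m):
    """Return (m, llik, gain over the previous m) tuples sorted by m.

    llik_text_by_m maps the number of migration edges to the contents
    of the corresponding .llik file.
    """
    rows = []
    previous = None
    for m in sorted(llik_text_by_m):
        llik = final_llik(llik_text_by_m[m])
        gain = None if previous is None or llik is None else llik - previous
        rows.append((m, llik, gain))
        previous = llik
    return rows


if __name__ == "__main__":
    # Made-up example contents, not real results.
    example = {
        0: "Starting ln(likelihood) with 0 migration events: 80.5 \n"
           "Exiting ln(likelihood) with 0 migration events: 80.5 \n",
        1: "Starting ln(likelihood) with 0 migration events: 80.5 \n"
           "Exiting ln(likelihood) with 1 migration events: 95.25 \n",
    }
    for m, llik, gain in summarize(example):
        print(m, llik, gain)
```

With real runs, the dictionary can be filled by reading each file with pathlib.Path(...).read_text(). The per-step gain is only a descriptive aid and is best read alongside the residual plots recommended above.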

Hope this information is helpful, and let us know if you have other questions!