veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org
Other
211 stars 69 forks source link

OpenDataPanel documentation, etc #248

Closed bernardo1963 closed 9 years ago

bernardo1963 commented 9 years ago

I am a new user of HyPhy, so my doubts may be a bit naïve. Years ago a collaborator did an analysis with HyPhy in a Drosophila gene that transposed from an autosome to the Y chromosome , and then made 18 copies inside the Y (Krsticevic et al Genetics 2010). Some copies have premature stop codons , and hence most likely were pseudogenes, but other have intact ORFs. The idea was to found if the intact copies are functional, by comparing their omega to the omega of the pseugenes. It worked very well , and we fund good evidences that the potentially functional copies indeed were evolving under purifying selection. I would like to repeat the analysis because we recently found that some of the 18 genes were experimental artifacts. I would like to use HBL, in order to precisely document the procedures. I have an old batch file that documents what was done in 2010, and while adapting it for the updated dataset, I came across a HBL command (OpenDataPanel) that was not documented in the HyPhy web site (and a Google search was also innefective). Here is my current batch file:

DataSet mst_2014 = ReadDataFile ("mst_2014.fasta"); /* below removes premature stop codon in positions 229-231 */ DataSetFilter mst_2014_part = CreateFilter (mst_2014,3,"0-228,232-684","","TAA,TAG,TGA");

Tree mst_2014_tree=((((((((Mst77Y_12,Mst77Y_6_Psi)Node8pf,((Mst77Y_3_Psi,Mst77Y_13)Node12pf,Mst77Y_1)Node11pf)Node7pf,Mst77Y_7)Node6pf,Mst77Y_4)Node5pf,Mst77Y_18_Psi)Node4pf,((Mst77Y_10_Psi,Mst77Y_17_Psi)Node20nf,Mst77Y_16_Psi)Node19nf)Node2pf,Mst77F)A,(sechellia,simulans)B,(erecta,yakuba)C);

OpenDataPanel(mst_2014,"","0,0,0,0,2,0,10","MG94xHKY85_3x4,5,0,14443523,mst_2014_tree",theLikFun);

My doubts: 1) What is the meaning of the arguments of OpenDataPanel? Will I need to change them if I remove or add sequences? Alternatively, can I replace it by the UseModel command, perhaps reading some internal HyPhy file that contains the model specification (e.g., file MG94_HKY85x3_4.mdl )?

2) In the next lines of the batch file I specified the null and alternative hypothesis, and then made the LRT test as follows. If possible, may you tell me if there is something wrong here? The script seems to be running Ok.

hypothesis = "All Y copies. dN + dS"; ClearConstraints(mst_2014_tree); mst_2014_tree.Mst77Y_10_Psi.nonSynRate := mst_2014_tree.Mst77Y_1.nonSynRate; mst_2014_tree.Mst77Y_10_Psi.synRate := mst_2014_tree.Mst77Y_1.synRate; [other constraints] … Optimize(dN_dS,theLikFun);

hypothesis = "All Y copies. dN = dS"; ClearConstraints(mst_2014_tree); mst_2014_tree.Mst77Y_1.synRate := mst_2014_tree.Mst77Y_1.nonSynRate; mst_2014_tree.Mst77Y_10_Psi.nonSynRate := mst_2014_tree.Mst77Y_1.nonSynRate; mst_2014_tree.Mst77Y_10_Psi.synRate := mst_2014_tree.Mst77Y_1.synRate; [other constraints] … Optimize(dNeqdS,theLikFun);

… fprintf (stdout,"\n\n\nTest 1 H0: dN = dS H1: dN + dS \n"); null_hypothesis = dNeqdS; alternative_hypothesis = dN_dS; LRT = -2*( null_hypothesis[1][0] - alternative_hypothesis[1][0]); df = alternative_hypothesis[1][1] - null_hypothesis[1][1]; Pvalue=1-CChi2(LRT, df); fprintf(stdout,"The statistic ",LRT, " (", df, ") has P-value ", Pvalue, "\n\n");

3) The following suggestion may be out of the point (or may already had been implemented), but in any case.. It would facilitate the learning of HBL if there is some log file that saves the HBL "translation" of the commands issued in the graphical interface. The statistical software SYSTAT has this, and it is really handy.

Thanks, Bernardo

spond commented 9 years ago

Dear Berardo,

We have recently published an analysis - RELAX which may be well suited to testing whether or not there is evidence of relaxed selection on the pseudogenes relative to ORFs. There is a nice graphical implementation on test.datamonkey.org, and in HyPhy itself (see the paper for details).

To answer your questions: you are modifying a file saved from the HyPhy GUI. We do not recommend developing reusable analyses by editing those files. We are also hard at work at a new version of HyPhy where the entire GUI will be replaced with a new (web-based) system, so most of these features will be deprecated.

If RELAX doesn't work for you, we can revisit implementing a different analysis based on your template.

Sergei