veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org
Other
214 stars 69 forks source link

GARD crashes when run on 27 SARS-CoV-2 sequences #1731

Open corneliusroemer opened 2 months ago

corneliusroemer commented 2 months ago

I've been playing around with GARD and ran it on 10 and 20 randomly sampled Pango sequence founders.

When I tried it out on 27 sequences, however, I got an error:

 hyphy gard --alignment sample2.fasta.txt               

Analysis Description
--------------------
GARD : Genetic Algorithms for Recombination Detection. Implements a
heuristic approach to screening alignments of sequences for
recombination, by using the CHC genetic algorithm to search for
phylogenetic incongruence among different partitions of the data. The
number of partitions is determined using a step-up procedure, while the
placement of breakpoints is searched for with the GA. The best fitting
model (based on c-AIC) is returned; and additional post-hoc tests run to
distinguish topological incongruence from rate-variation. v0.2 adds and
spooling results to JSON after each breakpoint search conclusion

- __Requirements__: A sequence alignment.

- __Citation__: **Automated Phylogenetic Detection of Recombination Using a Genetic
Algorithm**, _Mol Biol Evol 23(10), 1891–1901

- __Written by__: Sergei L Kosakovsky Pond

- __Contact Information__: spond@temple.edu

- __Analysis Version__: 0.2

type: nucleotide
rv: None
>Maximum number of breakpoints to consider (permissible range = [1,100000], default value = 10000, integer): max-breakpoints: 10000
mode: Normal
>Loaded a nucleotide multiple sequence alignment with **27** sequences, **29903** sites (291 of which are variable) from `/Users/corneliusromer/Downloads/tmp/gard/sample.fasta`
>Minimum size of a partition is set to be 51 sites

### Fitting the baseline (single-partition; no breakpoints) model
* Log(L) = -43773.22, AIC-c = 87664.67 (59 estimated parameters)

### Performing an exhaustive single breakpoint analysis
[GARD] Breakpoint        100 of 290. Best cAIC =   87426.5400 [delta =     238.1340] with breakpoint at site      15719Error:
Internal error 
Internal error in ComputeBranchCache (branch gard.tree.part_1.B_1_499, eval #320818 ) reversible model cached likelihood =   -20659.72185688746, directly computed likelihood =   -20659.72152998298. This is most likely because a non-reversible model was incorrectly auto-detected (or specified by the model file in environment variables; for smaller errors, this could be due to numerical instability of calculations for larger alignments). To treat numerical errors as warnings, please specify "ENV=TOLERATE_NUMERICAL_ERRORS=1;" as the command line argument. This often resolves the issue, which is indicative of numerical instability.

Function call stack
1 :  [namespace = fnwxwutu] Optimize(mles, ^lf_id);
-------
2 :  [namespace = MeSxYuFx] res=estimators.FitExistingLF(&likelihoodFunction,modelObjects);
-------
3 :  [namespace = ilgPuOUI] fit=gard.fitPartitionedModel(breakPoints,model,initialValues,null,FALSE);
-------
4 :  [namespace = mpi.pBGkptzZ] Call(result_callback,0,Eval(job+'('+Join(",",utility.Map(arguments,"_value_","utility.convertToArgumentString (_value_)"))+')'),arguments);
-------
5 :  [namespace = gard] mpi.QueueJob(queue,"gard.obtainModel_cAIC",{"0":{{siteIndex__}},"1":model,"2":baseLikelihoodInfo},"gard.storeSingleBreakPointModelResults");
-------
6 :  namespace 

Step 0.singleBreakPointBest_cAIC=^"math.Infinity";

Step 1.breakPointIndex=0;

Step 2.Branch under condition 'breakPointIndex<variableSites-1'
        to
                siteIndex=variableSiteMap[breakPointIndex];
        else
                mpi.QueueComplete(queue);

Step 3.siteIndex=variableSiteMap[breakPointIndex];

Step 4.Branch under condition 'singleBreakPointBest_cAIC<baseline_cAIC'
        to
                io.ReportProgressBar("GARD","Breakpoint "+Format(1+breakPointIndex,10,0)+" of "+(variableSites-1)+". Best cAIC = "+Format(singleBreakPointBest_cAIC,12,4)+" [delta = "+Format(baseline_cAIC-singleBreakPointBest_cAIC,12,4)+"] with breakpoint at site "+Format(singleBreakPointBestLocation,10,0));
        else
                io.ReportProgressBar("GARD","Breakpoint "+Format(1+breakPointIndex,10,0)+" of "+(variableSites-1)+". Best cAIC = "+Format(baseline_cAIC,12,4)+" with no breakpoints.");

Step 5.io.ReportProgressBar("GARD","Breakpoint "+Format(1+breakPointIndex,10,0)+" of "+(variableSites-1)+". Best cAIC = "+Format(singleBreakPointBest_cAIC,12,4)+" [delta = "+Format(baseline_cAIC-singleBreakPointBest_cAIC,12,4)+"] with breakpoint at site "+Format(singleBreakPointBestLocation,10,0));

Step 6.Go to step 8

Step 7.io.ReportProgressBar("GARD","Breakpoint "+Format(1+breakPointIndex,10,0)+" of "+(variableSites-1)+". Best cAIC = "+Format(baseline_cAIC,12,4)+" with no breakpoints.");

Step 8.Branch under condition 'gard.validatePartititon({{siteIndex}},minPartitionSize,numSites)'
        to
                mpi.QueueJob(queue,"gard.obtainModel_cAIC",{"0":{{siteIndex__}},"1":model,"2":baseLikelihoodInfo},"gard.storeSingleBreakPointModelResults");
        else
                breakPointIndex+=1;

Step 9.mpi.QueueJob(queue,"gard.obtainModel_cAIC",{"0":{{siteIndex__}},"1":model,"2":baseLikelihoodInfo},"gard.storeSingleBreakPointModelResults");

Step 10.breakPointIndex+=1;

Step 11.Go to step 2

Step 12.mpi.QueueComplete(queue);

Step 13.io.ClearProgressBar();

Step 14.io.ReportProgressMessageMD('GARD','single-breakpoint','Done with single breakpoint analysis.');

Step 15.io.ReportProgressMessageMD('GARD','single-breakpoint',("   Best single break point location: "+singleBreakPointBestLocation));

Step 16.io.ReportProgressMessageMD('GARD','single-breakpoint',("   c-AIC  = "+singleBreakPointBest_cAIC));;
-------

Check errors.log for execution error details.

File is this one: sample2.fasta.txt

errors.log:

Error:Internal error 
Internal error in ComputeBranchCache (branch gard.tree.part_1.B_1_499, eval #320818 ) reversible model cached likelihood =   -20659.72185688746, directly computed likelihood =   -20659.72152998298. This is most likely because a non-reversible model was incorrectly auto-detected (or specified by the model file in environment variables; for smaller errors, this could be due to numerical instability of calculations for larger alignments). To treat numerical errors as warnings, please specify "ENV=TOLERATE_NUMERICAL_ERRORS=1;" as the command line argument. This often resolves the issue, which is indicative of numerical instability.
Internal error 
Internal error in ComputeBranchCache (branch gard.tree.part_1.B_1_499, eval #320818 ) reversible model cached likelihood =   -20659.72185688746, directly computed likelihood =   -20659.72152998298. This is most likely because a non-reversible model was incorrectly auto-detected (or specified by the model file in environment variables; for smaller errors, this could be due to numerical instability of calculations for larger alignments). To treat numerical errors as warnings, please specify "ENV=TOLERATE_NUMERICAL_ERRORS=1;" as the command line argument. This often resolves the issue, which is indicative of numerical instability.
corneliusroemer commented 2 months ago

I've tried the following, the error was always the same. Did I do it wrong? It would be nice if the error message acknowledge that I had already passed the tolerate numerical errors setting, the fact it doesn't makes it seem like I maybe haven't passed it correctly?

export ENV=TOLERATE_NUMERICAL_ERRORS=1; hyphy gard --alignment sample.fasta
ENV=TOLERATE_NUMERICAL_ERRORS=1; hyphy gard --alignment sample.fasta
ENV=TOLERATE_NUMERICAL_ERRORS=1 hyphy gard --alignment sample.fasta
hyphy gard --alignment sample.fasta --env TOLERATE_NUMBERICAL_ERRORS=1
liamxg commented 2 months ago

Dear @corneliusroemer,

Please update the version of hyphy?

corneliusroemer commented 2 months ago

@liamxg I'm not sure I understand. I was using the latest version as far as I can tell. Can you explain what you meant?

liamxg commented 2 months ago

Hi, @corneliusroemer,

I also have error:


Analysis Description
--------------------
GARD : Genetic Algorithms for Recombination Detection. Implements a
heuristic approach to screening alignments of sequences for
recombination, by using the CHC genetic algorithm to search for
phylogenetic incongruence among different partitions of the data. The
number of partitions is determined using a step-up procedure, while the
placement of breakpoints is searched for with the GA. The best fitting
model (based on c-AIC) is returned; and additional post-hoc tests run to
distinguish topological incongruence from rate-variation. v0.2 adds and
spooling results to JSON after each breakpoint search conclusion

- __Requirements__: A sequence alignment.

- __Citation__: **Automated Phylogenetic Detection of Recombination Using a Genetic
Algorithm**, _Mol Biol Evol 23(10), 1891–1901

- __Written by__: Sergei L Kosakovsky Pond

- __Contact Information__: spond@temple.edu

- __Analysis Version__: 0.2

type: nucleotide
rv: None
>Maximum number of breakpoints to consider (permissible range = [1,100000], default value = 10000, integer): max-breakpoints: 10000
mode: Normal
>Loaded a nucleotide multiple sequence alignment with **90** sequences, **3119** sites (2849 of which are variable) from `90.fas`
>Minimum size of a partition is set to be 1377 sites

### Fitting the baseline (single-partition; no breakpoints) model
Error:
Internal error 
Internal error in ComputeBranchCache (branch gard.tree.part_0.Node1227, eval #3101 ) reversible model cached likelihood =   -231791.7999910961, directly computed likelihood =   -231813.4426094261. This is most likely because a non-reversible model was incorrectly auto-detected (or specified by the model file in environment variables; for smaller errors, this could be due to numerical instability of calculations for larger alignments).

Function call stack
1 :  [namespace = YzHSiPVZ] Optimize(mles, ^lf_id);
-------
2 :  [namespace = mvvazXJv] res=estimators.FitExistingLF(&likelihoodFunction,modelObjects);
-------
3 :  gard.baseLikelihoodInfo=gard.fitPartitionedModel(null,gard.model,null,null,FALSE);
-------

Check errors.log for execution error details.
liamxg commented 2 months ago

Dear @agselberg,

Do you have any comments on this, thanks.

spond commented 2 months ago

Dear @corneliusroemer and @liamxg,

Use ENV="TOLERATE_NUMERICAL_ERRORS=1;" on the HyPhy command line. Case sensitive, include ", do not export as shell ENV.

hyphy gard --alignment sample.fasta ENV="TOLERATE_NUMERICAL_ERRORS=1;".

Best, Sergei

liamxg commented 2 months ago

Dear @spond,

Thanks.

github-actions[bot] commented 2 days ago

Stale issue message