veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Using GARD output for BUSTED/other analysis #1596

Closed by jananiharan 1 year ago

jananiharan commented 1 year ago

Hello,

I assumed that the GARD output (*best-gard) would be a partitioned dataset that could be used downstream for selection analyses like BUSTED. When I use this file as the input for BUSTED, though, I get the following error message:

Illegal right hand side in call to Topology id = ...; it must be a string, a Newick tree spec or a topology

The file *best-gard does not seem to have any phylogenetic trees, which I assume is the issue here. It has a MATRIX section with the nt sequences, and the end of the file says:

BEGIN ASSUMPTIONS; CHARSET span_1 = 1-639; CHARSET span_2 = 640-83

The JSON output file has 3 Newick trees though.

Could you help me understand what is going on? Do I need to parse the trees from the JSON file to use downstream? Or is the GARD output file incomplete? Thanks.

spond commented 1 year ago

Dear @jananiharan,

There should be a TREE block in the best-gard file as well, e.g. as in the attached file. It appears that your GARD file was not properly written to disk (possibly due to an error).

How did you obtain it?

Best, Sergei

Canine_distemper_virus_H.nex.best-gard.zip
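For readers who want to check their own file before passing it to BUSTED, here is a minimal sketch of a test for the missing block. It assumes the best-gard file is plain NEXUS text in which each block opens with `BEGIN <name>;` and that the tree block is named `TREES`; the helper names and the two sample strings are hypothetical and are not real GARD output.

```python
import re


def nexus_blocks(text: str) -> list[str]:
    """Return the upper-cased names of every NEXUS block opened in `text`."""
    # NEXUS keywords are case-insensitive, so match "begin", "Begin", "BEGIN".
    names = re.findall(r"\bBEGIN\s+(\w+)\s*;", text, flags=re.IGNORECASE)
    return [name.upper() for name in names]


def has_trees_block(text: str) -> bool:
    """True if the text opens a TREES block (assumed block name)."""
    return "TREES" in nexus_blocks(text)


# Hypothetical, heavily shortened examples of a complete and a truncated file.
complete = (
    "#NEXUS\n"
    "BEGIN CHARACTERS;\nEND;\n"
    "BEGIN ASSUMPTIONS;\n\tCHARSET span_1 = 1-10;\nEND;\n"
    "BEGIN TREES;\n\tTREE part_1 = (a,b);\nEND;\n"
)
truncated = (
    "#NEXUS\n"
    "BEGIN CHARACTERS;\nEND;\n"
    "BEGIN ASSUMPTIONS;\n\tCHARSET span_1 = 1-10;\n"
)

print(has_trees_block(complete))   # the complete sample opens a TREES block
print(has_trees_block(truncated))  # the truncated sample does not
```

To run this on a real file, read it with `open(path).read()` and pass the text to `has_trees_block`. A file that fails the check would be expected to trigger the same kind of error described above, since no tree is available for the analysis.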

jananiharan commented 1 year ago

Hi Sergei,

The TREE block is definitely missing in my file (see attached). I have HYPHY 2.5.14(MP) for Linux on x86_64 installed in a separate conda environment and I used this command to generate my file:

hyphy gard --code 11 --alignment cbiA.pal2na

cbiA.pal2nal.best-gard.zip

spond commented 1 year ago

Dear @jananiharan,

Version 2.5.14 is more than 3 years old. Would you please try a more recent version of HyPhy? I ran your example through the current release and obtained the following results:

type: nucleotide
rv: None
>Maximum number of breakpoints to consider (permissible range = [1,100000], default value = 10000, integer): max-breakpoints: 10000
mode: Normal
>Loaded a nucleotide multiple sequence alignment with **26** sequences, **1158** sites (993 of which are variable) from `/Users/sergei/Downloads/cbiA.pal2nal.best-gard`
>Minimum size of a partition is set to be 49 sites

### Fitting the baseline (single-partition; no breakpoints) model
* Log(L) = -27944.50, AIC-c = 56009.02 (57 estimated parameters)

### Performing an exhaustive single breakpoint analysis
Done with single breakpoint analysis.
   Best sinlge break point location: 594
   c-AIC  = 55937.6507166395

### Performing multi breakpoint analysis using a genetic algorithm
Done with 2 breakpoint analysis.
    Best break point locations: 638, 837
    c-AIC = 55886.57605010319
Done with 3 breakpoint analysis.
    Best break point locations: 157, 638, 837
    c-AIC = 55877.80808824953
Done with 4 breakpoint analysis.
    Best break point locations: 157, 638, 855, 914
    c-AIC = 55900.2818042568

With trees at the end as expected.
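As a reading aid for the log above, here is a small sketch that compares the reported c-AIC values, assuming (as is standard for information criteria) that a lower value indicates the preferred model. The numbers are copied from the log; the dictionary layout and variable names are illustrative only and are not part of HyPhy.

```python
# c-AIC reported in the log, keyed by number of breakpoints (0 = baseline).
c_aic = {
    0: 56009.02,
    1: 55937.6507166395,
    2: 55886.57605010319,
    3: 55877.80808824953,
    4: 55900.2818042568,
}

# Pick the number of breakpoints with the lowest c-AIC.
best = min(c_aic, key=c_aic.get)

# Improvement of the preferred model over the no-breakpoint baseline.
improvement = c_aic[0] - c_aic[best]

print(f"lowest c-AIC at {best} breakpoints")
print(f"improvement over baseline: {improvement:.2f}")
```

In this run the score improves with each added breakpoint up to three and gets worse at four, which is consistent with the search stopping there.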

Best, Sergei

cbiA.pal2nal.best-gard.zip

jananiharan commented 1 year ago

OK, updating to 2.5.48 fixed this issue! For anyone else who might be installing hyphy with conda, I had to specify the more recent version using conda install.
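For anyone who wants to check which version they have, here is a minimal sketch that pulls the version number out of a banner string like the one quoted earlier in this thread and compares it with 2.5.48. Note that 2.5.48 is simply the version reported to work here, not a documented minimum, and the function name is hypothetical.

```python
import re


def parse_hyphy_version(banner: str) -> tuple[int, ...]:
    """Extract the first major.minor.patch triple from a banner string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"no version number found in {banner!r}")
    return tuple(int(part) for part in match.groups())


# Version reported to work in this thread (an assumption, not an official floor).
KNOWN_GOOD = (2, 5, 48)

installed = parse_hyphy_version("HYPHY 2.5.14(MP) for Linux on x86_64")

# Tuples compare element by element, so (2, 5, 14) sorts before (2, 5, 48).
if installed < KNOWN_GOOD:
    print("installed version is older than the one reported to work")
else:
    print("installed version is at least the one reported to work")
```

With conda, a specific version can be requested using the `package=version` syntax, for example `conda install hyphy=2.5.48`; which channel provides the package depends on your setup.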