veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Problems using GARD output #1605

Closed jananiharan closed 1 year ago

jananiharan commented 1 year ago

Hello,

I am trying to use partitioned datasets from the GARD analysis to run aBSREL, but keep running into errors that make me think I don't understand how GARD partitions my data. I am working with around 20-30 files, and these are the two most common errors that I can't figure out:

  1. Error:The number of tree tips in 'OgriqMA_.tree_0'(1) is not equal to the number of sequences in the data filter associated with the tree.

I split the alignment according to the columns specified in the best-gard file, and I assumed that the trees in the best-gard file would be consistent with the split alignments. Is this incorrect? Should I remake the Newick tree once I split the alignment at the breakpoint?

  2. Error: The input alignment must have the number of sites that is divisible by 3 and must not contain stop codons in call to assert(absrel.codon_filter.sites*3==absrel.codon_data.sites, error_msg);

I notice that the alignment in this partitioned file has 149 characters, so it makes sense that the number of sites is not divisible by 3. However, this is the breakpoint specified by GARD -- how do you suggest dealing with discrepancies like this? I ran GARD with the default options, i.e. nucleotide mode, and I'm not sure if I should be using it in codon mode instead to deal with this issue.

spond commented 1 year ago

Dear @jananiharan,

Can you please provide examples of your error situations? I am not sure why error condition (1) would arise. Could be due to sequence naming/renaming.

For (2), GARD does not partition the data along codon boundaries. Downstream tools like BUSTED, FEL, etc. will automatically adjust such partitions to the nearest codon boundary, but if you are partitioning manually, you will need to make this adjustment yourself.

Best, Sergei
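
For anyone splitting alignments by hand, the manual adjustment described above could be sketched roughly as follows. This is only an illustration, not HyPhy's own logic: the function names are hypothetical, the alignment is assumed to start in frame at column 1 with a total length divisible by 3, and each breakpoint is assumed to be the number of columns in the partition to its left. Check how your GARD output reports breakpoints before relying on that last assumption.

```python
def snap_to_codon_boundary(n_columns: int) -> int:
    """Round a column count to the nearest multiple of 3 (hypothetical helper)."""
    remainder = n_columns % 3
    if remainder == 0:
        return n_columns
    # Remainder 1 is closer to the boundary below, remainder 2 to the one above.
    return n_columns - 1 if remainder == 1 else n_columns + 1


def codon_aligned_ranges(alignment_length: int, breakpoints: list[int]) -> list[tuple[int, int]]:
    """Return 0-based, half-open column ranges whose lengths are multiples of 3.

    Assumes each breakpoint is the number of columns to its left.
    """
    if alignment_length % 3 != 0:
        raise ValueError("alignment length must be divisible by 3")
    cuts = sorted({snap_to_codon_boundary(bp) for bp in breakpoints})
    # Drop cuts that would leave an empty first or last partition.
    cuts = [c for c in cuts if 0 < c < alignment_length]
    edges = [0, *cuts, alignment_length]
    return list(zip(edges[:-1], edges[1:]))


def split_sequences(sequences: dict[str, str], ranges: list[tuple[int, int]]) -> list[dict[str, str]]:
    """Slice every sequence in a name -> sequence mapping into the given ranges."""
    return [
        {name: seq[start:end] for name, seq in sequences.items()}
        for start, end in ranges
    ]
```

Under these assumptions, a 149-column first partition is extended to 150 columns, so both resulting pieces have lengths divisible by 3. The sketch does not check for stop codons, which the same aBSREL assertion also rejects.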

jananiharan commented 1 year ago

Error (1) does seem to be an artifact of sequence renaming during the run. I'm able to fix it by manually correcting some names. For Error (2), I'll split the alignment at positions that fall on codon boundaries instead. Thank you so much for your patience!
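
A name mismatch like the one behind Error (1) can be spotted before running aBSREL by comparing the tip labels in the Newick tree with the sequence names in the split alignment. The sketch below is a rough illustration under stated assumptions: the alignment is assumed to be in FASTA format, tip labels are assumed to be unquoted and free of whitespace, and all function names are hypothetical. The regular expression is a shortcut, not a full Newick parser.

```python
import re


def newick_tip_labels(newick: str) -> set[str]:
    """Collect tip labels: names that directly follow '(' or ','.

    Internal node labels follow ')' and are therefore skipped.
    Assumes unquoted labels without whitespace.
    """
    return set(re.findall(r"[(,]\s*([^():,;\s]+)", newick))


def fasta_names(fasta_text: str) -> set[str]:
    """Collect sequence names: the first token of each '>' header line."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def name_mismatches(newick: str, fasta_text: str) -> tuple[list[str], list[str]]:
    """Return (names only in the alignment, names only in the tree)."""
    tips = newick_tip_labels(newick)
    seqs = fasta_names(fasta_text)
    return sorted(seqs - tips), sorted(tips - seqs)
```

If both returned lists are empty, the tree and the alignment use the same set of names. Otherwise the lists show which names need to be corrected on each side.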