veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Question about running GARD on commandline #1568

Closed: raufs closed this issue 1 year ago

raufs commented 1 year ago

Hi,

Thank you all so much for developing HyPhy and all the methods behind it!

I have been using HyPhy on the command line and have incorporated it into a software tool (as a conda-installable dependency). I have a question and a minor note.

The minor note: the header in the FUBAR JSON file seems to be out of date, although the columns of the actual data match the table shown when the JSON is uploaded to HyPhy Vision.

The question pertains to running FUBAR and GARD. FUBAR is really fast, and even when it takes a while to run, it always finishes. From reading some documentation/slides, it seems that GARD is recommended before running selection analyses, to partition multi-domain proteins. I have noticed, though, that GARD can take a long time to run, even in "Faster" mode and with a small number of sequences. I am wondering if you have any best practices or suggestions for running GARD that would prevent it from getting "stuck" on certain input alignments.

Thank you, Rauf

DDudka9 commented 1 year ago

I second what Rauf said. It takes me at least an hour to run GARD via the command line on an alignment of 16 sequences, each about 850 bp long.

spond commented 1 year ago

Dear @raufs and @DDudka9,

GARD is going to be slower than FUBAR by orders of magnitude because it solves a much "larger" problem through an expensive search algorithm. Here are some general suggestions to accelerate GARD:

  1. Make sure you analyze the data in Nucleotide mode (the default) and with the --mode Faster flag.
  2. GARD really benefits from MPI (even if all you have are multiple cores), so you should consider running it via mpirun -np N HYPHYMPI gard ...
  3. You can subsample your alignment, especially if it's quite large. GARD has a "sweet spot" of about 20-40 sequences, so if your alignment has more (e.g. >100), you could select those which represent different clades in a larger tree, and run GARD on such a subset. Then you could reapply the inferred breakpoints to the entire alignment: split it into non-recombinant fragments, infer a tree for each, then run GARD.
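The subsample-then-reapply workflow in point 3 can be sketched in plain Python. This is a minimal, hypothetical sketch and not part of HyPhy: the function names are made up for illustration, and it assumes the alignment is already loaded as a dict mapping sequence names to aligned strings, that clade membership was decided elsewhere (e.g. from a larger tree), and that breakpoints are 0-based alignment columns where a new fragment starts. Check how your GARD output numbers its breakpoints before relying on this. Tree inference for each fragment is not shown.

```python
def pick_clade_representatives(clade_of, max_per_clade=1):
    """Hypothetical helper: keep at most `max_per_clade` sequences per clade.

    `clade_of` maps sequence name -> clade label (assumed to come from a
    larger tree). Input order is preserved, so the first sequences seen
    for each clade are the ones kept.
    """
    kept = []
    counts = {}
    for name, clade in clade_of.items():
        if counts.get(clade, 0) < max_per_clade:
            kept.append(name)
            counts[clade] = counts.get(clade, 0) + 1
    return kept


def split_at_breakpoints(alignment, breakpoints):
    """Hypothetical helper: cut every aligned sequence at the same columns.

    `alignment` maps name -> aligned sequence (all the same length).
    `breakpoints` are ASSUMED to be 0-based columns where a new fragment
    starts; duplicates and out-of-range values are ignored.
    Returns one dict (name -> subsequence) per fragment, left to right.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be non-empty and aligned")
    length = lengths.pop()
    cuts = sorted({b for b in breakpoints if 0 < b < length})
    bounds = [0] + cuts + [length]
    return [
        {name: seq[start:end] for name, seq in alignment.items()}
        for start, end in zip(bounds, bounds[1:])
    ]


if __name__ == "__main__":
    clades = {"s1": "A", "s2": "A", "s3": "B", "s4": "C", "s5": "B"}
    subset = pick_clade_representatives(clades)
    print(subset)  # ['s1', 's3', 's4']

    aln = {"s1": "ACGTACGTAC", "s3": "ACGTTCGTAC", "s4": "TCGTACGAAC"}
    for i, frag in enumerate(split_at_breakpoints(aln, [4, 7])):
        print(i, frag)
```

Each fragment can then be written out (for example as FASTA) and given its own tree. For codon-based selection analyses you may also want breakpoints that fall on codon boundaries, which this sketch does not enforce.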

HTH, Sergei

raufs commented 1 year ago

Thank you for the advice!