veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

how to specify threads used with hyphy gard #1653

Closed lagphase closed 10 months ago

lagphase commented 1 year ago

Hello, I don't see a flag for setting the number of threads in hyphy gard --help.

I'm running 250 genomes and the default is taking forever.

stevenweaver commented 1 year ago

Dear @lagphase,

I'm going to assume you mean number of processors. If not, feel free to correct me. Are you using standard HyPhy or HYPHYMPI?

For the former, try

hyphy CPU=1 gard, where 1 is the number of CPUs you'd like to use.

For the latter, try mpirun -np 8 HYPHYMPI.

Best, Steven
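
The two invocations in the reply above can be assembled programmatically, which helps when the same analysis is launched with different CPU counts. The following is only a minimal sketch: build_hyphy_command is a hypothetical helper, not part of HyPhy, and appending the analysis name (gard) after HYPHYMPI in the MPI case is an assumption, since the reply shows only mpirun -np 8 HYPHYMPI.

```python
import shlex


def build_hyphy_command(analysis, n_procs, use_mpi=False, extra_args=()):
    """Return an argv list for a HyPhy run (hypothetical helper, not a HyPhy API)."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    if use_mpi:
        # MPI build: the process count goes to mpirun, as in `mpirun -np 8 HYPHYMPI`.
        # Assumption: the analysis name follows the executable.
        argv = ["mpirun", "-np", str(n_procs), "HYPHYMPI", analysis]
    else:
        # Standard build: the CPU count is given as CPU=<n> before the analysis name.
        argv = ["hyphy", f"CPU={n_procs}", analysis]
    return argv + list(extra_args)


if __name__ == "__main__":
    print(shlex.join(build_hyphy_command("gard", 4)))
    print(shlex.join(build_hyphy_command("gard", 8, use_mpi=True)))
```

The returned list can be passed to subprocess.run directly; any further arguments for the analysis would go in extra_args.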

lagphase commented 1 year ago

Hi Steven,

Thanks. Yes, I meant CPUs, and I'm using the standard HyPhy.

The code seems to run slightly faster. Any estimate of how long it would take for 250 sequences with 100 CPUs?

spond commented 1 year ago

Dear @lagphase,

It really depends on the analysis. What are you running? HyPhy will not be able to load 100 CPUs (HYPHYMPI can for FEL, MEME, and other such analyses), unless you have a large and (more importantly) long alignment. In fact, HyPhy will run a mini-benchmark at the beginning on time-intensive tasks to estimate how many CPUs are worth using.

You can add ENV="VERBOSITY_LEVEL=1;" to the command line, and for most analyses you will see a line that looks like this, which will also report the effective CPU load.

Current Max: -17515.127     (0 % done) LF Evals/Sec: 1571    CPU Load: 3.604 

Best, Sergei
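
To track the effective CPU load over a long run, the progress line shown in the reply above can be parsed from captured output. This is a sketch that assumes the line always has the layout of that single sample; the pattern and the parse_progress name are illustrative and not part of HyPhy, and other HyPhy versions or analyses may format the line differently.

```python
import re

# Pattern modelled on the one sample line quoted above (an assumption about the format).
PROGRESS_RE = re.compile(
    r"Current Max:\s*(?P<max>-?\d+(?:\.\d+)?)\s*"
    r"\((?P<pct>\d+(?:\.\d+)?)\s*% done\)\s*"
    r"LF Evals/Sec:\s*(?P<evals>\d+(?:\.\d+)?)\s*"
    r"CPU Load:\s*(?P<load>\d+(?:\.\d+)?)"
)


def parse_progress(line):
    """Return the fields of a progress line as a dict, or None if it does not match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return {
        "current_max": float(m["max"]),
        "percent_done": float(m["pct"]),
        "lf_evals_per_sec": float(m["evals"]),
        "cpu_load": float(m["load"]),
    }


if __name__ == "__main__":
    sample = "Current Max: -17515.127     (0 % done) LF Evals/Sec: 1571    CPU Load: 3.604 "
    print(parse_progress(sample))
```

A CPU load well below the requested CPU count suggests that adding more CPUs is unlikely to help for that alignment.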

lagphase commented 1 year ago

Hi Sergei,

I'm using hyphy gard to detect recombination in the alignment. Yes, my alignment is long (~100 MB). Is there a way to speed up the process?

Thanks.

spond commented 1 year ago

Dear @lagphase,

GARD will make better use of an MPI environment. That said, it is meant for screening "gene-sized" alignments (at most 10-20 kb). While GARD will attempt to run on a 100 megabase alignment, it will be very slow. How many sequences do you have? Because GARD only considers variable sites, the effective search space could be manageable if you have a few (5-20) sequences.

Best, Sergei
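
Since the reply above notes that GARD only considers variable sites, a rough count of variable columns gives a feel for the effective search space before starting a run. The sketch below uses a simple definition (a column is variable if it contains more than one distinct character after ignoring gaps and ambiguity codes); this definition, the ignored characters, and the count_variable_sites name are assumptions for illustration, and GARD's own filtering may differ.

```python
def count_variable_sites(sequences, ignore=frozenset("-?N")):
    """Count alignment columns with more than one distinct (non-ignored) character.

    `sequences` is a list of equal-length aligned strings. Treating '-', '?' and 'N'
    as missing data is an assumption suited to nucleotide alignments.
    """
    if not sequences:
        return 0
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same aligned length")
    variable = 0
    for column in zip(*sequences):
        # Compare case-insensitively and drop missing-data characters.
        states = {c.upper() for c in column} - ignore
        if len(states) > 1:
            variable += 1
    return variable


if __name__ == "__main__":
    toy_alignment = ["ACGT-A", "ACGTTA", "ACCTTA", "ANGTTG"]
    print(count_variable_sites(toy_alignment))
```

In the toy alignment only the third and sixth columns vary, so a long alignment with few sequences can still have a small number of sites for GARD to consider.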
