veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Barely any parallelism (max 110% CPU) running `hyphy gard` on macOS arm64 #1730

Open corneliusroemer opened 1 month ago

corneliusroemer commented 1 month ago

Is it expected that there's hardly any parallelism when running `hyphy gard` on macOS arm64?

I tried both a build compiled from source and the bioconda build, and I get at most around 110% CPU usage with either.

Repro:

hyphy gard --alignment sample.fasta.txt --mode Faster

with this file: sample.fasta.txt

[Screenshot]

I expected parallelism based on, e.g., this sentence in the paper:

GARD outperforms other methods and can be run in parallel on a cluster of computers, and so is well suited to screen for recombination in large datasets.

As the algorithm should be pretty easy to parallelize, I suspect there's some issue on macOS. I'll check whether x86-64 has the same issue (when run through Rosetta 2).

corneliusroemer commented 1 month ago

When I use the osx-64 binary from bioconda and run it through Rosetta 2, I do get parallelism:

[Screenshot: Hyper 2024-08-13 21 14 14]

That suggests the issue is specific to osx-arm64. Note that I tried both the bioconda package (which I made available yesterday) and a build compiled from source.

Interestingly, the native, nearly single-threaded arm64 binary is still faster than the multithreaded emulated one. In my experience, emulation usually slows things down by a factor of 2, but here the slowdown seems to be larger.

spond commented 2 weeks ago

Dear @corneliusroemer,

GARD benefits from MPI distributed environments, and not that much from multithreading. If you have OpenMPI installed, run `make MPI` and then `mpirun -np N HYPHYMPI gard ...`, where N is the number of cores on your system.

Best, Sergei
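
For anyone trying the MPI route suggested above, here is a minimal sketch of how the command line might be assembled. It only prints the `mpirun` invocation instead of running it. The `gard_mpi_cmd` helper is hypothetical, using `getconf` to count cores is my own choice, and it assumes that `HYPHYMPI` (built with `make MPI`) is on your `PATH` and accepts the same `--alignment` and `--mode` arguments as the original repro.

```shell
#!/bin/sh
# Sketch only: prints an mpirun command for GARD, does not execute it.
# Assumptions (not from the thread): HYPHYMPI was built with `make MPI`
# in an already configured HyPhy source tree and is on PATH, and it takes
# the same gard arguments as the plain `hyphy` binary.

# Number of online cores; getconf works on macOS and Linux. Falls back to 1.
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Hypothetical helper.
# $1: alignment file, $2: number of MPI processes (the N in `mpirun -np N`)
gard_mpi_cmd() {
    printf 'mpirun -np %s HYPHYMPI gard --alignment %s --mode Faster\n' "$2" "$1"
}

# Print the command so it can be inspected before running it by hand.
gard_mpi_cmd sample.fasta.txt "$ncores"
```

Printing first makes it easy to check the process count before starting a long run; once it looks right, paste the printed line into the shell.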