voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0

Graviton2-b #330

Open accopeland opened 2 years ago

accopeland commented 2 years ago

Another attempt at aarch64 support. I made some changes to CMakeLists.txt to prevent errors at configure time, and added some ifdefs around the arch-specific parts, so you should now get identical x86_64 performance and working binaries on aarch64 (only Graviton2 tested so far).

accopeland commented 2 years ago

Apologies, I forgot that bit. I could update the branch with either a git submodule or a local copy (from https://github.com/simd-everywhere/simde). Do you have a preference?

On Mon, Apr 4, 2022 at 9:48 PM Dinghua Li @.***> wrote:

@.**** commented on this pull request.

In src/kmlib/kmrns.h https://github.com/voutcn/megahit/pull/330#discussion_r842339259:

@@ -8,7 +8,12 @@
 #include
 #include
 #include
-#include
+#if defined(__GNUC__) && defined(__aarch64__)
+#define SIMDE_ENABLE_NATIVE_ALIASES
+#include "../simde/x86/avx2.h"

Thank you for the PR.

Where is ../simde/x86/avx2.h?


voutcn commented 2 years ago

I found that with this PR, HasPopCnt and HasBmi2 always return false on aarch64, so the megahit Python script always chooses the megahit_core binary built without POPCNT and BMI2. You therefore don't need to include any simde headers.

#if !defined(__aarch64__)
  #include <x86intrin.h>
#endif

and change https://github.com/voutcn/megahit/blob/0abdd48f6670a91a3cc3fc4238e2812971d428bb/src/kmlib/kmrns.h#L268

to

defined(__BMI2__) && defined(USE_BMI2) && !defined(__aarch64__)

should be sufficient to get it compiled.

voutcn commented 2 years ago

FYI: we compile three megahit_core binaries: one with POPCNT + BMI2, one with POPCNT only, and one with neither. The megahit Python script chooses the correct binary (through the functions in cpu_dispatch.h) here: https://github.com/voutcn/megahit/blob/f8afe5dc565ca79dabb61e4f822135ef4926baac/src/megahit#L613-L630
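For readers following along, the selection logic described in this comment can be sketched roughly as below. This is only an illustration, not the code at the linked lines: the function name `choose_core_binary` and the three binary names are assumptions made for the example, and the two booleans stand in for whatever the cpu_dispatch.h checks report.

```python
def choose_core_binary(has_popcnt: bool, has_bmi2: bool) -> str:
    """Return the most accelerated build that the CPU can run.

    The binary names below are hypothetical placeholders for the three
    builds (POPCNT + BMI2, POPCNT only, no hardware acceleration).
    """
    if has_popcnt and has_bmi2:
        return "megahit_core"              # assumed name: POPCNT + BMI2 build
    if has_popcnt:
        return "megahit_core_popcnt"       # assumed name: POPCNT-only build
    return "megahit_core_no_hw_accel"      # assumed name: plain build


# On aarch64 both checks report False, so the plain build is selected.
print(choose_core_binary(has_popcnt=False, has_bmi2=False))
```

Because BMI2 is only used together with POPCNT in this sketch, a CPU reporting BMI2 without POPCNT also falls through to the plain build.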

If we find a good enough way to check for POPCNT or BMI2 support on ARM, we can revisit the POPCNT and BMI2 hardware acceleration options for ARM.
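One possible starting point for such a check, sketched under stated assumptions: on Linux, aarch64 kernels list CPU capabilities on the `Features` line of `/proc/cpuinfo`, and the `asimd` entry indicates Advanced SIMD (NEON), which provides a vector bit-count instruction. Treating `asimd` as a proxy for popcount acceleration is an assumption of this example, not something megahit does, and the helper names are hypothetical. ARM has no direct BMI2 equivalent, so that part is not covered.

```python
def parse_arm_features(cpuinfo_text: str) -> set:
    """Collect the tokens from every 'Features' line of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features


def has_neon_popcount(cpuinfo_text: str) -> bool:
    """Assumption: 'asimd' implies a usable vector popcount instruction."""
    return "asimd" in parse_arm_features(cpuinfo_text)


# Sample text in the shape of an aarch64 /proc/cpuinfo entry (illustrative).
sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics\n"
print(has_neon_popcount(sample))
```

On a real machine the text would come from reading `/proc/cpuinfo`; on x86 that file has no `Features` line (it uses `flags`), so the helper returns an empty set there.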