samtools / htscodecs

Custom compression for CRAM and others.
Other
30 stars 18 forks source link

Only use the ARM NEON 32-way unrolled rANS on AArch64. #82

Closed jkbonfield closed 1 year ago

jkbonfield commented 1 year ago

NEON alone isn't a sufficient guard as AArch32 also has some limited Neon capabilities. While we could no doubt have a 32-bit alternative, for now this is the simple fix and let aarch32 use the scalar implementation.

Doing a 32-bit neon is a complex task and without having access to the hardware it's pretty much impossible. I also wouldn't have high hopes for any significant speed gains over scalar with only half the lanes available.

Fixes #81

clausecker commented 1 year ago

Check if you can also adapt the NEON configure test in the configure script to only say NEON is supported if the AArch64 intrinsics are present. You can use those shown in the error messages in bug #81.

clausecker commented 1 year ago

Note that “half the lanes” isn't true. Even AArch32 NEON has support for 128 bit vectors. You can test the code on affordable hardware such as a Raspberry Pi if you so desire.

daviesrob commented 1 year ago

I was able to verify that this fixed the problem on qemu-system-arm -machine virt -cpu cortex-a15.

As a side note, testing this would be easier if the freebsd documentation covered booting arm images via qemu. There's a wiki page for arm64/QEMU, but nothing for 32 bit arm. Also, the handbook doesn't really have much to say about what you do with VM or SD card images.

For the record, I used (on a Debian bullseye host):

wget 'https://download.freebsd.org/releases/arm/armv7/ISO-IMAGES/13.2/FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img.xz'
xz -d FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img.xz
qemu-img create -f qcow2 -b FreeBSD-13.2-RELEASE-arm-armv7-GENERICSD.img FreeBSD-13.2-RELEASE-arm-armv7-working.qc2
qemu-system-arm -machine virt -cpu cortex-a15 -bios /usr/lib/u-boot/qemu_arm/u-boot.bin -m 2G -nographic -display none -drive file=FreeBSD-13.2-RELEASE-arm-armv7-working.qc2,if=virtio -nic user
clausecker commented 1 year ago

Thank you for your work!