yangao07 / abPOA

abPOA: an SIMD-based C library for fast partial order alignment using adaptive band
MIT License

Pre-built binary program not compatible between different versions of CPUs? #55

Closed: jiaan-yu closed 2 months ago

jiaan-yu commented 8 months ago

Hi Yan,

We are trying to run abPOA (with the pre-built binary) in the cloud. But we found that this binary doesn't work across some versions of Intel CPUs, or across AMD and Intel CPUs. Is there a way to bypass this issue? We are considering compiling the binary every single time before we launch jobs in the cloud, but that's not very elegant :(

Cheers, Jiaan

shenker commented 8 months ago

I also had to recompile abpoa to get it running on my institution's compute cluster. It might be nice if the default CPU flags were compatible with a wider range of CPUs (or even if the README just mentioned this pitfall and specifically which CPU features were enabled in the default binary builds). The bioconda/pypi binaries should probably not be built with -march=native, because then the binaries are completely dependent on the CPUs that happen to be used for the GitHub Actions runners doing the build. Explicitly specifying a reasonable list of common CPU features in the bioconda/pypi build scripts should address this.
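To see which CPU features a given set of compiler flags actually bakes into a binary, a small check along these lines may help. It is only a sketch, not part of abPOA: the function names are made up for illustration, and it relies on the feature macros (`__SSE4_2__`, `__AVX__`, `__AVX2__`, `__AVX512F__`) that GCC and Clang predefine according to `-march` and `-m<feature>`.

```c
#include <stdio.h>

/* Each helper returns 1 if the compiler was allowed to emit that instruction
 * set when this file was built, and 0 otherwise. The macros are predefined by
 * GCC and Clang based on -march / -m<feature>; other compilers may not set
 * them. All function names here are hypothetical. */
int built_with_sse42(void) {
#ifdef __SSE4_2__
    return 1;
#else
    return 0;
#endif
}

int built_with_avx(void) {
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

int built_with_avx2(void) {
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

int built_with_avx512f(void) {
#ifdef __AVX512F__
    return 1;
#else
    return 0;
#endif
}

/* Prints the instruction sets this build is allowed to use. A CPU lacking any
 * feature reported as "yes" may crash with an illegal instruction. */
void print_build_simd_report(void) {
    printf("SSE4.2:  %s\n", built_with_sse42()   ? "yes" : "no");
    printf("AVX:     %s\n", built_with_avx()     ? "yes" : "no");
    printf("AVX2:    %s\n", built_with_avx2()    ? "yes" : "no");
    printf("AVX512F: %s\n", built_with_avx512f() ? "yes" : "no");
}
```

Calling `print_build_simd_report()` from a `main` and compiling the file once with `-march=native` and once with an explicit target shows the difference: with `-march=native` the report depends entirely on the build machine, while an explicit target gives the same report on every builder.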

yangao07 commented 8 months ago

I see the issues here.

@shenker

> Explicitly specifying a reasonable list of common cpu features in the bioconda/pypi build scripts

Do you have any specific suggestions? I am not an expert in doing this. If you can provide some examples, I can try to make the change in the scripts.

jiaan-yu commented 8 months ago

I found some relevant information about the use of -march=native, if you would find it helpful:
https://stackoverflow.com/questions/52653025/why-is-march-native-used-so-rarely
https://lemire.me/blog/2018/07/25/it-is-more-complicated-than-i-thought-mtune-march-in-gcc/

subwaystation commented 8 months ago

You could use -march=sandybridge. It is a reasonably old architecture. In PGGB we have had good experiences with it so far: https://github.com/pangenome/pggb/blob/81efdfe0c2accf02f899947d179fef41ce20f8b9/Dockerfile#L74

There are 2 other ways, leading to much better abPOA performance:

cjw85 commented 6 months ago

If I recall correctly, conda packages are typically built with -march=haswell, a newer generation than sandybridge. This gives you access to SSE4.2 and AVX2, but not AVX512. Haswell is more than 10 years old at this point, so if you don't have SSE4.2 by now you should probably get a newer computer!
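For anyone unsure what their cloud or cluster nodes support, a runtime check along these lines could be run on each machine type before picking a target. This is a rough sketch rather than anything shipped with abPOA: the function and constant names are invented for illustration, and it assumes GCC or Clang on x86, where `__builtin_cpu_init` and `__builtin_cpu_supports` are available.

```c
#include <stdio.h>

/* Bit flags for the features discussed in this thread (hypothetical names). */
enum {
    HOST_SSE42   = 1,
    HOST_AVX     = 2,
    HOST_AVX2    = 4,
    HOST_AVX512F = 8
};

/* Returns a bitmask of SIMD features supported by the CPU this program is
 * running on. On non-x86 targets, or with a compiler that lacks the GCC/Clang
 * builtins, the check is skipped and the mask is 0. */
int host_simd_features(void) {
    int mask = 0;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();  /* harmless to call more than once */
    if (__builtin_cpu_supports("sse4.2"))  mask |= HOST_SSE42;
    if (__builtin_cpu_supports("avx"))     mask |= HOST_AVX;
    if (__builtin_cpu_supports("avx2"))    mask |= HOST_AVX2;
    if (__builtin_cpu_supports("avx512f")) mask |= HOST_AVX512F;
#endif
    return mask;
}

/* Prints one line per feature for the current machine. */
void print_host_simd_features(void) {
    int mask = host_simd_features();
    printf("SSE4.2:  %s\n", (mask & HOST_SSE42)   ? "yes" : "no");
    printf("AVX:     %s\n", (mask & HOST_AVX)     ? "yes" : "no");
    printf("AVX2:    %s\n", (mask & HOST_AVX2)    ? "yes" : "no");
    printf("AVX512F: %s\n", (mask & HOST_AVX512F) ? "yes" : "no");
}
```

The file itself should be compiled without `-march=native` so that the checker runs everywhere. A binary built with -march=haswell needs a "yes" for AVX2 on every node, while one built with -march=sandybridge only needs SSE4.2 and AVX.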