yangao07 / abPOA

abPOA: an SIMD-based C library for fast partial order alignment using adaptive band
MIT License

abpoa segfaults when building with sse4.1 but not avx2 #26

Closed: glennhickey closed this issue 1 year ago

glennhickey commented 3 years ago

I have been getting some rare segmentation faults from abpoa. They only reproduce when compiling with sse4.1. I've been using sse4.1 to increase portability and to save on cloud costs. I suspect my input sizes and score matrices may be too big for the registers used?

Here is an example that segfaults with sse4.1 but runs through with avx2 on this architecture:

wget http://public.gi.ucsc.edu/~hickey/debug/abpoa-sse41-segfault.tar.gz
tar zxf abpoa-sse41-segfault.tar.gz
abpoa ./ap_in_140727466179648.fa -O 400,1200 -E 30,1 -b 300 -f 0.025000 -t ./ap_in_140727466179648.fa.mat -r 1 -m 0 -N > ./ap_in_140727466179648.fa.out

Is this a bug in abpoa?

If not, is it possible to know a priori which inputs I can expect to be able to run with which instruction settings? And, ideally, to give an error message?

Thanks for your great library and continued support!

yangao07 commented 3 years ago

Hi Glenn, I did see this segfault on my machine. However, when I tried to trace the error using valgrind, it kept getting killed because the requested malloc size was too large:

[SIMDMalloc] mm_Malloc fail! Size: 68719476736

I will try to figure it out, but it may need some time.

Yan

subwaystation commented 1 year ago

@yangao07 Are there any updates here? We observe the same problem when running local abPOA in pggb.

subwaystation commented 1 year ago

@glennhickey Taking a look at your architecture, it does not support SSE 4.1 if I see that correctly? https://en.wikichip.org/wiki/intel/xeon_e5/e5-2686_v4#Features

yangao07 commented 1 year ago

@subwaystation , you mean segfaults with SSE4.1 but not AVX2? Can you also share the data that causes the segfault?

glennhickey commented 1 year ago

> @glennhickey Taking a look at your architecture, it does not support SSE 4.1 if I see that correctly? https://en.wikichip.org/wiki/intel/xeon_e5/e5-2686_v4#Features

@subwaystation I must have been referring to an r4.8xlarge, which is listed as "Broadwell E5-2686 v4". Broadwell is from 2014 and definitely supports SSE4, so it's probably an omission on that wikichip link. Wikipedia confirms this, and I also just logged into one such instance and ran:

grep sse4_1 /proc/cpuinfo  | wc -l
32
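
For anyone who wants the same check inside a C program rather than with grep, here is a minimal sketch. It assumes you have already read one `flags` line from /proc/cpuinfo into a string. The helper name `has_cpu_flag` is hypothetical and is not part of abPOA. Unlike a plain grep, it matches whole tokens only, so `sse3` does not match `ssse3`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an abPOA function): return 1 if `flag` appears
 * as a whole whitespace-separated token in `flags`, else 0. */
static int has_cpu_flag(const char *flags, const char *flag) {
    size_t n = strlen(flag);
    const char *p = flags;
    if (n == 0) return 0;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must start at the beginning or right after whitespace. */
        int start_ok = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* The match must end at the end of the string or before whitespace. */
        int end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (start_ok && end_ok) return 1;
        p += 1; /* keep searching after a partial match such as "ssse3" */
    }
    return 0;
}
```

Calling `has_cpu_flag(line, "sse4_1")` or `has_cpu_flag(line, "avx2")` on a flags line tells you whether a binary built with the matching option can run on that machine. It says nothing about whether a given input is small enough to run.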

The error in my original post still reproduces when building abpoa with export sse41=1, but not without it. (On the current abPOA, you need to drop the obsolete -N option from the command line.)

glennhickey commented 1 year ago

> @subwaystation , you mean segfaults with SSE4.1 but not AVX2? Can you also share the data that causes the segfault?

@yangao07 The data is at the top of this issue: https://github.com/yangao07/abPOA/issues/26#issue-937447928. You just need to drop the -N from the command line. It segfaults if abpoa is built with sse41=1 but not with avx2=1.

yangao07 commented 1 year ago

Hi @glennhickey @subwaystation ,

Latest commit: 5219dc32c1a1540614937f466fddd572ae646cf2

Sorry that it took so long to reply, but I finally fixed this bug. It was caused by an overflow that occurs when the graph/alignment size is larger than 32 GB (33.563 GB in this case).

With AVX2, however, the size stays under 32 GB (10.056 GB), so there is no segfault.
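
To illustrate the kind of guard that turns such an overflow into an error message instead of a segfault, here is a minimal sketch. It is not the code from the fix commit, and both helper names are hypothetical. The second helper rests on an assumption the thread does not confirm: that memory is indexed in 16-byte units (the width of an SSE register) by a signed 32-bit integer. Under that assumption the index runs out at exactly 2^31 × 16 bytes = 32 GiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an abPOA function): compute count * unit_bytes.
 * Returns 0 on success, or -1 if the product would not fit in size_t,
 * so the caller can print an error instead of allocating a wrapped size. */
static int checked_alloc_bytes(size_t count, size_t unit_bytes, size_t *out_bytes) {
    assert(out_bytes != NULL);
    if (unit_bytes != 0 && count > SIZE_MAX / unit_bytes) {
        return -1; /* multiplication would wrap around */
    }
    *out_bytes = count * unit_bytes;
    return 0;
}

/* Hypothetical check, under the assumption stated above: can a buffer of
 * `total_bytes`, addressed in units of `unit_bytes`, be indexed by a
 * signed 32-bit integer? */
static int fits_int32_index(uint64_t total_bytes, uint64_t unit_bytes) {
    assert(unit_bytes > 0);
    return total_bytes / unit_bytes <= (uint64_t)INT32_MAX;
}
```

With 16-byte units, `fits_int32_index` accepts a 10 GiB buffer and rejects anything of 32 GiB or more, which matches the sizes reported above. Whether this is the actual mechanism in abPOA is not stated in the thread.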

glennhickey commented 1 year ago

Great news @yangao07 !! Will test this out asap -- thanks for the fix!!