lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

Not an issue, but hopefully useful for Dockerized bwa: Alpine base image segfaults. #331

Open markotitel opened 3 years ago

markotitel commented 3 years ago

I was benchmarking bwa with various EC2 configurations to decide which to choose for our Batch jobs.

We blindly used Alpine as the base image, which uses a different GCC compiler.

Interestingly enough, bwa compiles successfully and, more interestingly, it works with our default command, which is "inherited" from a script that ran bwa on bare-metal servers.

My goal was to benchmark and figure out which EC2 instance would be "optimal" for our Batch jobs. As usual, the first tests were "just run it as is" to get a successful bwa mem alignment. Our original command used parallel to start bwa. The initial test was successful.

After the successful "base test", the next test was to remove the parallel command and use bwa on its own, since it supports -t and Biostars forum posts explain that bwa does not need any special invocation for multi-CPU processing. I just wanted to remove the "complexity" of understanding parallel and how it should --pipe data to bwa with the -L 8 parallel parameter.

I started with -t 8 just to see "where I am" and got a successful result. Then I moved on to more CPUs, and with -t 16 bwa segfaulted every time with the same sample and reference data.
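The thread-count progression described above could be automated with a small sweep. This is only a minimal Python sketch, not the commands that were actually run: the paths `ref.fa` and `sample.fq` and the list of thread counts are placeholders. It relies on the POSIX convention that a child process killed by a signal is reported with a negative return code.

```python
import signal
import subprocess


def run_once(cmd):
    """Run cmd, discard its output, and classify how it ended."""
    try:
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        return "not found"
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a negative return code means the child was killed by that signal.
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return f"exit {proc.returncode}"


def sweep(make_cmd, thread_counts):
    """Run make_cmd(t) for each thread count and collect the outcomes."""
    return {t: run_once(make_cmd(t)) for t in thread_counts}


def bwa_cmd(threads, ref="ref.fa", reads="sample.fq"):
    # Placeholder paths (assumption). bwa mem writes SAM to stdout,
    # which run_once discards; only the exit status matters here.
    return ["bwa", "mem", "-t", str(threads), ref, reads]


if __name__ == "__main__":
    for t, outcome in sweep(bwa_cmd, [8, 12, 14, 16, 32]).items():
        print(f"-t {t}: {outcome}")
```

Running the same sweep in each image gives a directly comparable table of which `-t` values survive.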

Now this is where things started to get strange, because with parallel the alignment was running successfully. Parallel was starting 4 bwa processes, each with -t 4, so it was basically using 16 threads.

With bwa compiled and run in Alpine, using more than about 14 threads segfaulted every time, no matter what I did. The EC2 configuration was the same for all the tests: 32 CPUs and 64 GB RAM.

After a few days of experimenting with increasing Docker's /dev/shm (I thought that was the problem, but then figured out that bwa does not use /dev/shm at all), I started from scratch.

I spun up an Amazon Linux 2 EC2 instance and built bwa. I ran a test, and it succeeded immediately with -t 32.

The last test was to compile bwa in an Amazon Linux 2 base Docker image. I ran the test and it passed, also with -t 32.
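When comparing two base images like this, it may help to record the toolchain each one provides alongside the test result. The following is a hedged Python sketch for doing that; the function name `toolchain_info` is made up for illustration. One relevant fact is that Alpine is built on musl libc rather than glibc, so the images differ in more than the GCC version. Whether that is related to the crash seen here has not been established.

```python
import platform
import shutil
import subprocess


def toolchain_info(compiler="gcc"):
    """Collect basic compiler and libc details for the current image."""
    libc_name, libc_version = platform.libc_ver()
    info = {
        "machine": platform.machine(),
        # libc_ver() recognizes glibc; an empty result means it was not recognized.
        "libc": f"{libc_name} {libc_version}".strip() or "unrecognized",
        "compiler": "not found",
    }
    path = shutil.which(compiler)
    if path:
        out = subprocess.run([path, "--version"], capture_output=True, text=True)
        lines = out.stdout.splitlines()
        # The first line of `gcc --version` carries the version string.
        info["compiler"] = lines[0] if lines else "unknown"
    return info


if __name__ == "__main__":
    for key, value in toolchain_info().items():
        print(f"{key}: {value}")
```

Saving this output next to each benchmark run makes it easier to see which environment differences line up with the failures.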

The intention of this issue is just to inform people not to use Alpine's GCC versions for bwa, since it behaves strangely. Or perhaps our lack of deep Linux and GCC understanding led us to test blindly and by brute force for two weeks without any logical results or reasons why it failed.