I was benchmarking `bwa` on various EC2 configurations to decide which one to use for our Batch jobs.
We blindly used Alpine as the base image, which uses a different GCC compiler.
Interestingly, `bwa` compiles successfully there and, more interestingly, works with our default command, which is "inherited" from a script that ran `bwa` on bare-metal servers.
My goal was to benchmark and figure out which EC2 instance would be "optimal" for our Batch jobs.
As usual, the first test was "just run it as is" to get a successful `bwa mem` alignment. Our original command used `parallel` to start `bwa`. This initial test was successful.
After that successful "base test", the next test was to remove the `parallel` command and run `bwa` directly, since it supports `-t` and Biostars forum posts explain that `bwa` does not need any special execution for multi-CPU processing. I just wanted to remove the "complexity" of understanding `parallel` and how it should `--pipe` data to `bwa` with the `-L 8` parameter.
I started with `-t 8` just to see "where I am" and got a successful result. Then I moved on to more CPUs, and with `-t 16` `bwa` segfaulted every time with the same sample and reference data.
This is where things started to get strange, because with `parallel` the alignment was running successfully.
`parallel` was starting 4 `bwa` processes, each with `-t 4`, so it basically used 16 threads.
With Alpine as the base image and `bwa` compiled in Alpine, running with more than about 14 threads segfaulted every time, no matter what I did.
The EC2 configuration was the same for all tests: 32 CPUs and 64 GB of RAM.
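For anyone who wants to reproduce the thread-count threshold described above, the sweep can be scripted. This is only a minimal sketch, not the commands we actually ran: the file names `ref.fa` and `reads.fq` are placeholders, the list of thread counts is arbitrary, and the helper names (`run_once`, `sweep`, `bwa_cmd`) are made up for illustration. It relies on Python's `subprocess` reporting a child killed by a signal as a negative return code on POSIX.

```python
import signal
import subprocess


def run_once(cmd):
    """Run cmd (a list of arguments) and classify how it ended."""
    try:
        # Output is discarded here; only the exit status matters for this check.
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        return "not found"
    if proc.returncode == 0:
        return "ok"
    # On POSIX, a negative return code means the child was killed by that signal.
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return f"failed (exit {proc.returncode})"


def sweep(make_cmd, thread_counts):
    """Run make_cmd(t) for each thread count and collect the outcomes."""
    return {t: run_once(make_cmd(t)) for t in thread_counts}


def bwa_cmd(threads):
    # Placeholder inputs (assumption): substitute the real reference and reads.
    return ["bwa", "mem", "-t", str(threads), "ref.fa", "reads.fq"]


if __name__ == "__main__":
    for threads, outcome in sweep(bwa_cmd, [8, 12, 14, 16, 32]).items():
        print(f"-t {threads}: {outcome}")
```

Running the same script inside each candidate image (Alpine, Amazon Linux 2) with identical inputs should show at which `-t` value, if any, the outcome flips from `ok` to `segfault`.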
After a few days of experimenting with increasing Docker's `/dev/shm` (I thought that was the problem, but then figured out that it does not use `/dev/shm` at all), I started from scratch.
I spun up an Amazon Linux 2 EC2 instance, built `bwa`, and ran a test. It succeeded immediately with `-t 32`.
The last test was to compile `bwa` in an Amazon Linux 2 base Docker image.
I ran the test and it passed, also with `-t 32`.
The intention of this issue is just to inform people not to use Alpine GCC versions for `bwa`, since it behaves strangely there. Or, probably, our lack of deep Linux and GCC understanding led us to blindly brute-force test for two weeks without any logical results or reasons why it failed.