lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

BWA aln runs slower for first time and subsequent runs are faster #339

Open rjg2186 opened 2 years ago

rjg2186 commented 2 years ago

Hi @lh3, @bwlang, @jmarshall, @rmzelle, @sjackman

I am running BWA aln on a Google Cloud VM. The first run of the day, or the first run after a long gap (for example, more than 8 hours), takes much longer for both BWA aln and samse. Subsequent runs started soon afterwards (within 15, 20, or 60 minutes) take almost 6 times less time than the first run. Is this related to the index files being held in a cache, and if so, how does that work? If I align against a different genome index, the first run is slow again. But when I execute the same command with the same input file and index file on a local Linux machine, the total execution time is consistent across all runs. Please share your thoughts on this. Thanks.

sjackman commented 2 years ago

My best guess would be that either the index or the reads are being fetched from the file system (possibly a network file system) and cached in memory, so subsequent runs are faster.

bwlang commented 2 years ago

Seems like a motley crew of at-mentions... I wonder how @rjg2186 selected this surprising group. I agree that this is very likely not related to bwa. If you want more consistent performance, you might want to copy the reference and indexes onto a local volume as a first step. I don't know much about Google's compute environment, so I don't know how practical that is.
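
To illustrate the caching explanation above, here is a minimal Python sketch that reads the index files once before alignment, so the operating system has a chance to keep them in its page cache. It assumes the index was built with `bwa index` and therefore uses the `.amb`, `.ann`, `.bwt`, `.pac`, and `.sa` extensions. The function name `warm_page_cache` and the example prefix are hypothetical, and nothing here is part of bwa itself. Whether the data stays cached depends on how much free memory the machine has.

```python
from pathlib import Path

# Extensions written by `bwa index`. Which of these `bwa aln` reads first
# is not stated in this thread, so the sketch simply reads all of them.
BWA_INDEX_EXTENSIONS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def warm_page_cache(index_prefix, extensions=BWA_INDEX_EXTENSIONS,
                    chunk_size=1 << 20):
    """Read every existing index file once and return the bytes read.

    Reading a file sequentially lets the kernel place its blocks in the
    page cache; the data itself is discarded here.
    """
    total = 0
    for ext in extensions:
        path = Path(str(index_prefix) + ext)
        if not path.is_file():
            continue  # skip index components that are not present
        with path.open("rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total


# Hypothetical usage, with an index prefix such as "ref/genome.fa":
# print(warm_page_cache("ref/genome.fa"), "bytes read")
```

Running this (or copying the index to a local volume, as suggested above) before the first alignment of the day would move the slow read out of the timed bwa run. It does not make the read itself faster.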

rjg2186 commented 2 years ago

@sjackman and @bwlang, thank you both for the input. I was just curious to know how much time and free memory BWA aln requires to load the index into memory before starting the alignment process. I saw there is an option to preload the index using bwa shm, but it looks like it works only with bwa mem and not with bwa aln. Thanks,

rjg2186 commented 2 years ago

Hi @sjackman @lh3, I was just curious to know how much time and free memory BWA aln requires to load the index before starting the alignment process. Does BWA aln first look in a cache for the index before loading it into memory? I tried splitting the genome into multiple files by chromosome and indexing them separately, but even in this case the first execution takes longer than subsequent runs. Run times are consistent on a Linux machine with a lot of memory (more than 500 GB RAM), but not on machines with less memory. I am not clear why the first run of the day takes longer than subsequent runs. Looking forward to your input. Thank you.
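
One way to explore the difference between the high-memory and low-memory machines described above is to compare the on-disk size of the index with the memory the kernel reports as available. The Python sketch below does that under stated assumptions: the index files use the extensions written by `bwa index`, the host is Linux and exposes a `MemAvailable` line in `/proc/meminfo`, and all function names are hypothetical. `MemAvailable` is only an estimate, so a positive result does not guarantee the index stays cached between runs.

```python
from pathlib import Path

# Extensions written by `bwa index`.
BWA_INDEX_EXTENSIONS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def index_size_bytes(index_prefix, extensions=BWA_INDEX_EXTENSIONS):
    """Sum the on-disk sizes of the index files that exist."""
    total = 0
    for ext in extensions:
        path = Path(str(index_prefix) + ext)
        if path.is_file():
            total += path.stat().st_size
    return total


def parse_mem_available_bytes(meminfo_text):
    """Return MemAvailable from /proc/meminfo text in bytes, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # The value is reported in kB, e.g. "MemAvailable:  2048 kB".
            return int(line.split()[1]) * 1024
    return None


def index_fits_in_memory(index_prefix, meminfo_path="/proc/meminfo"):
    """Return True/False, or None if MemAvailable cannot be determined."""
    available = parse_mem_available_bytes(Path(meminfo_path).read_text())
    if available is None:
        return None
    return index_size_bytes(index_prefix) <= available
```

If the index is larger than the available memory, or other work on the machine fills memory between runs, the cached index pages can be evicted and the next run has to read them from storage again.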