lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

Fail to index human genome in a shell with 120Gb memory #318

Open yangyxt opened 3 years ago

yangyxt commented 3 years ago

I've recently had a lot of trouble indexing a customized human genome (with certain regions masked). I ran into several issues, shown below. 1st: image However, I ran ls -lh and confirmed that the .bwt file exists.

2nd: image I don't know what's wrong here. I couldn't find an explanation for this issue online.

3rd: image Again, I don't know what's wrong here, and I couldn't find an explanation for this issue either.

For the first issue, I searched online and some people said it is caused by insufficient memory. That is unlikely to be the reason, since I already have 120 GB allocated to this shell (by PBS Pro) and only one bwa index job is running.

Furthermore, /usr/bin/time reports memory usage, and the peak RAM usage appears to be only about 4596492 KB (4.4 GB).

6292.08user 57.20system 1:47:00elapsed 98%CPU (0avgtext+0avgdata 4596492maxresident)k
0inputs+13786480outputs (0major+83721376minor)pagefaults 0swaps
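For anyone checking the arithmetic: GNU time's default output reports the maximum resident set size in kilobytes, so the figure can be converted as in the minimal sketch below. The function name peak_rss_gib is made up for illustration, and the regular expression assumes the default GNU time format shown above.

```python
import re


def peak_rss_gib(time_output: str) -> float:
    """Extract the peak resident set size from default GNU time output, in GiB.

    Assumes the default format, where the value before 'maxresident)k'
    is given in kilobytes.
    """
    match = re.search(r"(\d+)maxresident\)k", time_output)
    if match is None:
        raise ValueError("no 'maxresident' field found in the time output")
    kilobytes = int(match.group(1))
    return kilobytes / 1024 ** 2  # KB -> GiB


# The two lines reported in this issue.
report = (
    "6292.08user 57.20system 1:47:00elapsed 98%CPU "
    "(0avgtext+0avgdata 4596492maxresident)k\n"
    "0inputs+13786480outputs (0major+83721376minor)pagefaults 0swaps\n"
)
print(f"{peak_rss_gib(report):.2f} GiB")
```

This gives roughly 4.4 GiB, far below the 120 GB allocation, which supports the point that memory is unlikely to be the limiting factor.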

So what could be going wrong? BTW, I indexed the same FASTA file successfully once when running bwa index on the front end. But I need to build this step into my pipeline, so it should also work on the back end.

Please share any thoughts on this issue. Much appreciated.

markotitel commented 3 years ago

Most likely you are reading files from some shared storage (NFS/SAMBA/Windows share)

Have you resolved your issue?

yangyxt commented 3 years ago

> Most likely you are reading files from some shared storage (NFS/SAMBA/Windows share)
>
> Have you resolved your issue?

Yeah, I think the main issue is my FASTA file. The file is mostly hard-masked with N, leaving only a small proportion of actual DNA sequence. When I removed all the contigs that consist entirely of Ns, the indexing process ran normally.
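A minimal sketch of that workaround, for anyone hitting the same problem. This is not the exact script used here. It relies only on the Python standard library, the function names (read_fasta, is_all_n, drop_all_n_contigs) are hypothetical, and it assumes that a contig counts as fully masked when it contains nothing but N or n.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, holding one contig in memory at a time."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_all_n(seq: str) -> bool:
    """True if the sequence is empty or contains only N/n."""
    return set(seq.upper()) <= {"N"}


def drop_all_n_contigs(src: TextIO, dst: TextIO, width: int = 60) -> int:
    """Copy FASTA records from src to dst, skipping fully masked contigs.

    Returns the number of contigs kept.
    """
    kept = 0
    for header, seq in read_fasta(src):
        if is_all_n(seq):
            continue
        dst.write(f">{header}\n")
        for start in range(0, len(seq), width):
            dst.write(seq[start:start + width] + "\n")
        kept += 1
    return kept


# Small in-memory demonstration with made-up contig names.
demo_in = io.StringIO(
    ">chr1\nNNNNACGTNNNN\n>chrUn_masked\nNNNN\nnnnn\n>chr2\nACGT\n"
)
demo_out = io.StringIO()
print(drop_all_n_contigs(demo_in, demo_out))  # keeps chr1 and chr2
print(demo_out.getvalue(), end="")
```

For a real genome, src and dst would be file handles opened on the masked FASTA and a new output file, and bwa index would then be run on the filtered output. Partially masked contigs such as chr1 in the demo are kept unchanged.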