lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

Maximum number of bases for a reference genome #251

Open xie186 opened 5 years ago

xie186 commented 5 years ago

Hi @lh3, is there a maximum number of bases for a reference genome? I'd like to know whether it's practical to build an index on the NCBI nt database. Thanks.

ChenJuiYANG commented 1 year ago

Hi, I am working with big genomes. I don't know whether there is a limit on the total size of a file with multiple sequences, but there does seem to be a maximum number of bases for a single sequence.

I tried indexing the reference genome chromosome by chromosome, splitting the reference into one file per chromosome/sequence. Indexing worked well for chromosomes smaller than 2.10 Gb. However, for chromosomes larger than 2.15 Gb, the indexing jobs finished very quickly and produced only small output files, which apparently cannot be used.
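For anyone who wants to repeat the per-chromosome test, here is a minimal sketch of the splitting step in Python. It is not part of bwa; the function name `split_fasta`, the `.fa` suffix, and the file-naming rule are assumptions made for illustration.

```python
import os
import re


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file.

    Hypothetical helper, not a bwa feature. Returns the list of paths written.
    Records whose headers share the same first word would overwrite each other.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                # A new record starts: close the previous output file, if any.
                if out is not None:
                    out.close()
                # Name the file after the first word of the header,
                # replacing characters that are unsafe in file names.
                parts = line[1:].split()
                name = parts[0] if parts else "unnamed"
                safe = re.sub(r"[^A-Za-z0-9._-]", "_", name)
                path = os.path.join(out_dir, safe + ".fa")
                out = open(path, "w")
                paths.append(path)
            # Lines before the first header are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return paths
```

Each resulting file can then be passed to `bwa index` separately to see which sequences fail.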

My guess is that indexing fails if any single sequence is larger than 2^32 / 2 = 2^31 = 2,147,483,648 bytes.
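A quick way to check a reference against that guessed threshold before indexing is to count the bases in each sequence. The sketch below is plain Python and does not call bwa. The names `sequence_lengths`, `over_limit`, and `GUESSED_LIMIT` are made up for illustration, and the 2^31 value is only the guess from this comment, not a documented constant.

```python
# Guessed per-sequence limit from this thread (an assumption, not taken from bwa's code).
GUESSED_LIMIT = 2 ** 31


def sequence_lengths(fasta_path):
    """Return {sequence name: number of bases} for a FASTA file."""
    lengths = {}
    name = None
    with open(fasta_path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the sequence name.
                parts = line[1:].split()
                name = parts[0] if parts else "unnamed"
                lengths[name] = 0
            elif name is not None:
                # Sequence lines: one character per base after stripping the newline.
                lengths[name] += len(line)
    return lengths


def over_limit(lengths, limit=GUESSED_LIMIT):
    """Return the names of sequences longer than `limit` bases."""
    return [name for name, n in lengths.items() if n > limit]
```

For example, `over_limit(sequence_lengths("ref.fa"))` lists the sequences that exceed the guessed limit, where `ref.fa` is a placeholder path.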

xiekunwhy commented 1 year ago

Hi, is @ChenJuiYANG right? @lh3

xiekunwhy commented 1 year ago

Found the answer here: https://github.com/lh3/bwa#4gb