xie186 opened this issue 5 years ago (status: Open)
Hi, I am working with big genomes. I don't know whether there is a limit on the total size of a file with multiple sequences, but there does seem to be a maximum number of bases for a single sequence.
I tried to index the reference genome by chromosome, so I split the reference into one file per chromosome/sequence. Indexing worked well for chromosomes smaller than 2.10 Gb. However, when I tried to index chromosomes larger than 2.15 Gb, the indexing tasks finished very quickly and produced only small files, which apparently cannot be used.
My guess is that indexing fails if any single sequence is larger than (2^32)/2 = 2^31 = 2,147,483,648 bytes.
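One way to test this guess before indexing is to measure each sequence in the FASTA file and flag the ones above the threshold. Below is a minimal Python sketch; the names `GUESSED_LIMIT`, `sequence_lengths`, and `oversized` are made up for illustration, and the threshold is only the 2^31 figure guessed above, not a limit taken from the tool's documentation.

```python
# Assumed threshold: the (2^32)/2 guess from this thread, not a documented constant.
GUESSED_LIMIT = 2**32 // 2  # 2,147,483,648


def sequence_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, length
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length = 0
        elif name is not None:
            # Count bases only; newlines were stripped above.
            length += len(line)
    if name is not None:
        yield name, length


def oversized(lines, limit=GUESSED_LIMIT):
    """Return names of sequences longer than the (guessed) limit."""
    return [name for name, length in sequence_lengths(lines) if length > limit]
```

An open file can be passed directly, for example `with open("ref.fa") as fh: print(oversized(fh))`, where `ref.fa` is a placeholder path. The file is read line by line, so memory use stays small even for multi-gigabase chromosomes.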
Hi, is @ChenJuiYANG right? @lh3
Found the answer here: https://github.com/lh3/bwa#4gb
Hi @lh3, I'm wondering whether there is a maximum number of bases for a reference genome. I'd like to know whether it's practical to build an index on the NCBI nt database. Thanks.