Open balags1 opened 4 years ago
That is not normal. Can you share the FASTA file? If it is private, could you email it to my Google account?
It happens with any standard genome FASTA file, so it doesn't appear to be file-specific.
The issue is with version 2.10.0+. I also have an older version, 2.2.3+, that doesn't produce these big files. Both are the Windows 64-bit versions of BLAST+. Version 2.10.0+ is creating .ndb and .ntf files that are 297 GB in size.
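To pin down which files account for the space, something like the following could help. It is only a minimal sketch: the function names are made up for illustration, and it assumes the database was written to the current directory with the `seqdb` prefix from the command in this thread. It reports the logical file size (`st_size`), which may differ from the space actually allocated on disk.

```python
from collections import defaultdict
from pathlib import Path


def db_sizes_by_extension(directory, prefix):
    """Sum the sizes (in bytes) of files named <prefix>.<...>, grouped by extension."""
    totals = defaultdict(int)
    for path in Path(directory).iterdir():
        # Matches e.g. seqdb.ndb and seqdb.nsq, but not seqdbother.nsq
        if path.is_file() and path.name.startswith(prefix + "."):
            # st_size is the logical file size, not necessarily the allocated size
            totals[path.suffix] += path.stat().st_size
    return dict(totals)


def format_report(totals):
    """Return one line per extension, largest first, with sizes in GB."""
    rows = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return "\n".join(f"{ext:>6}  {size / 1e9:10.3f} GB" for ext, size in rows)


if __name__ == "__main__":
    # "seqdb" is the -out prefix used in this thread; adjust as needed
    print(format_report(db_sizes_by_extension(".", "seqdb")))
```

Running this once per BLAST+ version would show whether the .ndb and .ntf files are the only ones that grow.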
Ah. I wonder if this is due to the new v5 BLAST database format? It would be surprising, but not impossible, that the new format is optimised for larger databases.
The Galaxy wrappers and the BLAST database datatype they provide don't actually know about the new file extensions, but that is a separate problem:
https://github.com/peterjc/galaxy_blast/blob/master/datatypes/blast_datatypes/blast.py#L244
I have not made time to explore this yet - and have limited time this week due to childcare.
Duly noted. From a resources perspective, we will stick to the prior version for the time being.
Command used was: makeblastdb -in seq-contigs.fasta -out seqdb -parse_seqids -dbtype nucl
From a 4 MB FASTA file, this creates BLAST databases of 500 GB+ in size. Is this normal? What could be wrong with what I am doing?
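One way to test the v5-format hypothesis raised in this thread might be to ask the newer makeblastdb for the older database format and compare the output sizes. The sketch below is hedged: it assumes the installed makeblastdb accepts a `-blastdb_version` option (worth confirming with `makeblastdb -help`), and the helper name `makeblastdb_args` is made up for illustration. Everything else mirrors the command quoted in the original post.

```python
import shutil
import subprocess


def makeblastdb_args(fasta, out, db_version=None):
    """Build the makeblastdb argument list used in this thread.

    db_version is optional; passing 4 assumes this makeblastdb build
    supports -blastdb_version (an assumption, check `makeblastdb -help`).
    """
    args = ["makeblastdb", "-in", fasta, "-out", out,
            "-parse_seqids", "-dbtype", "nucl"]
    if db_version is not None:
        args += ["-blastdb_version", str(db_version)]
    return args


if __name__ == "__main__":
    cmd = makeblastdb_args("seq-contigs.fasta", "seqdb_v4", db_version=4)
    print(" ".join(cmd))
    # Only run the command if makeblastdb is actually on the PATH
    if shutil.which("makeblastdb"):
        subprocess.run(cmd, check=True)
```

If the database built this way is small again, that would point at the v5 format rather than at the input file.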