ArmandBester opened this issue 3 months ago
Dear Armand,
even assuming that you manage to create the database, what is your use case for it? Unless you search more than 10 GB of query sequences, your program's runtime will be dominated by simply loading the database, which will take a very long time given that it will be around 2 TB in total.
If you search very large query files, this could still be worth it, but you will need to split the database, run the searches individually, and then merge the output files manually. In that case, I would recommend using m8 output, reducing the desired number of hits per query, and then merging the files with a combination of the shell commands `sort` (increase its allowed memory usage and thread count) and `awk` (for filtering).
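Concretely, the merge step could look like the sketch below. The file names and the three tiny stand-in m8 files are made up for illustration; in practice the inputs are the m8 files produced by searching each database part. The pipeline sorts by query id, then by bit score descending, and keeps the best hits per query:

```shell
# Stand-ins for the per-part search outputs (names and values invented here;
# in practice these are the m8 files from searching each index part).
printf 'q1\thitA\t90.0\t50\t0\t0\t1\t50\t1\t50\t1e-20\t80\n'  > hits_part1.m8
printf 'q1\thitB\t95.0\t60\t0\t0\t1\t60\t1\t60\t1e-30\t120\n' > hits_part2.m8
printf 'q2\thitC\t88.0\t40\t0\t0\t1\t40\t1\t40\t1e-10\t60\n' >> hits_part2.m8

# m8 columns: 1 = query id, 12 = bit score.
# Sort by query id, then by bit score descending; keep at most 5 hits per
# query. For real data, give sort more resources, e.g. -S 8G --parallel=8.
sort -k1,1 -k12,12gr hits_part*.m8 \
  | awk '$1 != prev { n = 0; prev = $1 } ++n <= 5' \
  > hits_merged.m8
```

Adjust the `5` in the `awk` filter to the number of hits per query you actually want to keep.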
If you want to proceed with splitting the index, I would suggest the following:

- use `/usr/bin/time -v` to measure runtime and memory usage (the "Maximum resident set size" / MaxRSS value);
- use the `.lba.gz` extension for the index output files to reduce their on-disk size. This may even make loading faster.

If you have any further questions, feel free to ask :)
Dear lambda creators,
I think I may be missing something. I am trying to create a nucleotide index from a 677 GB FASTA (nt) file, and I get the expected error:
My question is: if I split the FASTA file, say into 3 parts, and create separate indexes, can I then search each one individually and combine the results?
Kind regards,
Armand