Hello,
I'm trying to use the STAT tools to build a database from FASTA sequences and then use it for metagenomic/taxonomic analyses, so I'm following the tutorial in the build_db_and_run.sh script. It says the identify_tax_ids step can be run in multiple instances, but if it is, the results have to be combined into a single file with the merge_kingdoms tool. My problem is that there is no information about how to use this tool. Its help output is just: need
I don't understand what to pass for each argument (except for tax.parents).
Also, I'm using the default parameters:
KMER_LEN=32
DENSE_WINDOW=4 # 1 kmer of 4 for dense db (just for example)
SPARSE_WINDOW=128 # 1 kmer of 128 for sparse db (just for example)
but I don't know whether I should keep them.
The same question applies to MAX_KMER_DICTIONARY_SIZE=5000000 # This number should be roughly as max kmers expected * 2. I don't know how to work out the maximum number of kmers expected.
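The only approach I can think of is a crude upper bound, sketched below under two assumptions of my own (not confirmed by the script or any STAT documentation): that a window of W means one kmer is kept every W positions, as the "1 kmer of 4" / "1 kmer of 128" comments suggest, and that every sampled kmer could be distinct. The function names are hypothetical helpers, not part of STAT. The sketch sums the sampled positions over all FASTA records and doubles the result, following the MAX_KMER_DICTIONARY_SIZE comment.

```python
import math


def fasta_lengths(lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def estimate_kmer_dictionary_size(seq_lengths, kmer_len=32, window=128, factor=2):
    """Upper-bound guess for the dictionary size.

    ASSUMPTION: one kmer is sampled per `window` positions and all sampled
    kmers are distinct. Kmers shared between sequences are counted more
    than once, so this overestimates.
    """
    total = 0
    for length in seq_lengths:
        positions = length - kmer_len + 1  # number of kmer start positions
        if positions > 0:
            total += math.ceil(positions / window)
    return total * factor  # "max kmers expected * 2"


def estimate_from_fasta(path, kmer_len=32, window=128):
    """Hypothetical convenience wrapper: read a FASTA file and estimate."""
    with open(path) as handle:
        return estimate_kmer_dictionary_size(
            fasta_lengths(handle), kmer_len=kmer_len, window=window
        )
```

For example, a single 1,000 bp sequence with KMER_LEN=32 gives 969 kmer positions, so about 8 sampled kmers at SPARSE_WINDOW=128 (16 after doubling) and about 243 at DENSE_WINDOW=4 (486 after doubling). Calling estimate_from_fasta("genomes.fasta", window=4) on a real input (the filename is just a placeholder) would give the dense-db figure.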
Thank you in advance!