ncbi / ngs-tools

How does "merge_kingdoms" mentioned in the "build_db_and_run.sh" script work? #31

Open ecalfapietra opened 1 year ago

ecalfapietra commented 1 year ago

Hello,

I'm trying to use the STAT tools to build a database from FASTA sequences and then use it for metagenomic/taxonomic analyses, so I'm following the tutorial in the build_db_and_run.sh script. It says that the identify_tax_ids step can be run in multiple instances, but that the results must then be combined into a single file with the tool called merge_kingdoms. My problem is that there is no information about how to use this tool. The help output of the tool is:

need

I don't understand what I should pass for each argument (except for tax.parents).

Also, I'm using the default parameters:

KMER_LEN=32
DENSE_WINDOW=4 # 1 kmer of 4 for dense db (just for example)
SPARSE_WINDOW=128 # 1 kmer of 128 for sparse db (just for example)

But I don't know whether I really should.

Same question for:

MAX_KMER_DICTIONARY_SIZE=5000000 # This number should be roughly as max kmers expected * 2.

I don't know how to determine the maximum number of k-mers expected.
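For reference, one rough way to bound that number is to count k-mer start positions per sequence in the input FASTA: a sequence of length L has at most L - k + 1 k-mers. The sketch below does only that. It is not part of STAT, the function names are hypothetical, and it assumes that "max kmers expected" means the largest k-mer count of any single input sequence, which the script comment does not confirm. It also ignores window sampling and duplicate k-mers, so the result is an upper bound rather than an exact figure.

```python
import io


def count_kmer_positions(fasta_handle, k):
    """Return a list of (sequence_name, kmer_position_count) for a FASTA stream.

    A sequence of length L contributes max(0, L - k + 1) k-mer start positions.
    This is an upper bound on the number of distinct k-mers in that sequence.
    """
    results = []
    name = None
    length = 0
    for raw_line in fasta_handle:
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                results.append((name, max(0, length - k + 1)))
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            length = 0
        elif name is not None:
            length += len(line)
    if name is not None:
        results.append((name, max(0, length - k + 1)))
    return results


def suggested_dictionary_size(counts, factor=2):
    """Hypothetical helper: largest per-sequence count times `factor`.

    The factor of 2 follows the script comment; treating the maximum as
    per-sequence is an assumption of this sketch.
    """
    largest = max((count for _, count in counts), default=0)
    return largest * factor


# Tiny in-memory example instead of a real FASTA file.
example_fasta = ">seq1 demo\nACGTACGTAC\n>seq2\nACG\nTAC\n"
example_counts = count_kmer_positions(io.StringIO(example_fasta), k=4)
example_size = suggested_dictionary_size(example_counts)
print(example_counts, example_size)
```

For a real input, the same function can be called on an opened file, e.g. `count_kmer_positions(open("my_sequences.fasta"), k=32)`, where the file name is a placeholder.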

Thank you in advance!

tolot27 commented 6 months ago

@ecalfapietra Did you build your db successfully? I'm looking for the same parameters to build a RefSeq k-mer db.