muellan / metacache

memory efficient, fast & precise taxonomic classification system for metagenomic read mapping
GNU General Public License v3.0

Using DB partition and MERGE does not match single DB abundance results #40

Open jaimeortiz-david opened 7 months ago

jaimeortiz-david commented 7 months ago

Hi, I am testing whether it is valid to query several smaller databases and then merge the results. However, the merged results do not match those obtained by querying a single DB. For example, I have a DB with 40 species and created two DBs with 20 species each. When I use the merge function, the abundance results from merging the two 20-species DBs do not match the abundance results from the full 40-species DB. Here are the commands I am using:

metacache build 20sp_DB1 /test_merge_DB/DB1 -taxonomy ncbi_taxonomy -remove-overpopulated-features

metacache build 20sp_DB2 /test_merge_DB/DB2 -taxonomy ncbi_taxonomy -remove-overpopulated-features

metacache query 20sp_DB1 MixA_1.fastq.gz MixA_2.fastq.gz -pairfiles -tophits -queryids -lowest species -out res1.txt

metacache query 20sp_DB2 MixA_1.fastq.gz MixA_2.fastq.gz -pairfiles -tophits -queryids -lowest species -out res2.txt

metacache merge res1.txt res2.txt -lowest species -taxonomy ncbi_taxonomy -max-cand 4 -hitmin 2 -hitdiff 2 -mapped-only -abundances test_abundance.txt -abundance-per species > out_metacache_merge.txt
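To pin down where the two runs disagree, it can help to compare the abundance tables taxon by taxon rather than by eye. Below is a minimal Python sketch of such a comparison. It assumes each table has first been reduced to plain `taxon<TAB>count` lines, which may not match the actual layout of the MetaCache abundance file; `parse_table` and `compare` are hypothetical helpers, not part of MetaCache, and the numbers in the demo are made up.

```python
def parse_table(text):
    """Parse 'taxon<TAB>count' lines into a dict.

    Assumed format (not necessarily MetaCache's own): one taxon per line,
    name and count separated by a tab; blank lines and '#' comments are skipped.
    """
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit("\t", 1)
        counts[name] = counts.get(name, 0.0) + float(value)
    return counts


def compare(full, merged):
    """Return (taxon, full_count, merged_count, difference) for every taxon
    whose counts differ, sorted by largest absolute difference first."""
    rows = []
    for taxon in set(full) | set(merged):
        a = full.get(taxon, 0.0)
        b = merged.get(taxon, 0.0)
        if a != b:
            rows.append((taxon, a, b, b - a))
    rows.sort(key=lambda r: (-abs(r[3]), r[0]))
    return rows


# Made-up numbers, for illustration only.
full = parse_table("Escherichia coli\t120\nBacillus subtilis\t80\n")
merged = parse_table(
    "Escherichia coli\t100\nBacillus subtilis\t80\nListeria innocua\t5\n"
)
for row in compare(full, merged):
    print(row)
```

Taxa that appear in only one table are reported with a count of 0 on the other side, which makes it easy to see whether the mismatch comes from missing taxa or from shifted read counts.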