Discrepancy on the number of fragments between *ccm.csv and *kmaout.frag.

vrmarcelino / CCMetagen

Microbiome classification pipeline

GNU General Public License v3.0

63 stars, 19 forks

Discrepancy on the number of fragments between ccm.csv and kmaout.frag. #44

Closed anacarolsoares closed 1 year ago

anacarolsoares commented 1 year ago

Hi! I have noticed a discrepancy in the number of fragments (in this case, scaffolds) classified as a given species between the output files _scaffolds.ccmetagen.ccm.csv and _scaffolds.kmaout.frag.

In the CCMetagen file _scaffolds.ccmetagen.ccm.csv, the sum of the Depth column for rows classified as a given species (e.g. Taenia solium) is 5, but I could find only 4 fragments in the KMA output _scaffolds.kmaout.frag.

I used the following key columns to match the two files: [Closest_match] in _scaffolds.ccmetagen.ccm.csv and [template_name] in _scaffolds.kmaout.frag.

Is there any explanation for this?

Looking forward to your reply. Thanks.

ed_scaffolds.ccmetagen.ccm.csv

*_scaffolds.kmaout.frag.
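The comparison described in this comment can be sketched in a few lines of Python. This is only an illustration, not part of CCMetagen or KMA. It assumes that the .ccm.csv file is comma-separated with a header containing `Closest_match` and `Depth` (both named in the comment), and that the .frag file is tab-separated without a header, with the template name in the sixth column. That column position is an assumption and should be checked against your KMA version. The function names are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def depth_per_template(ccm_csv_text):
    """Sum the Depth column per Closest_match value in a .ccm.csv file."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(ccm_csv_text)):
        totals[row["Closest_match"]] += float(row["Depth"])
    return dict(totals)


def fragments_per_template(frag_text, template_col=5):
    """Count .frag lines per template name.

    template_col=5 (sixth column, 0-based index) is an assumption
    about the .frag layout, not something confirmed in this thread.
    """
    counts = Counter()
    for line in frag_text.splitlines():
        fields = line.split("\t")
        if len(fields) > template_col:
            counts[fields[template_col]] += 1
    return dict(counts)


def find_discrepancies(depths, frag_counts):
    """Return {template: (summed_depth, fragment_count)} where the two differ."""
    mismatches = {}
    for name in sorted(set(depths) | set(frag_counts)):
        depth = depths.get(name, 0.0)
        n_frags = frag_counts.get(name, 0)
        if depth != n_frags:
            mismatches[name] = (depth, n_frags)
    return mismatches
```

To use it, read each file into a string (for example with `open(path).read()`) and pass the results of the first two functions to `find_discrepancies`. Comparing per template rather than per species shows which specific template accounts for the missing fragment.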

vrmarcelino commented 1 year ago

Hi Ana,

Sorry for the slow response. I think this might be due to finding more than one match for a given template, but let's investigate. You are using contigs (not paired-end reads) I assume? Which flags did you use with KMA?

Vanessa

anacarolsoares commented 1 year ago

Hi!

Yes, we are using contigs.

I'm using the following command:

kma -i contigs.fasta -o outFasta -t_db databasePath -ca -1t1 -mem_mode -ef

Thanks for your reply. Ana Carolina.

vrmarcelino commented 1 year ago

Hi!

Okay, a few more questions: could you also tell us the command you used with CCMetagen? And which database did you use, NCBI nt?

Not sure if this is your case, but the -1t1 flag can be tricky to use with contigs (unless you are using a reference database of complete genomes): you are telling KMA to find only one match for each scaffold, but there might be multiple genes (and therefore multiple equally good matches) in the database.
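One way to check whether equally good matches are involved is to look for them in the .frag file. The sketch below is illustrative only. It assumes the .frag file is tab-separated with no header, that the second column holds the number of equally well mapping templates, and that the seventh column holds the query name. These column positions are assumptions to verify against your KMA version, and the thread does not confirm what KMA writes there when -1t1 is set. The function name is hypothetical.

```python
def ambiguous_fragments(frag_text, n_templates_col=1, query_col=6):
    """Return query names whose .frag line reports more than one
    equally well mapping template.

    The column indices (0-based) are assumptions about the .frag
    layout, not something confirmed in this thread.
    """
    ambiguous = []
    for line in frag_text.splitlines():
        fields = line.split("\t")
        if len(fields) <= max(n_templates_col, query_col):
            continue  # skip blank or truncated lines
        try:
            n_templates = int(fields[n_templates_col])
        except ValueError:
            continue  # skip lines where the column is not an integer
        if n_templates > 1:
            ambiguous.append(fields[query_col])
    return ambiguous
```

If the scaffolds assigned to the species in question appear in the returned list, that would be consistent with the multiple-match explanation, though it would not prove it.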

anacarolsoares commented 1 year ago

Hi!

ccmetagen command:

CCMetagen.py -i inputFileFasta -o outFasta --depth_unit fr --map inputMapFasta --depth 1 --query_identity 80 -ef y

Yes. NCBI nt.

I see your point. We are going to test without the -1t1 flag.

Thanks for your reply.

vrmarcelino commented 1 year ago

Closing issue due to inactivity. Feel free to open it again if you need help.
