vrmarcelino / CCMetagen

Microbiome classification pipeline
GNU General Public License v3.0

parameters for low-biomass samples #31

Closed: MjelleLab closed this issue 3 years ago

MjelleLab commented 3 years ago

I am analysing human tissue biopsies (WGS). When using KrakenUniq (requiring at least 20 reads and 1000 unique k-mers), I detect many bacteria that are likely to be present based on other studies. However, CCMetagen detects few or no bacteria when using the default settings. Any suggestions on how to set the parameters when the proportion of microbes in the sample is very low (<1%)? Would setting --coverage 1 lead to too many false positives? Best,

vrmarcelino commented 3 years ago

Hi!

Are you observing many hits in the KMA .res file? If yes, adjusting coverage and depth will help. I find it useful to BLAST some of the classified sequences to get an idea of whether they are true or false positives (good to do that for both the Kraken and CCMetagen outputs). It might also be worth trying some marker-based approaches.
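A quick way to check this before rerunning anything is to count how many rows of the .res table would survive different coverage and depth cutoffs. The sketch below is only an illustration: it assumes the table is tab-separated with `Template_Coverage` and `Depth` columns, the function names `parse_res` and `count_passing` are hypothetical helpers (not part of KMA or CCMetagen), the sample rows are made up, and the cutoff values are arbitrary examples rather than recommended settings.

```python
# Made-up example of a KMA-style .res table (tab-separated); not real data.
SAMPLE_RES = (
    "#Template\tScore\tTemplate_Coverage\tDepth\n"
    "taxon_A\t5200\t95.0\t3.20\n"
    "taxon_B\t310\t12.5\t0.40\n"
    "taxon_C\t45\t1.8\t0.05\n"
)


def parse_res(text):
    """Parse a tab-separated .res table into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The header line starts with '#'; strip it and any padding around names.
    header = [h.strip().lstrip("#") for h in lines[0].split("\t")]
    rows = []
    for ln in lines[1:]:
        fields = [f.strip() for f in ln.split("\t")]
        rows.append(dict(zip(header, fields)))
    return rows


def count_passing(rows, min_coverage, min_depth):
    """Count rows whose coverage and depth both meet the given cutoffs."""
    passed = 0
    for row in rows:
        coverage = float(row["Template_Coverage"])
        depth = float(row["Depth"])
        if coverage >= min_coverage and depth >= min_depth:
            passed += 1
    return passed


if __name__ == "__main__":
    rows = parse_res(SAMPLE_RES)
    # Arbitrary example cutoffs, from stricter to more permissive.
    for cov, dep in [(20.0, 0.2), (1.0, 0.2), (1.0, 0.0)]:
        n = count_passing(rows, cov, dep)
        print(f"coverage >= {cov}, depth >= {dep}: {n} of {len(rows)} hits pass")
```

To use it on real output, read the file contents (for example with `open(path).read()`) and pass them to `parse_res`. If the total number of rows is already small, loosening the cutoffs cannot recover more taxa, and the hits that only pass at permissive cutoffs are the ones worth checking by BLAST.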

MjelleLab commented 3 years ago

The number of hits in the KMA .res file is similar to the number of hits in CCMetagen. So I guess it's KMA that has few hits.

vrmarcelino commented 3 years ago

Yes, so there is not much you can do on the CCMetagen side. You can try changing some of the KMA settings to make it more permissive, but I haven't tested how that would affect the accuracy.