Closed palomo11 closed 6 years ago
First, I would recommend using the '-n' option so that species abundance is estimated from the same number of reads in each metagenome. As far as a cutoff is concerned, that's really up to you. You might try thresholding on the number of mapped reads (e.g. at least one, two, or ten) and seeing how the cutoff affects the ordination of samples in PCA space.
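A minimal sketch of that comparison, in plain Python. The sample names, the read counts, and the helper functions `presence_absence` and `informative_columns` are all made up for illustration; none of them are part of MIDAS. In practice you would load the counts from your own species output. The sketch binarizes the counts at each cutoff and reports how many species still vary across samples, since a species that is present everywhere or absent everywhere has zero variance and cannot influence a PCA.

```python
# Hypothetical mapped-read counts: one row per metagenome, one column per species.
# These numbers are invented for illustration only.
counts = {
    "sample_A": [0, 1, 12, 250],
    "sample_B": [0, 0, 3, 180],
    "sample_C": [2, 0, 0, 95],
}


def presence_absence(counts, min_reads):
    """Return a 0/1 matrix: 1 if a species has at least min_reads mapped reads."""
    return {
        sample: [1 if n >= min_reads else 0 for n in row]
        for sample, row in counts.items()
    }


def informative_columns(matrix):
    """Count species columns that differ between samples.

    A column that is all 0 or all 1 has zero variance and adds nothing to a PCA.
    """
    columns = zip(*matrix.values())
    return sum(1 for col in columns if len(set(col)) > 1)


# Compare the cutoffs suggested above: at least one, two, or ten mapped reads.
for cutoff in (1, 2, 10):
    pa = presence_absence(counts, cutoff)
    print(f"min_reads={cutoff}: {informative_columns(pa)} informative species")
    for sample, row in pa.items():
        print(f"  {sample}: {row}")
```

Each 0/1 matrix can then be passed to whatever PCA implementation you already use, and the resulting ordinations compared across cutoffs.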
Hope that helps.
Thanks, Stephen
Thanks for the advice. I will try that!
Hi,
I have a database with 50 genomes. I have applied
run_midas.py species
on several hundred metagenomes. Based on the coverage file, I want to do some kind of multivariate analysis (PCA, ...), but instead of using the absolute coverage values I want to use presence and absence (so a matrix with a 1 if the species is present in the metagenome and a 0 if it is absent). My question is: which coverage threshold should I choose to decide whether a species is present or absent? Should every value above 0 count as presence? Above 0.01? Above 0.1? Or which value would you recommend?
Thank you very much in advance.