snayfach / MIDAS

An integrated pipeline for estimating strain-level genomic variation from metagenomic data
http://dx.doi.org/10.1101/gr.201863.115
GNU General Public License v3.0

Inclusion criteria for MAGs when setting up custom database #104

Open adityabandla opened 4 years ago

adityabandla commented 4 years ago

Hi Stephen, thanks for the great tool. I have a set of species-level MAGs that range in completeness from 50% to 100% and in redundancy/contamination from 0% to 10%. These are species representatives chosen using dRep with an average nucleotide identity (ANI) cutoff of 95%.

Since these are environmental MAGs, I would like to construct my own database. Given the above numbers, can I include all of the MAGs, or is it better to include only those that are substantially complete, say >70%? Also, to what extent does redundancy affect downstream steps such as SNP calling?
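
To make the question concrete, the kind of filter being considered could be sketched as below. This is only an illustration: the genome names and quality values are made up, `filter_mags` is a hypothetical helper (not part of MIDAS or dRep), and the 70% / 10% thresholds are simply the numbers floated above, not a recommendation.

```python
# Hypothetical quality table: (genome name, completeness %, contamination %).
# The values are invented for illustration and are not real data.
mags = [
    ("mag_001", 98.2, 0.5),
    ("mag_002", 71.4, 3.1),
    ("mag_003", 55.0, 9.8),
    ("mag_004", 86.7, 6.2),
]


def filter_mags(mags, min_completeness=70.0, max_contamination=10.0):
    """Return the names of MAGs that pass both quality thresholds.

    Completeness must be strictly greater than min_completeness, and
    contamination must not exceed max_contamination. Both defaults are
    assumptions taken from the question, not tool defaults.
    """
    return [
        name
        for name, completeness, contamination in mags
        if completeness > min_completeness and contamination <= max_contamination
    ]


kept = filter_mags(mags)
print(kept)
```

Tightening `max_contamination` (for example to 5.0) would show how many genomes are lost if redundancy turns out to matter for SNP calling.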

Best, Adi