oschwengers / referenceseeker

Rapid determination of appropriate reference genomes.
https://doi.org/10.21105/joss.01994
GNU General Public License v3.0

ReferenceSeeker and Metagenome-Assembled Genomes #23

Closed by padbc 2 years ago

padbc commented 2 years ago

This is neither a feature request nor a bug, although it's closer to the former.

Thank you for developing such a useful tool. If I understood correctly, ReferenceSeeker can be used with MAGs. If so, do you have approximate guidelines as to what would be appropriate in terms of, e.g., contamination and completeness? Thanks very much.

oschwengers commented 2 years ago

Hi @padbc, yes, in principle ReferenceSeeker can be used with any genome of any taxon, though we only use it with, and provide databases for, prokaryotic genomes. This follows from the combination of methods it relies on: both the k-mer profile-based lookup of candidate genomes and the ANI calculations against reference genomes are taxon-independent and work on any DNA sequence.

That said, in terms of methodology there is no difference between bacterial isolate genomes and MAGs. However, bear in mind that both contamination and completeness affect the results. Contamination will reduce the ANI value, and incomplete genomes will, of course, result in lower conserved DNA values. For instance, contamination of more than 5% of a genome will make it impossible to detect a reference genome of the same species because of the 0.95 ANI threshold. The same holds true for incomplete genomes and the 69% conserved DNA threshold. In these cases you might want to adapt these values via --ani and --conserved-dna.

In situations where you cannot find any reference genome, you might also want to give --unfiltered a try.
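
To make the effect of the two thresholds more concrete, here is a minimal Python sketch of the filtering idea described above. It is not ReferenceSeeker's actual code: the `Candidate` class, the `filter_candidates` function, the reference names and all ANI and conserved DNA values are hypothetical and chosen only for illustration, and the use of `>=` for the comparison is an assumption. Only the default cutoffs (0.95 ANI, 69% conserved DNA) come from the explanation above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothetical reference genome hit for a query MAG."""
    name: str
    ani: float            # average nucleotide identity, as a fraction (0..1)
    conserved_dna: float  # conserved DNA, as a fraction (0..1)


def filter_candidates(candidates, min_ani=0.95, min_conserved_dna=0.69,
                      unfiltered=False):
    """Keep hits that pass both cutoffs, or every hit if unfiltered is set.

    The defaults mirror the thresholds mentioned in this thread; the
    inclusive comparison (>=) is an assumption made for this sketch.
    """
    if unfiltered:
        return list(candidates)
    return [c for c in candidates
            if c.ani >= min_ani and c.conserved_dna >= min_conserved_dna]


# Made-up values for illustration only.
hits = [
    Candidate("ref_A", ani=0.97, conserved_dna=0.55),  # e.g. incomplete MAG
    Candidate("ref_B", ani=0.93, conserved_dna=0.80),  # e.g. contaminated MAG
    Candidate("ref_C", ani=0.96, conserved_dna=0.75),
]

default_names = [c.name for c in filter_candidates(hits)]
relaxed_names = [c.name for c in
                 filter_candidates(hits, min_ani=0.90, min_conserved_dna=0.50)]
unfiltered_names = [c.name for c in filter_candidates(hits, unfiltered=True)]

print(default_names)     # only ref_C passes both default cutoffs
print(relaxed_names)     # all three pass the relaxed cutoffs
print(unfiltered_names)  # no cutoffs applied at all
```

In this toy example, a hit that fails only one of the two cutoffs (low conserved DNA for ref_A, low ANI for ref_B) is dropped under the defaults, which is the situation an incomplete or contaminated MAG could run into. Lowering the cutoffs, or skipping the filter entirely, brings such hits back at the cost of less stringent matches.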

I'm sorry that I cannot provide any specific thresholds. I don't have much experience with MAGs, and suitable settings will depend heavily on the completeness and contamination values of the genome at hand.

padbc commented 2 years ago

Excellent -- thanks very much for the detailed reply.