smaegol / PlasFlow

Software for prediction of plasmid sequences in metagenomic assemblies
GNU General Public License v3.0

Should MAG completeness be considered when running Plasflow? #32

Open Riselya opened 4 years ago

Riselya commented 4 years ago

Hi there, I've been applying PlasFlow to MAGs from wastewater treatment samples, and it seems to be working really well! My question is more theoretical than technical: given that many of our assembled genomes have low completeness (even when they are highly abundant), is it still appropriate to include these in the pipeline? The paper excludes short sequences but doesn't mention assembly quality, and I am not sure whether it should make a difference. If I set a threshold at 50% completeness, this excludes most of the MAGs, including some of the most abundant ones. In your study, you test PlasFlow on microbial mats, for which I assume most of the assemblies would be rather incomplete. So, would it be sensible to set completeness thresholds if one wants high standards for identifying true associations, or is it irrelevant? Thanks!

smaegol commented 3 years ago

Hi,

Sorry for the late reply. I think MAG completeness matters less than having input sequences of adequate length. Even at low completeness, you should be able to predict a sequence's origin as long as it is long enough to have a proper k-mer profile.
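
To illustrate the point about length versus completeness, here is a minimal Python sketch that drops short contigs and computes a normalized k-mer frequency profile for the rest. It is not PlasFlow's own code: the function names, the 1000 bp cutoff, and k=3 are assumptions chosen for the example, so adjust them to whatever the tool's documentation recommends.

```python
from collections import Counter
from itertools import product


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=1000):
    """Keep contigs of at least min_length bp (1000 is an assumed cutoff)."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def kmer_profile(seq, k=3):
    """Return the relative frequency of every A/C/G/T k-mer in seq.

    Windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


if __name__ == "__main__":
    # Hypothetical toy input; real contigs would be read from a FASTA file.
    fasta = [">contig_1", "ACGTACGTACGT", ">contig_2", "ACG"]
    kept = filter_by_length(read_fasta(fasta), min_length=10)
    for header, seq in kept:
        profile = kmer_profile(seq, k=2)
        print(header, len(seq), round(profile["AC"], 3))
```

The profile depends only on the contig itself, not on how complete the bin it came from is, which is why a length filter on contigs is the relevant check here rather than a completeness threshold on MAGs.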