phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

False plasmid found in the plasmid reference database #126

Closed: davidtong28 closed this issue 1 year ago

davidtong28 commented 1 year ago

Hi! I have been using MOB-suite to detect plasmids in some Campylobacter strains. I noticed that one cluster, AG887, was detected in all (~300) of my C. jejuni isolates. When I looked into this, I found that the reference it matches (CP023447) contains Campylobacter core genes (Cj0031 to Cj0143c) that are used in Campylobacter cgMLST. I suspect the submitter may have incorrectly labelled this reference as a plasmid, and that it was then included in the reference database. Could this be a false plasmid? Cheers! David

davidtong28 commented 1 year ago

Update: The GFA files produced by the assembler also show that the AG887 plasmids do not form a loop but are connected to the chromosome.
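The loop-versus-chromosome evidence above can be checked mechanically on a GFA1 assembly graph. The sketch below is illustrative only: it assumes tab-delimited GFA1 link (`L`) lines and that you already know which segment names belong to the predicted plasmid (segment names such as `p1` and `chr1` are hypothetical). It reports segments outside the plasmid that are linked to it, and whether a single-segment element closes onto itself. Loops made of several segments would need a proper oriented cycle search, which this sketch does not attempt.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str, str, str]  # (from, from_orient, to, to_orient)


def parse_links(gfa_lines: Iterable[str]) -> Set[Link]:
    """Collect GFA1 link (L) records; all other record types are ignored."""
    links: Set[Link] = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "L" and len(fields) >= 5:
            links.add((fields[1], fields[2], fields[3], fields[4]))
    return links


def external_neighbours(links: Set[Link], segments: Set[str]) -> Set[str]:
    """Segments outside `segments` that share a link with a segment inside it.

    A non-empty result means the candidate element is joined to the rest of
    the graph (e.g. chromosomal segments) rather than standing alone.
    """
    outside: Set[str] = set()
    for a, _, b, _ in links:
        if a in segments and b not in segments:
            outside.add(b)
        elif b in segments and a not in segments:
            outside.add(a)
    return outside


def is_closed_single_segment(links: Set[Link], seg: str) -> bool:
    """True if `seg` links end-to-start onto itself (a simple circular contig).

    In GFA1 this is written as `L seg + seg +` or, equivalently, `L seg - seg -`.
    """
    return (seg, "+", seg, "+") in links or (seg, "-", seg, "-") in links
```

A segment that is both self-closing and has no external neighbours is the pattern expected for a complete circular plasmid; the AG887 hits described above show the opposite pattern.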

davidtong28 commented 1 year ago

Update: In another case, I found that one MOB cluster (AC321) is very common in my data. The predicted AC321 plasmids in my data fall into two groups. The plasmids that hit the reference CP022078 are valid Campylobacter Type 1 plasmids (conjugative tetracycline resistance plasmids). However, some AC321 plasmids hit CP013117 instead, and these contain no conjugation proteins or tetracycline resistance proteins. Looking into this, I found that the latter are very likely prophages: they are much smaller (about 20 kb, compared with about 80 kb for CP013117), and they contain many phage elements. The reference CP013117 is itself described as a "Megaplasmid with Mu-Like Prophage and Multidrug Resistance Genes", so I am guessing that my sequences hit the phage portion of the reference. A further piece of evidence is that, in some isolates, the latter co-exist with AC320 plasmids, even though the two share the same replicon cluster (475).

jrober84 commented 1 year ago

Thank you for the information. I have marked this plasmid for investigation in the next release of the MOB-suite database. I don't currently have a timeline for a new DB release, but MOB-suite does have a feature that lets you supply sequences you want excluded from plasmid classification. If you provide the sequence of this element in your run, it will not be classified as a plasmid, so you can continue your analysis without waiting for a new DB. Alternatively, if you are comfortable doing so, you can edit the very large plasmid FASTA file to remove the accession and then rebuild the BLAST database.
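For the second option, removing one accession from a FASTA file is a small streaming filter. The sketch below is a generic illustration, not MOB-suite tooling: it assumes each header's first whitespace-delimited token is the accession, optionally with a version suffix such as `.1`, and the file paths passed to `filter_fasta_file` (a hypothetical helper) are placeholders.

```python
from typing import Iterable, Iterator, Set


def drop_accessions(lines: Iterable[str], accessions: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID is in `accessions`.

    Assumption: the record ID is the first whitespace-delimited token after '>'.
    Both the full ID ('CP023447.1') and the unversioned ID ('CP023447') are
    checked, so either form can be listed in `accessions`.
    """
    skip = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            unversioned = record_id.split(".")[0]
            skip = record_id in accessions or unversioned in accessions
        if not skip:
            yield line


def filter_fasta_file(in_path: str, out_path: str, accessions: Set[str]) -> None:
    """Hypothetical helper: stream `in_path` to `out_path` minus the listed records."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_accessions(src, accessions))
```

After writing the filtered file, the BLAST database can be rebuilt with NCBI's `makeblastdb`, e.g. `makeblastdb -in filtered.fas -dbtype nucl`. Which other database files MOB-suite expects to be regenerated is not covered by this sketch; check the MOB-suite documentation before replacing anything.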

davidtong28 commented 1 year ago

Thank you for your response. I was wondering whether predicted plasmids can lack hallmark features (i.e. be supported by sequence similarity alone), and if so, whether MOB-suite has an option to filter them out. Thanks!

jrober84 commented 1 year ago

The presence of a replicon or relaxase is not required to label a sequence as a plasmid. However, such a sequence would need to overlap with the closed plasmid database or be identified as circular. The following configurations are possible (Rep = replicon, MOB = relaxase, PlasmidDB = overlap with the closed plasmid database, Circular = identified as circular):
1) Rep+, MOB+, PlasmidDB+, Circular+
2) Rep+, MOB+, PlasmidDB-, Circular+
3) Rep+, MOB+, PlasmidDB-, Circular-
4) Rep-, MOB-, PlasmidDB+, Circular+
5) Rep-, MOB-, PlasmidDB+, Circular-
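The thread does not name a built-in option for dropping predictions that lack hallmark features. As a hedged sketch, assuming you have already tabulated per-plasmid evidence yourself, a post-hoc filter could look like the following. The `Prediction` record and its field names are hypothetical (not a MOB-suite type), and how you populate them from MOB-suite's output is left to you. The five listed configurations are encoded only for reference; the sketch does not claim other combinations are rejected by MOB-suite.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-plasmid evidence record (not a MOB-suite type)."""
    name: str
    rep: bool          # replicon detected
    mob: bool          # relaxase detected
    plasmid_db: bool   # overlaps the closed plasmid reference database
    circular: bool     # identified as circular


# The five configurations listed above, as (rep, mob, plasmid_db, circular).
LISTED_CONFIGS = {
    (True, True, True, True),     # 1
    (True, True, False, True),    # 2
    (True, True, False, False),   # 3
    (False, False, True, True),   # 4
    (False, False, True, False),  # 5
}


def config(p: Prediction) -> Tuple[bool, bool, bool, bool]:
    return (p.rep, p.mob, p.plasmid_db, p.circular)


def similarity_only(p: Prediction) -> bool:
    """True when neither hallmark is present (configurations 4 and 5 above)."""
    return not p.rep and not p.mob


def drop_similarity_only(preds: List[Prediction]) -> List[Prediction]:
    """Keep only predictions with at least one hallmark (replicon or relaxase)."""
    return [p for p in preds if not similarity_only(p)]
```

Among the listed configurations, this filter removes exactly those in 4 and 5, i.e. the predictions whose only support is the reference database (plus, in 4, circularity).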