phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

Question: mob_recon with custom db classifying contigs as chromosome vs novel clusters #131

Closed (lerminin closed 1 year ago)

lerminin commented 1 year ago

Hi there,

I'm using a custom database with mob_recon to recover plasmids from incomplete assemblies. I have a handful of linear contigs that appear to be fragments of plasmid sequences (hits to rep, MOB, MPF), but are getting classified as "chromosome" with my custom db. When I run them with mob_recon using the reference db, they get classified as plasmids.

Mob-recon contig_report.txt using custom db:

sample_id   molecule_type   primary_cluster_id  secondary_cluster_id    contig_id   size    gc  md5 circularity_status  rep_type(s) rep_type_accession(s)   relaxase_type(s)    relaxase_type_accession(s)  mpf_type    mpf_type_accession(s)   orit_type(s)    orit_accession(s)   predicted_mobility  mash_nearest_neighbor   mash_neighbor_distance  mash_neighbor_identification    repetitive_dna_id   repetitive_dna_type filtering_reason
assembly    chromosome  -   -   ctg11_length=166910_depth=1.17x 166910  0.5319693247858127  bfd28e1dee94ea4991ef3fd01a1bff95    not tested  IncFIC,IncFII,IncL/M    CP003035,000111__NZ_CP011595_00114,JN626286 MOBF,MOBP   NC_019125_00139,NC_004464_00056 MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_T,MPF_T,MPF_T   NC_019125_00135,NC_021871_00015,NC_021817_00020,NC_021817_00021,NC_021817_00023,NC_021871_00022,NC_019342_00058,NZ_AGTD01000004_00091,NC_019342_00064,NC_004464_00073,NC_005246_00069,NC_019154_00069,NC_004464_00068,NC_004464_00066,NC_019344_00078,NC_019063_00094,NC_005246_00057,NC_004464_00105,NC_009790_00086,NC_021817_00013,NZ_AGTD01000004_00094 MOBF    KT935446    -   -   -   -   -   -   -

Mob-recon contig_report.txt using reference db (there are no other contigs in this assembly that get assigned to the AA710 cluster):

sample_id   molecule_type   primary_cluster_id  secondary_cluster_id    contig_id   size    gc  md5 circularity_status  rep_type(s) rep_type_accession(s)   relaxase_type(s)    relaxase_type_accession(s)  mpf_type    mpf_type_accession(s)   orit_type(s)    orit_accession(s)   predicted_mobility  mash_nearest_neighbor   mash_neighbor_distance  mash_neighbor_identification    repetitive_dna_id   repetitive_dna_type filtering_reason
assembly    plasmid AA710   AJ005   ctg11_length=166910_depth=1.17x 166910  0.5319693247858127  bfd28e1dee94ea4991ef3fd01a1bff95    not tested  IncFIC,IncFII,IncL/M    CP003035,000111__NZ_CP011595_00114,JN626286 MOBF,MOBP   NC_019125_00139,NC_004464_00056 MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_F,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_I,MPF_T,MPF_T,MPF_T   NC_019125_00135,NC_021871_00015,NC_021817_00020,NC_021817_00021,NC_021817_00023,NC_021871_00022,NC_019342_00058,NZ_AGTD01000004_00091,NC_019342_00064,NC_004464_00073,NC_005246_00069,NC_019154_00069,NC_004464_00068,NC_004464_00066,NC_019344_00078,NC_019063_00094,NC_005246_00057,NC_004464_00105,NC_009790_00086,NC_021817_00013,NZ_AGTD01000004_00094 MOBF    KT935446    -   CP009466    0.0142518   Klebsiella oxytoca  -   -   -
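Because the two rows are long, it can help to diff them column by column rather than by eye. The following is a minimal sketch, assuming both `contig_report.txt` files are tab-separated and share the header shown above. The helper names `load_report` and `diff_reports` are hypothetical, and the in-memory rows use only a subset of the columns, with a shortened contig ID, for brevity.

```python
import csv
import io


def load_report(handle):
    """Index the rows of a contig report by contig_id (assumes tab-separated input)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["contig_id"]: row for row in reader}


def diff_reports(custom, reference):
    """Return {contig_id: {column: (custom_value, reference_value)}} for fields that differ."""
    diffs = {}
    for contig_id, custom_row in custom.items():
        ref_row = reference.get(contig_id)
        if ref_row is None:
            # Contig is missing from the reference report; nothing to compare.
            continue
        changed = {
            col: (custom_row[col], ref_row.get(col))
            for col in custom_row
            if custom_row[col] != ref_row.get(col)
        }
        if changed:
            diffs[contig_id] = changed
    return diffs


# Illustrative in-memory reports using a subset of the real columns.
header = "sample_id\tmolecule_type\tprimary_cluster_id\tcontig_id\tmash_nearest_neighbor\n"
custom_txt = header + "assembly\tchromosome\t-\tctg11\t-\n"
reference_txt = header + "assembly\tplasmid\tAA710\tctg11\tCP009466\n"

custom = load_report(io.StringIO(custom_txt))
reference = load_report(io.StringIO(reference_txt))
result = diff_reports(custom, reference)

for contig_id, changed in result.items():
    for col, (custom_value, reference_value) in changed.items():
        print(f"{contig_id}\t{col}\t{custom_value} -> {reference_value}")
```

For the rows above, the differing fields are the molecule type, the cluster IDs, and the mash neighbor columns; the rep, relaxase, MPF, and oriT hits are identical in both runs.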

One explanation for this inconsistency is that nothing like these contigs exists in my custom db while something similar exists in the reference db, which is why they get labeled "plasmid" with the reference db. However, based on this comment, I would expect these contigs to be classified as plasmids in novel clusters when using my custom db, because they are positive for both Rep and MOB (bullet point 3), even if they are not represented in my custom db.

Is this the correct explanation for why this contig is not assigned the "novel plasmid" label in my custom db? Is a match (mash distance between 0.6 and some maximum threshold) to the db required in all circumstances to obtain the "novel plasmid cluster" classification?

jrober84 commented 1 year ago

I must apologize: there is an error in the logic that I presented. In the absence of overlap with the plasmid DB, contigs will be labeled as plasmid only if they are circular and contain either a rep or mob marker sequence. So the reason your plasmid is not classified as such is that it does not overlap with a plasmid in the BLAST database.
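The rule described above can be written as a small decision function. This is only a sketch of the logic as stated in this comment, not MOB-suite's actual code: the function name and parameters are hypothetical, and treating any database overlap as sufficient for a "plasmid" call is a simplifying assumption, as is treating a circularity status of "not tested" as not circular.

```python
def classify_contig(overlaps_plasmid_db: bool,
                    is_circular: bool,
                    has_rep: bool,
                    has_mob: bool) -> str:
    """Label a contig following the rule described in this thread (illustrative only)."""
    # Assumption: a contig that overlaps a plasmid in the database is called a plasmid.
    if overlaps_plasmid_db:
        return "plasmid"
    # Without database overlap, the contig must be circular AND carry a rep or mob marker.
    if is_circular and (has_rep or has_mob):
        return "plasmid"
    return "chromosome"


# ctg11 with the custom db: no overlap, not circular, rep and mob hits present.
custom_db_call = classify_contig(
    overlaps_plasmid_db=False, is_circular=False, has_rep=True, has_mob=True
)

# ctg11 with the reference db: same contig, but it overlaps a database plasmid.
reference_db_call = classify_contig(
    overlaps_plasmid_db=True, is_circular=False, has_rep=True, has_mob=True
)

print(custom_db_call, reference_db_call)
```

Under these assumptions, the rep and MOB hits on a linear contig are not enough on their own, which matches the "chromosome" call seen with the custom db.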

lerminin commented 1 year ago

Makes sense, thanks for the clarification