phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

same primary_cluster_id but different mash_nearest_neighbour? #122

Closed: karynkomatsu closed this issue 1 year ago

karynkomatsu commented 1 year ago

Hi, I noticed that mobtyper_results (from MOB RECON) sometimes shows rows with the same primary_cluster_id but different mash_nearest_neighbour values. If the primary MOB-cluster ID of two plasmid contigs is the same, why would the accession ID of their closest plasmid match (i.e. mash_nearest_neighbour) be different? If they have the same cluster ID, shouldn't their closest plasmid match also be identical?

Thank you for all your help in advance!

jrober84 commented 1 year ago

The MOB-cluster identifiers indicate membership in a "cluster", which consists of one or more members. Sequences are assigned to clusters based on the lowest mash distance in the reference database. The mash nearest neighbour tells you which sequence in the reference database has the lowest mash distance to your query, and the cluster associated with that closest match is what assigns your query sequence to a MOB-cluster. If your sequences were 100% identical, their mash nearest neighbour would be the same, but if there are any differences, the sequences can have different mash nearest neighbours. Because a cluster can contain more than one reference sequence, two different nearest neighbours can still belong to the same cluster, which is why primary_cluster_id can match while mash_nearest_neighbour differs.
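
To make the explanation above concrete, here is a minimal sketch of nearest-neighbour cluster assignment. It is not MOB-suite's actual code: the accessions, cluster IDs, distances, and the `assign_cluster` helper are all made up for illustration. It only shows how two similar but non-identical contigs can pick different nearest neighbours and still land in the same cluster.

```python
# Hypothetical reference database: accession -> MOB-cluster ID.
# All accessions and cluster IDs here are invented for illustration.
REFERENCE_CLUSTERS = {
    "REF_A": "AA001",
    "REF_B": "AA001",  # same cluster as REF_A
    "REF_C": "AA002",
}


def assign_cluster(distances):
    """Return (nearest_neighbour, cluster_id) for one query sequence.

    distances: dict mapping reference accession -> mash distance to the query.
    """
    # The nearest neighbour is the reference with the lowest mash distance.
    nearest = min(distances, key=distances.get)
    # The query inherits the cluster of that nearest neighbour.
    return nearest, REFERENCE_CLUSTERS[nearest]


# Invented distances for two similar but non-identical plasmid contigs.
contig_1 = {"REF_A": 0.001, "REF_B": 0.004, "REF_C": 0.08}
contig_2 = {"REF_A": 0.005, "REF_B": 0.002, "REF_C": 0.09}

result_1 = assign_cluster(contig_1)
result_2 = assign_cluster(contig_2)
print(result_1)  # ('REF_A', 'AA001')
print(result_2)  # ('REF_B', 'AA001')
```

Both contigs end up with the same cluster ID, but each reports a different nearest neighbour because a different member of that cluster happens to be slightly closer.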