phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

Smaller number of plasmids and how to build a new database #135

Closed Dx-wmc closed 1 year ago

Dx-wmc commented 1 year ago

Dear Sir,

MOB-suite is an excellent tool. Interestingly, though, when I compared the number of plasmids in MOB-suite and PLSDB, I found that MOB-suite includes over 20,000 plasmids, whereas PLSDB has over 30,000. What causes such a large difference between the two? Did MOB-suite intentionally remove some low-quality plasmids from its database?

Also, I had a lot of trouble when I tried to build a new database. Despite checking what has been mentioned in many issues, I think there is still no effective script to do the job. Is there a plan to publish a detailed tutorial on building a database, especially covering the meta information that needs to be entered?

You seem to be very busy at the moment and I hope to receive a reply in your free time.

Yours,

Dave

jrober84 commented 1 year ago

MOB-suite was not meant to be a comprehensive collection of every plasmid. As you have pointed out, there are several excellent efforts designed to capture the diversity of all plasmids. To reduce the size and complexity of the database, we remove near-identical plasmids from the same species by keeping one per secondary cluster ID. The databases were also built at different times, so they will contain different numbers of plasmids. Further, our curation only includes plasmids >= 1,500 bp and <= 400,000 bp. We also actively remove any records where the summary description mentions "partial" or "incomplete", or specifically mentions genes or IS elements. We want to restrict the database to higher-quality "complete" records. We are also cautious about including integrative plasmids, as those can complicate the results. The documentation for the cluster module is definitely something we need to improve, and it is on our to-do list for future releases.
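
For anyone trying to approximate these curation rules on their own plasmid collection, here is a rough sketch in Python. It is not the MOB-suite code: the `PlasmidRecord` fields, the `curate` function, and the exact keyword pattern are all assumptions made for illustration, and the real keyword list and the handling of integrative plasmids are not spelled out in this thread.

```python
import re
from dataclasses import dataclass

# Length bounds as stated in the reply above.
MIN_LEN = 1_500
MAX_LEN = 400_000

# Hypothetical pattern: the thread only says that descriptions mentioning
# "partial", "incomplete", genes, or IS elements are removed.
EXCLUDE_PATTERN = re.compile(
    r"\b(partial|incomplete|genes?|insertion sequence|IS\d+)\b",
    re.IGNORECASE,
)


@dataclass
class PlasmidRecord:
    """Hypothetical record layout; field names are illustrative only."""
    accession: str
    species: str
    secondary_cluster_id: str
    length: int
    description: str


def passes_quality_filters(rec: PlasmidRecord) -> bool:
    """Apply the length and description filters described in the reply."""
    if not (MIN_LEN <= rec.length <= MAX_LEN):
        return False
    return EXCLUDE_PATTERN.search(rec.description) is None


def curate(records: list[PlasmidRecord]) -> list[PlasmidRecord]:
    """Keep the first passing record per (species, secondary cluster ID)."""
    kept: dict[tuple[str, str], PlasmidRecord] = {}
    for rec in records:
        if not passes_quality_filters(rec):
            continue
        # setdefault keeps the first record seen for each key.
        kept.setdefault((rec.species, rec.secondary_cluster_id), rec)
    return list(kept.values())
```

Which record represents each cluster is a design choice; this sketch simply keeps the first one encountered, so sorting the input (for example by record quality) before calling `curate` changes the outcome.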