phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

Feature request: Support `circular=Y` for detecting circularized contigs #151

Open dfornika opened 1 year ago

dfornika commented 1 year ago

We've been using dragonflye for some hybrid assemblies recently and have noticed a few unexpected results when running mob-recon on them. In one example, mob-recon assigned a small (~2 kb) contig to the chromosome, even though the assembly included a large circular contig that appeared to be a complete chromosome. We would expect mob-recon to recognize that chromosome as complete/circularized, in which case it wouldn't make sense to add other contigs to the chromosome reconstruction.

I suspect this is happening because mob-recon looks for a Unicycler-style circularization tag in the FASTA headers, which looks like `circular=true`, whereas dragonflye (and presumably Flye itself, though I haven't confirmed) adds a tag that looks like `circular=Y`.

Am I understanding correctly how mob-recon detects circular contigs? Would it be straightforward to support both `circular=true` and `circular=Y`? If not, we could write a small sed command that converts `circular=Y` to `circular=true`, but other users would need to do the same in their own pipelines, so it seems like a less efficient solution.
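To make the request concrete, here is a minimal Python sketch of the two options: accepting both tags when checking a header, and rewriting `circular=Y` to `circular=true` as a workaround. This is not mob-recon's actual code. The function names (`is_circular`, `normalize_header`, `normalize_fasta`) are hypothetical, and the only assumption about header layout is that the tag appears as a whitespace-separated `circular=...` field, as described above.

```python
import re

# Tag formats assumed from the description above (not taken from mob-recon's source):
#   Unicycler-style:  circular=true
#   dragonflye-style: circular=Y
CIRCULAR_TAG = re.compile(r"\bcircular=(?:true|Y)\b")
DRAGONFLYE_TAG = re.compile(r"\bcircular=Y\b")


def is_circular(header: str) -> bool:
    """Return True if a FASTA header carries either circularization tag."""
    return CIRCULAR_TAG.search(header) is not None


def normalize_header(header: str) -> str:
    """Rewrite a dragonflye-style tag to the Unicycler-style tag."""
    return DRAGONFLYE_TAG.sub("circular=true", header)


def normalize_fasta(lines):
    """Yield FASTA lines with header tags normalized; sequence lines pass through unchanged."""
    for line in lines:
        yield normalize_header(line) if line.startswith(">") else line
```

The first function is roughly what support inside mob-recon might look like. The other two are the equivalent of the sed workaround, applied only to header lines so that sequence data is never touched.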