phac-nml / mob-suite

MOB-suite: Software tools for clustering, reconstruction and typing of plasmids from draft assemblies
Apache License 2.0

Coverage threshold not applied for tblastn searches #145

Closed: CorinYeatsCGPS closed this issue 11 months ago

CorinYeatsCGPS commented 12 months ago

Hi, when I ran a tblastn search directly against the MOB family representatives, I didn't get exactly the same set of hits as when running MOB-suite (e.g. mobtyper). Looking at the MOB-suite tblastn method, the coverage threshold is not passed through to BLAST, and I've since noticed it isn't passed to the blastn calls either. Is this intentional (i.e. coverage thresholds are no longer used), or is the threshold applied elsewhere? I haven't dug deeper to confirm that this is definitely the cause of the differences, so apologies if I've got this wrong.

https://github.com/phac-nml/mob-suite/blob/2231520e66bd5dd4e1805e5ec3697ad3b069aff7/mob_suite/blast/__init__.py#L69C82-L69C83
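For comparison, "passing the coverage threshold through to BLAST" would mean adding BLAST+'s `-qcov_hsp_perc` option to the tblastn command line, so that low-coverage HSPs are dropped by BLAST itself. The sketch below only builds the argument list; the function name, file names and default values are hypothetical and are not taken from MOB-suite.

```python
from typing import List

# Hypothetical helper: builds a tblastn command that applies the coverage and
# e-value cutoffs inside BLAST+ rather than after parsing its output.
def build_tblastn_cmd(query: str, db: str, out: str,
                      min_cov: float, evalue: float,
                      threads: int = 1) -> List[str]:
    return [
        "tblastn",
        "-query", query,            # protein FASTA, e.g. MOB family representatives
        "-db", db,                  # nucleotide BLAST database built from the assembly
        "-out", out,
        # Tabular output; qcovhsp reports per-HSP query coverage (%).
        "-outfmt", "6 qseqid sseqid pident length qstart qend sstart send "
                   "evalue bitscore qlen slen qcovhsp",
        "-evalue", str(evalue),
        "-qcov_hsp_perc", str(min_cov),  # BLAST-side per-HSP query coverage cutoff
        "-num_threads", str(threads),
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Filtering inside BLAST and filtering the tabular output afterwards should keep broadly similar hits, but the two are not guaranteed to be identical (for example, BLAST applies `-qcov_hsp_perc` during its own HSP processing), which is one way a direct tblastn run and a pipeline can disagree.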

Otherwise, great tool; thanks for providing it.

Corin

jrober84 commented 11 months ago

Hello,

The code in this area could definitely be improved for readability. The min_cov and evalue thresholds are handled elsewhere in the code, where they are applied after the BLAST results have been obtained:

https://github.com/phac-nml/mob-suite/blob/2231520e66bd5dd4e1805e5ec3697ad3b069aff7/mob_suite/utils.py#L338
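The general idea of applying the thresholds after BLAST has run can be sketched as a filter over tabular (`-outfmt 6`) rows. This is not MOB-suite's actual code: the column order, the function name `filter_hits`, and the identity cutoff are assumptions for illustration. Per-HSP query coverage is computed from the alignment coordinates, which roughly corresponds to what BLAST reports as `qcovhsp`.

```python
import csv
from typing import Dict, Iterable, List

# Assumed column order of a custom "-outfmt '6 ...'" string (hypothetical).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend",
           "sstart", "send", "evalue", "bitscore", "qlen", "slen"]

def filter_hits(lines: Iterable[str], min_cov: float, max_evalue: float,
                min_ident: float) -> List[Dict[str, str]]:
    """Keep HSPs that pass query-coverage, e-value and identity cutoffs."""
    kept = []
    reader = csv.DictReader(lines, fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        qstart, qend, qlen = int(row["qstart"]), int(row["qend"]), int(row["qlen"])
        # Percentage of the query covered by this single HSP.
        qcov = (abs(qend - qstart) + 1) / qlen * 100
        if (qcov >= min_cov
                and float(row["evalue"]) <= max_evalue
                and float(row["pident"]) >= min_ident):
            row["qcovhsp"] = f"{qcov:.2f}"  # record the computed coverage
            kept.append(row)
    return kept

rows = [
    "rep1\tcontig1\t95.0\t100\t1\t100\t10\t309\t1e-50\t200\t100\t5000",
    "rep2\tcontig2\t90.0\t40\t1\t40\t10\t129\t1e-10\t80\t200\t5000",
    "rep3\tcontig3\t85.0\t180\t1\t180\t1\t540\t0.5\t50\t200\t5000",
]
hits = filter_hits(rows, min_cov=80, max_evalue=1e-5, min_ident=80)
```

Here only `rep1` survives: `rep2` covers 20% of its query and `rep3` fails the e-value cutoff. Because the filtering happens on the parsed output, a plain tblastn run with default settings will report these extra hits, which is consistent with the differences described in the original question.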

tblastn can be very sensitive to the BLAST version and other parameters, so I would recommend checking those first.

Hope that helps.

CorinYeatsCGPS commented 11 months ago

Thanks @jrober84 for the link. I can see I missed a couple of parameter tweaks (e.g. min_qcovhsp), which probably explain most of the differences.