Results not matching CGE ResFinder web-interface output

eam12 commented 5 years ago

I did a quick comparison between the staramr output and the output from the CGE's ResFinder web interface. For the most part things matched, but I did find one issue. On the CGE site, I got a hit to aadA1 at bps 22917-23708 with 99.75%. With staramr, ant(3'')-Ia_X02340 was identified in the same region, but extending an extra 176 bps (22917-23884 bp), with 99.49% identity. There appears to something different about the way starmar and CGE are "choosing" their top hit for a given open reading frame. Can anyone give me any insight into this? Percent identity threshold was set at 90% for both tools and I made sure that the same ResFinder database (2019-01-29) was used for both.

apetkau commented 5 years ago

Thanks for reporting the issue @eam12 :smile:

You are correct, there is a difference in how the CGE web server is choosing the gene to report, and the way staramr is choosing the gene. There are a lot of very similar genes in the ResFinder database, and the logic I implemented tries to choose the "best" result based on the percent identity and gene length. staramr prefers to report longer genes so long as they don't drop the percent identity too much. I am not quite sure what is the "best" gene to pick in these cases where we have very close matches, but this is the logic staramr is currently using. You can take a look at it here https://github.com/phac-nml/staramr/blob/master/staramr/blast/results/BlastResultsParser.py#L110 if you're curious.

staramr does support the option to include all genes in your results if you wanted to do more in-depth inspections. You can run with --report-all-blast in this case. I'm sure if you use this flag you'll see the gene the CGE webserver picked among the list of possible options.

Let me know if you want any more information.

eam12 commented 5 years ago

This was very helpful. Thank you very much!

phac-nml / staramr

Results not matching CGE ResFinder web-interface output #44