tseemann / mlst

Scan contig files against PubMLST typing schemes
GNU General Public License v2.0

Add option to report all matched schemes #95

Open hmontenegro opened 4 years ago

hmontenegro commented 4 years ago

It would be nice to have an option to report all matched schemes that score above the minimum score, instead of reporting just the best one.

Case in point: for a genome assembly that is not truly a single isolate (it appears to be contaminated), the output is:

contigs.fa  abaumannii_2    15  Pas_cpn60(6)    Pas_fusA(6) Pas_gltA(8) Pas_pyrG(2) Pas_recA(3) Pas_rplB(5) Pas_rpoB(4)

However, --debug shows:

$VAR1 = [
          'abaumannii_2',
          15,
          '6/6/8/2/3/5/4',
          100
        ];
$VAR2 = [
          'bcereus',
          1280,
          '33/8/13/19/8/17/201',
          100
        ];
$VAR3 = [
          '-',
          '-',
          '-/-/-/-/-/-/-',
          0
        ];

Showing multiple matching schemes would allow quick screening for contaminated isolates. By default, only the best match would still be reported, so current behaviour would not change.
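To make the request concrete, here is a minimal Python sketch of the selection logic, applied to the three candidates from the --debug output above. It is not mlst's actual code (mlst is written in Perl); the names `SchemeHit`, `select_hits`, `min_score` and `report_all`, and the threshold value of 50, are all hypothetical and only illustrate the idea.

```python
from typing import List, NamedTuple


class SchemeHit(NamedTuple):
    """One candidate scheme, mirroring the four fields in the --debug dump."""
    scheme: str
    st: str
    alleles: str
    score: int


def select_hits(hits: List[SchemeHit], min_score: int = 50,
                report_all: bool = False) -> List[SchemeHit]:
    """Keep hits scoring at least min_score, best first.

    min_score=50 is an illustrative value, not a claim about mlst's default.
    With report_all=False only the top hit is returned (current behaviour);
    with report_all=True every hit that passes the threshold is returned.
    """
    # sorted() is stable, so hits with equal scores keep their input order.
    kept = sorted((h for h in hits if h.score >= min_score),
                  key=lambda h: h.score, reverse=True)
    return kept if report_all else kept[:1]


# The three candidates shown in the --debug output.
hits = [
    SchemeHit("abaumannii_2", "15", "6/6/8/2/3/5/4", 100),
    SchemeHit("bcereus", "1280", "33/8/13/19/8/17/201", 100),
    SchemeHit("-", "-", "-/-/-/-/-/-/-", 0),
]

for hit in select_hits(hits, report_all=True):
    print(hit.scheme, hit.st, hit.alleles, hit.score, sep="\t")
```

With `report_all=True` both abaumannii_2 and bcereus are printed. With the default, only the first of the tied top-scoring hits is printed, which matches the single-line output shown earlier.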

tseemann commented 4 years ago

What would you expect to happen when multiple genome files were provided?

hmontenegro commented 4 years ago

Hmm, I didn't think about that. So far, I have only used mlst on one genome at a time.

When multiple genome files are provided, I would expect each genome to be processed in turn and to get one or more schemes. So, in a scenario with three genomes, two good isolates and one bad isolate, calling:

mlst genome1.fa genome2.fa genome3.fa

Would result in:

genome1.fa  abaumannii_2    15  Pas_cpn60(6)    Pas_fusA(6) Pas_gltA(8) Pas_pyrG(2) Pas_recA(3) Pas_rplB(5) Pas_rpoB(4)
genome2.fa  bcereus 1280    glp(33) gmk(8)  ilv(13) pta(19) pur(8)  pyc(17) tpi(201)
genome3.fa  abaumannii_2    15  Pas_cpn60(6)    Pas_fusA(6) Pas_gltA(8) Pas_pyrG(2) Pas_recA(3) Pas_rplB(5) Pas_rpoB(4)
genome3.fa  bcereus 1280    glp(33) gmk(8)  ilv(13) pta(19) pur(8)  pyc(17) tpi(201)
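
If the output had this shape (one row per genome per matched scheme), screening for mixed samples would just mean grouping rows by file name. The Python sketch below assumes tab-separated rows laid out as in the example above. The function names `schemes_per_genome` and `flag_mixed` are hypothetical, and the multi-row output itself is only the proposal in this issue, not an existing mlst feature.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def schemes_per_genome(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each file name (column 1) to the schemes (column 2) reported for it."""
    schemes: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        schemes[fields[0]].append(fields[1])
    return dict(schemes)


def flag_mixed(lines: Iterable[str]) -> List[str]:
    """Return the genomes that matched more than one distinct scheme."""
    per_genome = schemes_per_genome(lines)
    return sorted(g for g, s in per_genome.items() if len(set(s)) > 1)


# Rows copied from the proposed output above, with allele columns shortened.
example = [
    "genome1.fa\tabaumannii_2\t15\tPas_cpn60(6)\tPas_fusA(6)",
    "genome2.fa\tbcereus\t1280\tglp(33)\tgmk(8)",
    "genome3.fa\tabaumannii_2\t15\tPas_cpn60(6)\tPas_fusA(6)",
    "genome3.fa\tbcereus\t1280\tglp(33)\tgmk(8)",
]

print(flag_mixed(example))
```

Here only genome3.fa is flagged, since it is the only file that appears under two different schemes.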