katholt / sonneityping


Correct typo and add species to data frame when percent_coverage < 90 #7

Closed sage-wright closed 1 year ago

sage-wright commented 1 year ago

Hello!

I came across a Python KeyError when running parse_mykrobe_predict.py on an S. sonnei sample with less than 90% coverage.

The particular error was as follows:

final_results.to_csv(args.prefix + "_predictResults.tsv", index=False, sep="\t", columns=["genome", "species", "final genotype", "name", "confidence", "num QRDR", "parC_S80I", "gyrA_S83L", "gyrA_S83A", "gyrA_D87G", "gyrA_D87N", "gyrA_D87Y", "lowest support for genotype marker", "poorly supported markers", "max support for additional markers", "additional markers", "node support"])
... [more python traceback messages] ...
KeyError: "['species', 'lowest support for genotype marker'] not in index"

I tracked this down to line 237: https://github.com/katholt/sonneityping/blob/b4ba875257b63f7544eeac037f921914ee238704/parse_mykrobe_predict.py#L237 where 'lowest support for genotype marker' is misspelt as 'lowest supprot'. In addition, 'species' is not included as a column in the data frame, which accounts for the second missing column named in the KeyError.

I also noticed that a similar error would occur at line 229, since 'species' is not included as a column in the data frame there either: https://github.com/katholt/sonneityping/blob/b4ba875257b63f7544eeac037f921914ee238704/parse_mykrobe_predict.py#L229
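
To illustrate why exactly those two columns appear in the KeyError, here is a minimal sketch that uses plain dicts as stand-ins for a data frame row. The column list is copied from the to_csv call in the traceback; the helper name missing_columns, the sample name, the "NA" placeholder values, and the full spelling of the misspelled key are assumptions made for this example and are not taken from parse_mykrobe_predict.py.

```python
# Column list copied from the to_csv call quoted in the traceback.
OUTPUT_COLUMNS = [
    "genome", "species", "final genotype", "name", "confidence", "num QRDR",
    "parC_S80I", "gyrA_S83L", "gyrA_S83A", "gyrA_D87G", "gyrA_D87N", "gyrA_D87Y",
    "lowest support for genotype marker", "poorly supported markers",
    "max support for additional markers", "additional markers", "node support",
]


def missing_columns(row, expected=OUTPUT_COLUMNS):
    """Return the expected output columns that are absent from row, in order.

    Hypothetical helper: it mimics the check that fails when to_csv is
    asked for columns the data frame does not contain.
    """
    return [col for col in expected if col not in row]


# Hypothetical low-coverage row before the fix: there is no "species" key,
# and the support column is stored under a misspelled key (assumed spelling).
broken_row = {col: "NA" for col in OUTPUT_COLUMNS}
broken_row["genome"] = "sample1"
del broken_row["species"]
broken_row["lowest supprot for genotype marker"] = broken_row.pop(
    "lowest support for genotype marker"
)

# The same row after the fix: every requested column is present.
fixed_row = {col: "NA" for col in OUTPUT_COLUMNS}
fixed_row["genome"] = "sample1"

print(missing_columns(broken_row))
print(missing_columns(fixed_row))
```

The first call reports 'species' and 'lowest support for genotype marker', the same pair named in the KeyError; the second reports nothing missing.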

After making these small changes, the error no longer occurs. This prevents the script from failing when percent_coverage is under 90 and when spp_call is "Unknown".

Thank you!