oxfordmmm / gnomonicus

Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variants

fails to parse detected minor population mutant #11

Closed philipwfowler closed 1 year ago

philipwfowler commented 1 year ago

If I pass a VCF file containing a minor population, e.g. a row like

NC_000962.3 7570    .   C   T   .   MIN_GCP .   GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE    0/0:100:95,5:1.0000:100:95,5:31.81:11.47

and call gnomonicus with a catalogue with an appropriate row e.g.

NC_000962.3,test_001,v1.00,GARC1,RFUS,MXF,gyrA@A90V:0.1,R,{},{},{}

and supply a text file containing 7570 to --minor_populations, then it fails with

$ gnomonicus --vcf_file vcf-files/NC_000962_3_test_0002.vcf --catalogue_file catalogues/NC_000962.3_test_001.csv --genome_object reference/NC_000962.3.gbk.pkl.gz --json --minor_populations vcf-files/NC_000962_3_test_0002.minor_populations.txt
percentage
['A90V:0.05']
Traceback (most recent call last):
  File "/Users/fowler/Library/Python/3.10/bin/gnomonicus", line 94, in <module>
    mutations, referenceGenes = populateMutations(vcfStem, options.output_dir, diff,
  File "/Users/fowler/Library/Python/3.10/lib/python/site-packages/gnomonicus/gnomonicus.py", line 330, in populateMutations
    mutations = pd.concat([mutations, minority_population_mutations(diffs, resistanceCatalogue)])
  File "/Users/fowler/Library/Python/3.10/lib/python/site-packages/gnomonicus/gnomonicus.py", line 439, in minority_population_mutations
    muts = [mut.split("@")[1].split(":")[0] for mut in mutations]
  File "/Users/fowler/Library/Python/3.10/lib/python/site-packages/gnomonicus/gnomonicus.py", line 439, in <listcomp>
    muts = [mut.split("@")[1].split(":")[0] for mut in mutations]
IndexError: list index out of range

I put two print commands in (their output is the two lines before the traceback) and it seems that diff.minor_populations is returning A90V:0.05, with no gene prefix, rather than the expected gyrA@A90V. This is therefore probably a gumpy bug, but I am recording it here. test_002.tgz
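For anyone reading the traceback: the list comprehension on line 439 assumes every mutation string contains an "@", so a string such as A90V:0.05 has no element [1] after the split. Below is a minimal sketch of a more tolerant parse. The helper name split_mutation and its default_gene parameter are hypothetical and are not part of gnomonicus or gumpy.

```python
def split_mutation(mut, default_gene=None):
    """Split 'gene@mutation:frs' into (gene, mutation, frs).

    Hypothetical helper: tolerates a missing 'gene@' prefix and a
    missing ':frs' suffix instead of raising IndexError.
    """
    gene, sep, rest = mut.rpartition("@")
    if not sep:
        # No "@" in the string, so fall back to the caller's gene (may be None).
        gene = default_gene
    name, _, frs = rest.partition(":")
    return gene, name, float(frs) if frs else None


# The expression from the traceback fails when the prefix is absent.
try:
    "A90V:0.05".split("@")[1]
    raised = False
except IndexError:
    raised = True

with_gene = split_mutation("gyrA@A90V:0.1")
without_gene = split_mutation("A90V:0.05")
```

With this approach a string lacking the gene prefix yields a gene of None (or the supplied default), which the caller can check explicitly.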

JeremyWesthead commented 1 year ago

Is this on the main branch or large-deletions?

philipwfowler commented 1 year ago

main for both

philipwfowler commented 1 year ago

This is also cropping up in the foo.mutations.csv table, where MUTATION is e.g. S450S rather than rpoB@S450S, and the gene is not specified separately either.

JeremyWesthead commented 1 year ago

With the latest main branch of both gumpy and gnomonicus, I can't reproduce either issue...

In effects.csv I get NC_000962_3_test_0002,gyrA,A90V:0.05,test_001,MXF,U. This matches the 5% FRS in the VCF, which is below the 10% required for R in the catalogue, so it is a default rule match.
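The arithmetic behind that result can be sketched as follows. This is only an illustration, assuming the minor allele fraction is ALLELE_DP for the alternate allele divided by DP, that the catalogue rule applies when the fraction is at or above the threshold, and that the fallback prediction is U. The function name minor_allele_fraction is hypothetical and is not taken from gumpy or gnomonicus.

```python
def minor_allele_fraction(format_field, sample_field, allele_index=1):
    """Return ALLELE_DP[allele_index] / DP for one VCF sample column."""
    keys = format_field.split(":")
    values = dict(zip(keys, sample_field.split(":")))
    depths = [int(d) for d in values["ALLELE_DP"].split(",")]
    return depths[allele_index] / int(values["DP"])


# FORMAT and sample columns copied from the VCF row in this issue.
FORMAT = "GT:DP:ALLELE_DP:FRS:COV_TOTAL:COV:GT_CONF:GT_CONF_PERCENTILE"
SAMPLE = "0/0:100:95,5:1.0000:100:95,5:31.81:11.47"

fraction = minor_allele_fraction(FORMAT, SAMPLE)  # 5 reads out of 100

# Threshold taken from the catalogue row gyrA@A90V:0.1.
threshold = 0.1

# Assumption: the R rule needs fraction >= threshold, otherwise default to U.
prediction = "R" if fraction >= threshold else "U"
```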

In mutations.csv, the gene is specified by the GENE column. The columns are in a different order (not sure why), but the GENE column definitely still exists. For my test, it was column 18.
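Since the column order can change, reading the table by column name rather than by position avoids the problem. Here is a small sketch using only the standard library. The table is a toy stand-in, not the real mutations.csv layout: only the GENE and MUTATION column names come from this thread, and the OTHER column is invented for illustration.

```python
import csv
import io

# Toy stand-in for mutations.csv; only GENE and MUTATION are real column names.
toy_csv = io.StringIO(
    "OTHER,MUTATION,GENE\n"
    "x,S450S,rpoB\n"
    "y,A90V,gyrA\n"
)

# DictReader keys each row by header name, so column order does not matter.
rows = list(csv.DictReader(toy_csv))

# Rebuild the gene@mutation form from the two separate columns.
labels = [f"{row['GENE']}@{row['MUTATION']}" for row in rows]
```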

philipwfowler commented 1 year ago

Having merged in large-deletions this works as expected now, thanks. Closing.