oxfordmmm / gnomonicus

Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variants

Edge case where a sample with minor alleles can't determine the index of the VCF evidence #31

Closed philipwfowler closed 10 months ago

philipwfowler commented 11 months ago

When processing ~25,000 CRyPTIC samples (all Clockwork v0.12.4), I get 584 failures, each of which results in this traceback (VCF file attached).

If this is insoluble, could it at least not fail so catastrophically?

gnomonicus --vcf_file site.05.subj.LR-3205.lab.CR-00878-15.iso.1.v0.12.4.per_sample.vcf --genome_object packages/tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3.gbk --catalogue_file packages/tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3_WHO-UCN-GTB-PCI-2021.7_v1.0_GARC1_RUS.csv --csvs all --json --minor_populations minor_alleles.txt
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 1704/1704 [00:00<00:00, 1076207.50it/s]
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/gnomonicus", line 119, in <module>
    variants = populateVariants(vcfStem, options.output_dir, diff, make_variants_csv, options.resistance_genes, catalogue=resistanceCatalogue)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/gnomonicus/gnomonicus_lib.py", line 197, in populateVariants
    variants = pd.concat([variants, minority_population_variants(diff, catalogue, genes)])
  File "/home/ubuntu/.local/lib/python3.10/site-packages/gnomonicus/gnomonicus_lib.py", line 506, in minority_population_variants
    assert added, f"The index of the VCF evidence could not be determined! {variant_} --> {vcf}"
AssertionError: The index of the VCF evidence could not be determined! 1471932a>g:1.0 --> {'GT': (1, 1), 'DP': 10.0, 'ALLELE_DP': (0.0, 10.0), 'FRS': 1.0, 'COV_TOTAL': 10, 'COV': (0, 10), 'GT_CONF': 58.81, 'GT_CONF_PERCENTILE': 99.89, 'POS': 1471932, 'REF': 'a', 'ALTS': ('g',)}

site.05.subj.LR-3205.lab.CR-00878-15.iso.1.v0.12.4.per_sample.vcf.gz minor_alleles.txt

JeremyWesthead commented 11 months ago

That looks like another odd edge case of 'not actually a minor call, but picked up as one due to a filter fail'. It has 100% FRS and, looking at the VCF, it has a filter fail of MAX_DP. I've not seen that before, but hopefully it shouldn't be too difficult to fix.
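As a rough illustration of that check (a minimal sketch, not gnomonicus code): the helper name `looks_like_major_call` and the 0.9 cutoff are assumptions made up for this example. It only reads the `GT` and `FRS` fields that appear in the evidence dict printed in the traceback.

```python
def looks_like_major_call(evidence, frs_cutoff=0.9):
    """Hypothetical helper: True if the evidence points at a single,
    well-supported allele rather than a genuine minor population."""
    gt = evidence.get("GT")
    frs = evidence.get("FRS")
    if gt is None or frs is None:
        return False
    # A homozygous genotype with (near) full read support is not a minor call
    homozygous = None not in gt and len(set(gt)) == 1
    return homozygous and frs >= frs_cutoff


# Evidence copied from the traceback above
evidence = {
    "GT": (1, 1), "DP": 10.0, "ALLELE_DP": (0.0, 10.0), "FRS": 1.0,
    "COV_TOTAL": 10, "COV": (0, 10), "GT_CONF": 58.81,
    "GT_CONF_PERCENTILE": 99.89, "POS": 1471932, "REF": "a", "ALTS": ("g",),
}
is_major = looks_like_major_call(evidence)
```

For this record `is_major` comes out true, which matches the reading that it only reached the minor-population path because of the filter fail.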

I think failing quietly (at least outside production) would allow such edge cases to slip through. In cases like this, it's valuable to be able to spot these 'not really a minor call' issues, so I'd be apprehensive about removing the loud failure immediately.
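One possible middle ground for large batch runs, sketched here under assumptions: keep the assertion in the library, but have the caller catch it per sample, log it at error level and carry on. `run_batch` and `fake_process` are hypothetical names for this example; nothing here is part of gnomonicus.

```python
import logging

logger = logging.getLogger("batch")


def run_batch(samples, process_sample):
    """Run process_sample on every sample, recording failures loudly
    instead of letting the first one abort the whole batch."""
    results, failures = {}, {}
    for name in samples:
        try:
            results[name] = process_sample(name)
        except AssertionError as err:
            # Still loud: logged as an error and kept for later inspection
            logger.error("%s failed: %s", name, err)
            failures[name] = str(err)
    return results, failures


def fake_process(name):
    # Stand-in for the real per-sample work (assumption for the demo)
    assert name != "bad", "The index of the VCF evidence could not be determined!"
    return f"{name}: ok"


results, failures = run_batch(["s1", "bad", "s2"], fake_process)
```

The failures stay visible (they are logged and returned), so edge cases like this one can still be spotted, but a single odd sample no longer stops the other samples from being processed.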