oxfordmmm / gnomonicus

Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variants

VCF edge case: `TypeError: 'NoneType' object is not subscriptable` #23

Closed philipwfowler closed 1 year ago

philipwfowler commented 1 year ago

This is the only failure of this type I found in the 4,128 CRyPTIC VCFs I used for testing. I get:

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/gnomonicus", line 119, in <module>
    variants = populateVariants(vcfStem, options.output_dir, diff, make_variants_csv, options.resistance_genes, catalogue=resistanceCatalogue)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/gnomonicus/gnomonicus_lib.py", line 197, in populateVariants
    variants = pd.concat([variants, minority_population_variants(diff, catalogue, genes)])
  File "/home/ubuntu/.local/lib/python3.10/site-packages/gnomonicus/gnomonicus_lib.py", line 523, in minority_population_variants
    total_depth = sum(vcf['COV'])
TypeError: 'NoneType' object is not subscriptable

The VCF and the minor population file are attached:

site.10.subj.TD02495757.lab.TD02495757.iso.1.v0.12.4.per_sample.vcf.zip minor_alleles.txt

JeremyWesthead commented 1 year ago

This was actually an issue with gumpy caused by the VCF row being split in such a way that the actual call's position didn't match the position of the minor call.

In this case, the row held a minor call of a deletion as well as an actual call of a deletion at a different position: 779116 . CTGCTGGTGTG C,CTGCTGGTGTGTG. The actual call was 779126_del_tg, but the minor call was 779117_del_tgctggtgtg. The VCF evidence was only stored for the position of the actual call, so the lookup at the minor call's position returned None, which is what `vcf['COV']` then failed on.

Added a fix to gumpy so that VCF evidence is also stored for the position of every minor call: https://github.com/oxfordmmm/gumpy/commit/8794bfc541fc63b4472edd47ecdc195db5355e59
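
For anyone hitting something similar, the failure mode and the fix described above can be sketched roughly as follows. This is a simplified model, not gumpy's actual code: `index_evidence`, `total_depth`, the `(offset, is_minor)` representation of calls and the `COV` numbers are all hypothetical and chosen only for illustration. The positions are the ones from this VCF row.

```python
from typing import Dict, List, Tuple

Evidence = Dict[str, List[int]]


def index_evidence(
    row_pos: int,
    calls: List[Tuple[int, bool]],
    evidence: Evidence,
    include_minor: bool,
) -> Dict[int, Evidence]:
    """Map genome positions to the evidence of the VCF row they came from.

    `calls` holds (offset from the row's POS, is_minor) pairs.
    Hypothetical helper: gumpy's real data structures differ.
    """
    store: Dict[int, Evidence] = {}
    for offset, is_minor in calls:
        if is_minor and not include_minor:
            # Old behaviour: evidence kept only at the actual call's position.
            continue
        store[row_pos + offset] = evidence
    return store


def total_depth(store: Dict[int, Evidence], pos: int) -> int:
    """Mimic the failing line: look up the evidence, then sum its COV."""
    vcf = store.get(pos)  # None when nothing was stored at this position
    return sum(vcf["COV"])


ROW_POS = 779116
# Actual call at 779126 (offset 10), minor call at 779117 (offset 1).
CALLS = [(10, False), (1, True)]
# Placeholder coverage values, not taken from the real VCF.
EVIDENCE: Evidence = {"COV": [2, 5, 30]}

# Before the fix: no evidence at the minor call's position.
before = index_evidence(ROW_POS, CALLS, EVIDENCE, include_minor=False)
try:
    total_depth(before, 779117)
    failed = False
except TypeError:
    failed = True

# After the fix: evidence is stored for every minor call's position as well.
after = index_evidence(ROW_POS, CALLS, EVIDENCE, include_minor=True)
depth = total_depth(after, 779117)
```

With `include_minor=False` the lookup at 779117 finds nothing and subscripting the result raises a `TypeError`, as in the traceback; with `include_minor=True` the same lookup succeeds.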