oxfordmmm / gnomonicus

Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variants

slightly different edge case where "index of the VCF evidence could not be determined!" #36

Open philipwfowler opened 1 year ago

philipwfowler commented 1 year ago

At first glance this looks like one of the other issues, but this one has no MAX_DP filter failure. Looking in the VCF, it does have some peculiar floats for ALLELE_DP, e.g. the per-sample block at position 4326907 is

1/1:135:1.8814,135:0.8544:158:0,134:711.33:54.03

yet this is marked as MIN_FRS because FRS=0.8544, and it isn't obvious how that value is calculated from either the ALLELE_DP or the COV!
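As a quick arithmetic check on the numbers quoted above, a few candidate ratios can be compared with the reported FRS. This is only a numerical observation, not a statement about how minos actually computes FRS; the candidate labels are invented for this sketch.

```python
# Values copied from the per-sample block at position 4326907.
dp = 135.0
allele_dp = (1.8814, 135.0)
cov_total = 158
cov = (0, 134)
reported_frs = 0.8544

# Candidate ratios; the labels are illustrative, not minos terminology.
candidates = {
    "ALLELE_DP[1] / sum(ALLELE_DP)": allele_dp[1] / sum(allele_dp),
    "COV[1] / sum(COV)": cov[1] / sum(cov),
    "COV[1] / COV_TOTAL": cov[1] / cov_total,
    "DP / COV_TOTAL": dp / cov_total,
}

# Keep the candidates that reproduce the reported value to 4 decimal places.
matches = [name for name, value in candidates.items()
           if round(value, 4) == reported_frs]

for name, value in candidates.items():
    print(f"{name:32s} {value:.4f}")
print("matches reported FRS:", matches)
```

Of these, only 135 / 158 rounds to 0.8544. Because DP and ALLELE_DP[1] are both 135 here, this record alone cannot tell whether the numerator is DP or the ALT allele depth, and the match could be a coincidence.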

site.04.subj.04897.lab.933331.iso.1.v0.12.4.per_sample.vcf.gz

I estimate this affects 23 samples out of 44k.

Note: most of these seem to be in the region of 4326900 (as is the example here) which is in ethA and therefore in the minor_alleles.txt. If I remove the genome indices corresponding to ethA, the VCF file processes. If this is insoluble/hard, that might be the answer, however there are many rows in the catalogue that associate mutations in ethA with resistance to ETH.
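The workaround of removing a gene's genome indices could be sketched roughly as below. It assumes that minor_alleles.txt holds one integer genome index per line, which is a guess about the file format, and the range used in the example is illustrative rather than the real ethA coordinates. `drop_indices` is a hypothetical helper, not part of gnomonicus.

```python
def drop_indices(lines, start, end):
    """Return the entries whose genome index lies outside [start, end].

    Assumes one integer genome index per line (an assumption about the
    format of minor_alleles.txt).
    """
    kept = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if start <= int(entry) <= end:
            continue  # inside the excluded region
        kept.append(entry)
    return kept


# Illustrative input and range only; substitute the real gene boundaries.
example = ["4326899", "4326907", "4326908", "4327500"]
filtered = drop_indices(example, 4326900, 4326910)
print(filtered)
```

The cost of this approach is the one noted above: any catalogue rows that rely on minor populations in the removed region can no longer be matched.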

$ gnomonicus --genome_object packages/tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3.gbk --catalogue_file packages/tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3_WHO-UCN-GTB-PCI-2021.7_v1.0_GARC1_RUS.csv --csvs all --json --minor_populations minor_alleles.txt --vcf_file /mnt/data/cryptic-release-two/dat/CRyPTIC2/V2/04/04897/933331/1/per_sample/site.04.subj.04897.lab.933331.iso.1.v0.12.4.per_sample.vcf
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3325/3325 [00:00<00:00, 1111505.60it/s]
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/gnomonicus", line 121, in <module>
    variants = populateVariants(vcfStem, options.output_dir, diff, make_variants_csv, options.resistance_genes, catalogue=resistanceCatalogue)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/gnomonicus/gnomonicus_lib.py", line 200, in populateVariants
    variants = pd.concat([variants, minority_population_variants(diff, catalogue, genes)])
  File "/home/ubuntu/.local/lib/python3.10/site-packages/gnomonicus/gnomonicus_lib.py", line 550, in minority_population_variants
    assert added, f"The index of the VCF evidence could not be determined! {variant_} --> {vcf}"
AssertionError: The index of the VCF evidence could not be determined! 4326908_del_gtgaccgccgttgcgccactgccgatcacgacgatgttcttagcgtcgtagtcgaggtcctcgggccagtgctgcggatggatgatcggcccgacgaaatcctccgagccggcgaatctcggcgagtagccctcgtcgtagttgtagtagccgctgcacagaaagaggaattcgcaggtgagggcgctgagcgtgccgtggctttggatgtgaacggtccagcggttttccgcggtcgaccaatcggcactgatcaccttgtggtggaaccggatatgcctgtcgattccatacatggccgcggtgctcttgacgtactcgaggatgggcttgccgtcggcgatcgcctgccgtccggtccagggacggaatcggaaacctagcgtgtacatgtcggagtcggagcgaattccgggataacggaacaaatcccaggtgccgcccatggattcccgcttttccaggatggcgtagctcttggtcgggcaacggtcctgcaggtgccaggccgcgctgacaccggagattccagcgcccacgatgacaacgtcgaggtgctcggtcatggatccacgctatcaacgtaatgtcgaggccgtcaacgagatgtcgacactatcgacacgtagtaagctgccagggtgaccacctccgcggccagtcaggcttcgctgcctaggggccggcgcaccgcgcggccgtccggcgacgatcgtgaactggcgatcctcgccaccgccgagaaccttctcgaggaccgtccgctggccgatatctcggtcgacgatctggccaagggcgccggtatctcgaggccgacgttctacttctatttcccatccaaggaagcggtgctgctgaccctgctggaccgggtggtcaatcaagccgacatggccctacagacccttgccgagaatcccgccgacaccgaccgcgagaacatgtggcgcaccgggatcaacgtgttcttcgagacattcgggtcgcacaaggcggtaacccgagccggtcaggccgccagggcaaccagtgtcgaagtcgccgaactgtggtcgacgtttatgcagaagtggatcgcctacacggccgccgtgatcgacgccgaacgcgaccgaggcgcggcgccgcgcaccctgccggcccatgaactggccacagcgctcaacctgatgaacgagcggacgctgttcgcgtcattcgccggcgaacagccctcggtgccggaagcccgcgtgctggatacgctggtgcacatctgggtgaccagcatttacggcgagaaccgctaagccgcactcggtcgggggtgctcggtcgatgctcagtgccaaagcggcatgcagatctcacggaggtccggtggacgatctggcagccgaagtggcgccttgggtaggcaatggcgtgcggtcatataggagcgggtgcattcgcatgtcggacacgtggcgttgccgcctggtaccgcggtgttcgtggccgacagcgggctaatgcgacccggtccacgccaggagcgtgtcggccggccaggtgttgacgatccggtcggcgggcacctccgcgtccaaggcgcgctgggcgccgtagccgaggaagtccagct:1.0 --> {'GT': (1, 1), 'DP': 135.0, 'ALLELE_DP': (1.8813999891281128, 135.0), 'FRS': 0.854, 'COV_TOTAL': 158, 'COV': (0, 134), 'GT_CONF': 711.33, 'GT_CONF_PERCENTILE': 54.03, 'POS': 4326907, 'REF': 
'cgtgaccgccgttgcgccactgccgatcacgacgatgttcttagcgtcgtagtcgaggtcctcgggccagtgctgcggatggatgatcggcccgacgaaatcctccgagccggcgaatctcggcgagtagccctcgtcgtagttgtagtagccgctgcacagaaagaggaattcgcaggtgagggcgctgagcgtgccgtggctttggatgtgaacggtccagcggttttccgcggtcgaccaatcggcactgatcaccttgtggtggaaccggatatgcctgtcgattccatacatggccgcggtgctcttgacgtactcgaggatgggcttgccgtcggcgatcgcctgccgtccggtccagggacggaatcggaaacctagcgtgtacatgtcggagtcggagcgaattccgggataacggaacaaatcccaggtgccgcccatggattcccgcttttccaggatggcgtagctcttggtcgggcaacggtcctgcaggtgccaggccgcgctgacaccggagattccagcgcccacgatgacaacgtcgaggtgctcggtcatggatccacgctatcaacgtaatgtcgaggccgtcaacgagatgtcgacactatcgacacgtagtaagctgccagggtgaccacctccgcggccagtcaggcttcgctgcctaggggccggcgcaccgcgcggccgtccggcgacgatcgtgaactggcgatcctcgccaccgccgagaaccttctcgaggaccgtccgctggccgatatctcggtcgacgatctggccaagggcgccggtatctcgaggccgacgttctacttctatttcccatccaaggaagcggtgctgctgaccctgctggaccgggtggtcaatcaagccgacatggccctacagacccttgccgagaatcccgccgacaccgaccgcgagaacatgtggcgcaccgggatcaacgtgttcttcgagacattcgggtcgcacaaggcggtaacccgagccggtcaggccgccagggcaaccagtgtcgaagtcgccgaactgtggtcgacgtttatgcagaagtggatcgcctacacggccgccgtgatcgacgccgaacgcgaccgaggcgcggcgccgcgcaccctgccggcccatgaactggccacagcgctcaacctgatgaacgagcggacgctgttcgcgtcattcgccggcgaacagccctcggtgccggaagcccgcgtgctggatacgctggtgcacatctgggtgaccagcatttacggcgagaaccgctaagccgcactcggtcgggggtgctcggtcgatgctcagtgccaaagcggcatgcagatctcacggaggtccggtggacgatctggcagccgaagtggcgccttgggtaggcaatggcgtgcggtcatataggagcgggtgcattcgcatgtcggacacgtggcgttgccgcctggtaccgcggtgttcgtggccgacagcgggctaatgcgacccggtccacgccaggagcgtgtcggccggccaggtgttgacgatccggtcggcgggcacctccgcgtccaaggcgcgctgggcgccgtagccgaggaagtccagct', 'ALTS': ('c',)}

JeremyWesthead commented 1 year ago

That looks like another case of large deletions in minor populations, but it also doesn't look like it should have been a minor population anyway. The example above deletes all of ethA plus a bit more, but is given as a minor population with an FRS of 1.0, so it may require some digging. I'd suggest a quick fix could be just setting the VCF evidence to None in these situations because, as you said, this seems like an odd VCF case. Possibly worth opening an issue in minos/clockwork?
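A minimal sketch of what that quick fix might look like, replacing the hard assertion with a warning and a None result. This is not the gnomonicus implementation: `evidence_or_none`, the `find_index` callback and the `ALTS_INDEX` key are all hypothetical names used only to illustrate the pattern.

```python
import warnings
from typing import Callable, Optional


def evidence_or_none(
    variant: str,
    vcf: dict,
    find_index: Callable[[str, dict], Optional[int]],
) -> Optional[dict]:
    """Return the VCF evidence with its index, or None if no index is found.

    `find_index` stands in for whatever matching logic currently sets the
    `added` flag; it should return None when nothing matches.
    """
    index = find_index(variant, vcf)
    if index is None:
        # Previously this situation raised an AssertionError.
        warnings.warn(
            "The index of the VCF evidence could not be determined "
            f"for {variant[:40]}; reporting None"
        )
        return None
    evidence = dict(vcf)
    evidence["ALTS_INDEX"] = index  # hypothetical key name
    return evidence


# Stub matchers for demonstration only.
row = {"POS": 4326907, "FRS": 0.854, "ALTS": ("c",)}
missing = evidence_or_none("4326908_del_gtgacc:1.0", row, lambda v, e: None)
found = evidence_or_none("4326908_del_gtgacc:1.0", row, lambda v, e: 0)
print(missing, found)
```

Emitting a warning rather than failing silently keeps the odd records discoverable, which would help if an issue is later raised against minos/clockwork.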

philipwfowler commented 1 year ago

Agreed. Could we go with the quick fix, i.e. report None? Thanks.