oxfordmmm / piezo


add logic so large deletions return a result #11

Open philipwfowler opened 1 year ago

philipwfowler commented 1 year ago

At present, very large deletions (say, more than 50% of a gene), e.g. pncA@del_0.89, do not hit any rule in a catalogue, which causes piezo to crash as shown below.

We therefore need default rules so that a large deletion in a resistance gene returns a U (equivalent to pncA@indel_* for smaller deletions), as well as specific rules with a minimum percentage threshold above which the rule is triggered (something like pncA@del_>=0.5, R). Hence, as usual, a specific rule can override a default rule.
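To make the proposal concrete, here is a minimal sketch of how a default rule and a thresholded specific rule could be resolved. It is not piezo's code: the `del_>=T` rule syntax is only the form floated above, using `pncA@del_>=0.0` as the default is an assumption, and `CATALOGUE` and `predict_deletion` are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical catalogue rows: (rule, prediction). The "del_>=T" syntax is the
# one suggested in this issue; piezo is not assumed to parse it today.
CATALOGUE = [
    ("pncA@del_>=0.0", "U"),  # assumed default: any large deletion in pncA
    ("pncA@del_>=0.5", "R"),  # specific rule: at least half the gene deleted
]

RULE = re.compile(r"^(?P<gene>[^@]+)@del_>=(?P<threshold>\d+(?:\.\d+)?)$")
MUTATION = re.compile(r"^(?P<gene>[^@]+)@del_(?P<fraction>\d+(?:\.\d+)?)$")


def predict_deletion(mutation: str, catalogue=CATALOGUE) -> Optional[str]:
    """Return the prediction of the most specific matching rule, or None."""
    m = MUTATION.match(mutation)
    if m is None:
        return None  # not in the gene@del_x.y format
    fraction = float(m["fraction"])
    best = None  # (threshold, prediction) of the best rule seen so far
    for rule, prediction in catalogue:
        r = RULE.match(rule)
        if r is None or r["gene"] != m["gene"]:
            continue
        threshold = float(r["threshold"])
        # The highest threshold that is still satisfied wins, so a specific
        # rule overrides the default.
        if fraction >= threshold and (best is None or threshold > best[0]):
            best = (threshold, prediction)
    return best[1] if best else None
```

With these two rows, `predict_deletion("pncA@del_0.89")` resolves to the specific rule, while a deletion below 0.5 falls back to the default instead of raising.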

In the longer term we might need to think about how we harmonise indels across the length scales, but that feels hard for now.

site.05.subj.PMOP-0621.lab.MOP-184.iso.1.v0.12.4.per_sample.vcf.gz minor_alleles.txt

$ gnomonicus --vcf_file site.05.subj.PMOP-0621.lab.MOP-184.iso.1.v0.12.4.per_sample.vcf --genome_object packages/tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3.gbk --catalogue_file packages/tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3_WHO-UCN-GTB-PCI-2021.7_v1.0_GARC1_RUS.csv --csvs all --json --minor_populations minor_alleles.txt
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1064/1064 [00:00<00:00, 870269.00it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 687/687 [01:13<00:00,  9.36it/s]
  2%|█▉                                                                                                     | 17/883 [00:00<00:00, 6095.33it/s]
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/gnomonicus", line 131, in <module>
    effects, metadata = populateEffects(options.output_dir, resistanceCatalogue, mutations, referenceGenes, vcfStem, make_effects_csv, make_prediction_csv)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/gnomonicus/gnomonicus_lib.py", line 868, in populateEffects
    prediction = resistanceCatalogue.predict(gene+'@'+mutation)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/piezo/catalogue.py", line 55, in predict
    return predict(self.catalogue, mutation=mutation, verbose=verbose)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/piezo/catalogue.py", line 161, in predict
    return predict_GARC1(catalogue, mutation)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/piezo/grammar_GARC1.py", line 239, in predict_GARC1
    raise ValueError("No entry found in the catalogue for "+gene_mutation+" "+compound)
ValueError: No entry found in the catalogue for pncA@del_0.89 PZA

JeremyWesthead commented 1 year ago

As large deletions are currently implemented in piezo, a del_x.y rule is only hit if the mutation given is of the same format. So it may be best to put the default rules on del_0.0 and re-prioritise so that it is treated as a default rule, similar to the * rules.

Possibly worth grouping together with the piezo changes required to flow through the evidence

philipwfowler commented 1 year ago

Agreed, using a lower-bound rule as the effective default makes sense, e.g. pncA@del_0.1, U. We might set the lower bound at, or just below, the threshold for calling large deletions, to make clear that smaller deletions are handled differently.
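A rough sketch of the lower-bound idea, assuming each del_x.y rule applies from its threshold upwards until the next rule takes over. The 0.1 bound is the example above; the 0.5 to R rule and the names `PNCA_RULES` and `classify` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical lower-bound rules for one gene, sorted by threshold.
# Each entry means "deleted fraction >= threshold gives this prediction".
PNCA_RULES = [(0.1, "U"), (0.5, "R")]


def classify(fraction, rules=PNCA_RULES):
    """Return the prediction of the highest lower bound not exceeding fraction."""
    thresholds = [threshold for threshold, _ in rules]
    index = bisect_right(thresholds, fraction)
    if index == 0:
        # Below the lowest bound: not covered by the large-deletion rules,
        # so it would have to be handled by other rules.
        return None
    return rules[index - 1][1]
```

Here a fraction of 0.89 lands on the 0.5 rule, 0.3 lands on the 0.1 default, and anything under 0.1 matches no large-deletion rule at all.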

Could we please keep this and the evidence work separate (unless the latter isn't much work), since I need to process all of CRyPTIC Release Two and fixing this Issue will unblock that.