oxfordmmm / gnomonicus

Python code to integrate results of tb-pipeline and provide an antibiogram, mutations and variants

Fix to ensure that multis aren't erroneously duplicated #38

Closed JeremyWesthead closed 9 months ago

JeremyWesthead commented 10 months ago

If a catalogue has the same multi mutation prediction for multiple drugs, the mutation would be duplicated once per drug. This led to duplicate entries for these mutations in the output JSON. Example JSON snippet:

        {
          "gene": null,
          "mutation": "fabG1@I109I&fabG1@c327t",
          "prediction": "U",
          "evidence": {}
        },
        {
          "gene": null,
          "mutation": "fabG1@F72F&fabG1@c216t",
          "prediction": "U",
          "evidence": {}
        },
        {
          "gene": null,
          "mutation": "fabG1@I109I&fabG1@c327t",
          "prediction": "U",
          "evidence": {}
        },
        {
          "gene": null,
          "mutation": "fabG1@F72F&fabG1@c216t",
          "prediction": "U",
          "evidence": {}
        },

This PR should fix the duplication by treating the multi mutations from the catalogue as a set before fetching them from the genome, so each multi is looked up only once regardless of how many drugs list it.
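
As a rough illustration of the idea (not the actual gnomonicus code), the sketch below collapses catalogue rows into a set of unique multi mutations before any genome lookup. The row layout, the drug codes, the extra `katG@S315T` row, and the `unique_multis` helper are all assumptions made up for this example; only the two `fabG1` multi strings come from the snippet above.

```python
# Hypothetical catalogue rows as (drug, mutation, prediction) tuples.
# The drug codes and the tuple layout are illustrative assumptions,
# not the real catalogue schema.
catalogue_rows = [
    ("INH", "fabG1@I109I&fabG1@c327t", "U"),
    ("ETH", "fabG1@I109I&fabG1@c327t", "U"),
    ("INH", "fabG1@F72F&fabG1@c216t", "U"),
    ("ETH", "fabG1@F72F&fabG1@c216t", "U"),
    ("INH", "katG@S315T", "R"),  # single mutation, not a multi
]


def unique_multis(rows):
    """Return each multi mutation (joined with '&') exactly once.

    Building a set removes the per-drug repeats; sorting only makes
    the result order deterministic.
    """
    return sorted({mutation for _, mutation, _ in rows if "&" in mutation})


multis = unique_multis(catalogue_rows)
for multi in multis:
    # Each multi would be fetched from the genome once at this point.
    print(multi)
```

With the rows above, four catalogue entries for multis reduce to two lookups, so each multi can only appear once in the resulting output.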