populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP
MIT License

Shift config from JSON to TOML #128

Closed MattWellie closed 1 year ago

MattWellie commented 1 year ago

Currently the config is read from a manually passed JSON file, which adds a lot of coupling to method calls.

Instead of that, use a TOML file, and use the CPG-utils config functionality to absorb the config content into the broader environment config.

Reference: https://centrepopgen.slack.com/archives/C018KFBCR1C/p1665451208949389

Increments:

  1. Split the current whole-run config examples into 2 sub-configs
    • AIP general-purpose (thresholds, filter settings, report colours)
    • Cohort-specific (e.g. sample lookups, seqr project...)
  2. Convert the JSON configuration files to TOML
  3. Update the analysis-runner call to pass the general and cohort-specific configs as arguments
  4. Change ALL config usage to reference the global config using get_config

Note: This breaks the config changes into 4 separate steps, but step 4 is obviously 99% of the effort and 100% of the code changes. I can't see an easy way to mitigate that, unless both config files are supplied and different parts of the codebase use different approaches, which seems silly.
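As a rough illustration of increments 3 and 4, here is a minimal sketch of merging a general config and a cohort-specific config into one global config that code reads through an accessor. It does not use cpg-utils: set_config and get_config below are local stand-ins written for this example, and every key name and value (af_threshold, colour, seqr_project) is hypothetical. The "later file wins" merge order is also an assumption of the sketch.

```python
import copy

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the tomli backport is installed on older Pythons

# Hypothetical general-purpose config (thresholds, report settings).
GENERAL_TOML = """
[filter]
af_threshold = 0.01

[report]
colour = "blue"
"""

# Hypothetical cohort-specific config; it also overrides one general value.
COHORT_TOML = """
[cohort]
seqr_project = "example-project"

[filter]
af_threshold = 0.005
"""


def deep_merge(base, override):
    """Return a new dict where nested tables in `override` update `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


_CONFIG = None  # module-level store standing in for the environment config


def set_config(*toml_texts):
    """Parse each TOML document and merge them in order (later wins)."""
    global _CONFIG
    merged = {}
    for text in toml_texts:
        merged = deep_merge(merged, tomllib.loads(text))
    _CONFIG = merged


def get_config():
    """Local stand-in for a global config accessor."""
    if _CONFIG is None:
        raise RuntimeError("config has not been loaded; call set_config first")
    return _CONFIG


def passes_af_filter(allele_frequency):
    """Example consumer: no config argument is threaded through the call."""
    return allele_frequency <= get_config()["filter"]["af_threshold"]


set_config(GENERAL_TOML, COHORT_TOML)
```

The point of the sketch is the signature of passes_af_filter: it takes only the value it tests, and the thresholds come from the merged global config rather than from a dict passed down through each call.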

MattWellie commented 1 year ago

Update: Following a re-run of the ag-hidden cohort, some variants have changed in the final report:

PREVIOUS

LATEST

Gained: A***017 Cat. 2 FRMD5 var.

Lost: A***021 Cat. 1 GALT var.

There are different panel versions, but that probably won't explain this difference.

MattWellie commented 1 year ago

Ok... Maybe it's absolutely fine. The older version of the Mendeliome (older by 6 days) has no FRMD5 gene entry and lists GALT as monoallelic. In the latest version FRMD5 is present and GALT is biallelic. This explains both discrepancies.
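The comparison above can be sketched as a small diff between two panel versions. This is only an illustration: the gene-to-mode-of-inheritance dict is a hypothetical representation, not the real panel format, and the FRMD5 mode of inheritance is left as None because it is not stated in this thread.

```python
def diff_panels(old, new):
    """Compare two {gene: mode_of_inheritance} mappings."""
    old_genes, new_genes = set(old), set(new)
    return {
        # genes only present in the newer panel
        "gained": sorted(new_genes - old_genes),
        # genes only present in the older panel
        "lost": sorted(old_genes - new_genes),
        # genes in both panels whose mode of inheritance differs
        "moi_changed": {
            gene: (old[gene], new[gene])
            for gene in sorted(old_genes & new_genes)
            if old[gene] != new[gene]
        },
    }


# Only the two genes discussed in this thread; values are hypothetical labels.
older_panel = {"GALT": "monoallelic"}
latest_panel = {"FRMD5": None, "GALT": "biallelic"}

panel_diff = diff_panels(older_panel, latest_panel)
print(panel_diff)
```

Running a diff like this before comparing two reports would flag FRMD5 as newly added and GALT as having a changed mode of inheritance, which matches the gained and lost variants listed earlier.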