13000 lines are the removal of a JSON file for the pre-PanelApp Mendeliome (genes in the Mendeliome prior to its migration into PanelApp, and the panels they overlap with). Using this in a Day 1 analysis makes Cat2 woefully noisy, so we would rather the first analysis is done with no Cat2, and results are incremental from there.
~1000 lines are the removal of the comparison scripts. I still have them stashed somewhere, but the implementation as it stands is broken (and not necessary post-validation?)
~1000 lines are removal of the ClinVar content - tests, fixtures, CI yaml
Most other changes remove per-cohort config handling, in favour of generating one config prior to running and using the new config fields
Fixes
This is still tough to deploy in a new setting, mostly because of config: paths and values are pulled from various parts of the config at runtime. Building a new config from scratch is not intuitive.
The current cpg_utils.config module uses the environment variable CPG_CONFIG_PATH, which ties this software to the CPG naming convention
A load of files retained here are no longer used:
ClinVar content migrated to ClinvArbitration
Comparison module is no longer functional with recent changes
Pre-PanelApp version of the Mendeliome was VCGS-specific, and is no longer used
Cohort-specific config file is not relevant
Tests for the above
Proposed Changes
Steals the cpg_utils config framework, using an environment variable TALOS_CONFIG, or taking a file path. This uses a global config which can be referenced indirectly instead of being passed.
We want to provide a bespoke, minimal config to Talos jobs, instead of the sprawling CPG config.
I want to remove the CPG naming convention from the config path (CPG_CONFIG_PATH env variable)
The final action before a batch is run is to call copy_common_env, which sets the value of CPG_CONFIG_PATH inside the job to the value in the Driver image. This overwrites any value that was set previously, so keeping the cpg-utils standard config module but providing a new config file is tricky in our hail batch deployment.
Uses a single config file, with a separate section for each main script
Aligns the names for the main scripts, the config sections, the command-line entry points, and the production-pipelines Stage names
Combines with a prod-pipes PR which builds a config file in the format expected here
Deletes a TON of files: the whole comparison module is woefully out of sync with current structures and doesn't work; the pre-PanelApp file is not used anywhere; and the ClinVar content that's been migrated to ClinvArbitration is removed from here (along with the relevant CI workflow)
Used end-to-end here: https://batch.hail.populationgenomics.org.au/batches/458319
Twinned with https://github.com/populationgenomics/production-pipelines/pull/810
n.b. this PR isn't as scary as it looks:
Checklist