populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP

MIT License


Fixes to clinvar processing #391

Closed. MattWellie closed this 3 weeks ago

MattWellie commented 3 weeks ago

Fixes

Closes #389

Proposed Changes

Reimplements https://github.com/populationgenomics/ClinvArbitration/pull/4
At some point it will be worth including that library as a submodule, or running that repo separately, so that this manual harmonisation isn't necessary.

Checklist

[x] Related Issue created
[x] Tests covering new change
[x] Linting checks pass

I've checked this manually, and the alleleID is now named correctly in the new output data.

cassimons commented 3 weeks ago

I vote for polishing up the ClinvArbitration automation so it runs after each clinvar release, publishing the output, then stripping all of that logic from here and dogfooding our release.

MattWellie commented 3 weeks ago

> I vote for polishing up the ClinvArbitration automation so it runs after each clinvar release, publishing the output

Sounds good. ClinvArbitration has been written so that it isn't tailor-made for our infrastructure (it assumes it is run locally, so it starts a local Hail instance, etc.), so we'd have to run the core process, then do a gcloud copy of the results back into our infrastructure.

That will all be doable inside the standard CPG driver image, so getting that up and running should be fairly simple. The main issue will be VEP annotation (needed for the PM5 evidence table); I don't want to contaminate that repo with the grotty logic we have for annotating the resulting VCF.
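
The run-then-copy step described above could look roughly like the sketch below. It is an illustration only, not code from either repo: the output directory name, the bucket path and both helper names are hypothetical. The only real tool it assumes is the `gcloud storage cp` command with its `--recursive` flag.

```python
import shlex
import subprocess
from pathlib import Path


def build_copy_command(local_dir: str, destination: str) -> list[str]:
    """Build a `gcloud storage cp` command copying a results directory to a bucket."""
    # Guard against accidentally copying to a local path instead of a bucket.
    if not destination.startswith("gs://"):
        raise ValueError(f"destination must be a gs:// URI, got {destination!r}")
    return ["gcloud", "storage", "cp", "--recursive", str(Path(local_dir)), destination]


def copy_results(local_dir: str, destination: str, dry_run: bool = True) -> list[str]:
    """Print the copy command (dry run) or execute it, and return it either way."""
    command = build_copy_command(local_dir, destination)
    if dry_run:
        print(shlex.join(command))
    else:
        # check=True raises CalledProcessError if gcloud exits non-zero.
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    # Hypothetical names: the real output directory and bucket are not given in this thread.
    copy_results("clinvarbitration_output", "gs://example-bucket/clinvarbitration/")
```

Defaulting to a dry run keeps the sketch safe to try outside the driver image. The core ClinvArbitration process would run before `copy_results` is called.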

cassimons commented 3 weeks ago

> That will all be doable inside the standard CPG driver image, so getting that up and running should be fairly simple. The main issue will be VEP annotation (needed for the PM5 evidence table); I don't want to contaminate that repo with the grotty logic we have for annotating the resulting VCF.

Hmm, that is a pain. I agree we do not want VEP in here, but it would be nice if other AIP users could just pull our VEPed version of this file. Perhaps we keep the VEPing internal but also push the output to our release bucket?
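
One way to picture that split is a small helper listing what would be pushed to the release bucket for each ClinVar release, with the VEP step itself staying internal. This is a sketch under stated assumptions: the bucket name, the path layout, the file names and the function name are all hypothetical, and nothing here reflects an agreed release format.

```python
from datetime import date


def release_uris(bucket: str, clinvar_release: date) -> dict[str, str]:
    """Map each published artefact to a gs:// URI keyed by ClinVar release month."""
    # Hypothetical layout: one folder per ClinVar release, named YYYY-MM.
    prefix = f"gs://{bucket}/clinvarbitration/{clinvar_release:%Y-%m}"
    return {
        # Output of the core ClinvArbitration process (hypothetical file name).
        "decisions": f"{prefix}/clinvar_decisions.tsv",
        # VEP-annotated PM5 table, produced internally (hypothetical file name).
        "pm5_annotated": f"{prefix}/clinvar_pm5_annotated.tsv",
    }


if __name__ == "__main__":
    for name, uri in release_uris("example-release-bucket", date(2024, 5, 1)).items():
        print(name, uri)
```

Other AIP users would then only need the two URIs for a given release, without running VEP themselves.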