populationgenomics / talos

Rare Disease variant reanalysis tool
MIT License

Fixes to clinvar processing #391

Closed MattWellie closed 6 months ago

MattWellie commented 6 months ago

Fixes

Proposed Changes

Checklist

I've checked this manually: the alleleID is named correctly in the new output data.

cassimons commented 6 months ago

I vote for polishing up the ClinvArbitration automation so it runs after each ClinVar release, publishing the output, then stripping all of that logic from here and dogfooding our release.

MattWellie commented 6 months ago

> I vote for polishing up the ClinvArbitration automation so it runs after each clinvar release, publishing the output

Sounds good. ClinvArbitration has been written so that it isn't tailor-made for our infrastructure (it assumes it is run locally, so it starts a local Hail instance, etc.), so we'd have to run the core process, then do a gcloud copy of the results back into our infrastructure.

That will all be doable inside the standard CPG driver image, so getting that up and running should be fairly simple. The main issue will be VEP annotation (needed for the PM5 evidence table); I don't want to contaminate that repo with the grotty logic we have for annotating the resulting VCF.
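
A minimal sketch of the "run locally, then copy the results back" step mentioned above. Everything named here is an assumption for illustration only: the local output directory, the bucket path, and the helper `build_copy_command` are hypothetical and are not part of ClinvArbitration or Talos. The only external tool assumed is the `gcloud storage cp --recursive` command.

```python
import shlex
from pathlib import Path


def build_copy_command(local_dir: str, bucket_prefix: str) -> list[str]:
    """Build a `gcloud storage cp` command copying a local results directory to a bucket.

    Both arguments are placeholders; real paths depend on how the job is set up.
    """
    if not bucket_prefix.startswith("gs://"):
        raise ValueError(f"expected a gs:// destination, got {bucket_prefix!r}")
    # Normalise the destination so it always ends with exactly one slash.
    destination = bucket_prefix.rstrip("/") + "/"
    return ["gcloud", "storage", "cp", "--recursive", str(Path(local_dir)), destination]


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the shape of the command.
    cmd = build_copy_command("clinvarbitration_output", "gs://example-bucket/clinvarbitration/latest")
    print(shlex.join(cmd))
    # To actually perform the copy: subprocess.run(cmd, check=True)
```

Building the command as a list keeps it easy to inspect or log before it is executed inside whatever image runs the job.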

cassimons commented 6 months ago

> That will all be doable inside the standard CPG driver image, so getting that up and running should be fairly simple. The main issue will be VEP annotation (needed for the PM5 evidence table), I don't want to contaminate that repo with the grotty logic we have for annotating the resulting VCF.

Hmm, that is a pain. I agree we do not want VEP in here, but it would be nice if other AIP users could just pull our VEPed version of this file. Perhaps we keep the VEPing internal but also push the output to our release bucket?