populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP
MIT License

Localise the ClinVar-Residues Hail Table? #264

Closed: MattWellie closed this issue 1 year ago

MattWellie commented 1 year ago

See https://batch.hail.populationgenomics.org.au/batches/422683/jobs/2

This common resource potentially receives a massive number of requests, because the queries, joins, and general reshuffling of the data all read from it.

It's probably a good idea to either localise this data or checkpoint after the join, to reduce network traffic. It's only a small table, so writing a copy into each batch is fine.
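A minimal sketch of both options, assuming the table is read with Hail's `hl.read_table` and that each batch has its own temporary directory. The helper names, the `clinvar_residues.ht` and `joined.ht` file names, and the directory layout are hypothetical and not taken from the pipeline:

```python
def batch_copy_path(batch_tmp_dir: str, table_name: str = "clinvar_residues.ht") -> str:
    """Build the per-batch destination path (hypothetical naming scheme)."""
    return f"{batch_tmp_dir.rstrip('/')}/{table_name}"


def localise_table(central_path: str, batch_tmp_dir: str):
    """Option 1: read the shared table once and work from a private copy."""
    import hail as hl  # deferred so the path helper works without Hail installed

    # checkpoint() writes the table to the given path and reads it back,
    # so later queries hit this batch's copy rather than the shared object.
    return hl.read_table(central_path).checkpoint(
        batch_copy_path(batch_tmp_dir), overwrite=True
    )


def checkpoint_joined(joined_table, batch_tmp_dir: str):
    """Option 2: materialise an already-joined Hail Table.

    Downstream steps then read the written result instead of re-running
    the join against the shared table.
    """
    return joined_table.checkpoint(
        batch_copy_path(batch_tmp_dir, "joined.ht"), overwrite=True
    )
```

Either function would be called once near the start of a batch, with the returned table used in place of the central one for the rest of the run.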

Case: two runs were using this central table, then one of them was restarted. The extra load was enough to crash the run by exceeding the GCP object bandwidth cap.

MattWellie commented 1 year ago

Note: all reports have been run, and this has not happened again since. Deprioritised, maybe cancelled.