populationgenomics / automated-interpretation-pipeline

Rare Disease variant prioritisation MVP
MIT License

'Local' AIP in Hail #385

Open MattWellie opened 2 weeks ago

MattWellie commented 2 weeks ago

Try running Hail Query in local mode

Issue 1(?)

To run a Hail method in local mode, I first have to localise the input data (the MatrixTable, MT) to simulate a local analysis. With our data in GCP, this requires a Google utility tool to make the data available inside the job container. I've remedied this by moving back to the Driver image as the base for the AIP image.
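A rough sketch of the localise-then-read step, for illustration only. It assumes the Google utility is `gsutil` and that it is on the container's PATH; the function names, bucket path, and core count are hypothetical and not taken from the branch.

```python
import subprocess
from pathlib import Path
from typing import List


def build_copy_command(remote_mt: str, local_dir: str) -> List[str]:
    """Build a recursive, parallel gsutil copy for a whole MT directory.

    Assumption: gsutil is the copy tool available in the image.
    """
    return ['gsutil', '-m', 'cp', '-r', remote_mt.rstrip('/'), local_dir]


def localise_and_read(remote_mt: str, local_dir: str):
    """Copy the MT into the container, then open it with a local Spark master."""
    # Imported lazily so the command builder above works without Hail installed.
    import hail as hl

    Path(local_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_copy_command(remote_mt, local_dir), check=True)

    # gsutil places the copied directory inside local_dir under its own name.
    mt_name = remote_mt.rstrip('/').rsplit('/', 1)[-1]
    local_mt = str(Path(local_dir) / mt_name)

    # 'local[2]' is an arbitrary choice of two local cores.
    hl.init(master='local[2]', quiet=True)
    return hl.read_matrix_table(local_mt)


# Hypothetical bucket path, shown only to illustrate the command that would run.
command = build_copy_command('gs://my-bucket/cohort.mt/', '/tmp/localised')
```

Calling `localise_and_read('gs://my-bucket/cohort.mt', '/tmp/localised')` would need both credentials for the bucket and enough container disk to hold the full MT.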

Hail Batch can't do this itself, as its read_input/read_input_group methods aren't suited to copying whole directories. Even if they could localise the whole MT directory, the files can't be lifted into a task VM (in this case the AIP runtime) unless each one is individually named/addressed, which is not realistic with Hail data.
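To show what "individually named" would mean in practice, here is a small sketch that assigns a label to every file under an MT directory. The file listing is made up for the example and the helper name is hypothetical; a real MT typically holds many more partition files than this.

```python
from typing import Dict, Iterable


def name_every_file(paths: Iterable[str]) -> Dict[str, str]:
    """Give each file a unique, keyword-safe label: f0, f1, ..."""
    return {f'f{i}': path for i, path in enumerate(sorted(paths))}


# Made-up listing standing in for the contents of an MT directory.
listing = [
    'gs://my-bucket/cohort.mt/metadata.json.gz',
    'gs://my-bucket/cohort.mt/rows/parts/part-0',
    'gs://my-bucket/cohort.mt/entries/rows/parts/part-0',
    'gs://my-bucket/cohort.mt/cols/parts/part-0',
]

inputs = name_every_file(listing)

# With a hailtop.batch Batch object `b`, the mapping could then be passed as
#     b.read_input_group(**inputs)
# but every file needs its own entry, and the labels do not preserve the
# nested directory layout that Hail expects when reading the MT back.
```

The mapping grows with the number of partitions, which is why this approach does not scale to real Hail data.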

See the branch: https://github.com/populationgenomics/automated-interpretation-pipeline/tree/experiment_with_local_hail