We want to make access to this application easy (or at least easier than it currently is).
The lowest-friction route is to take the code we currently have and run it in local mode (a local Spark backend instead of the distributed Query-on-Batch runtime).
Instead of the current initiation of a Query-on-Batch runtime, we just swap that out for a local initialisation, along the lines of the sketch below.
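As illustration, a minimal sketch of the swap, assuming the pipeline currently starts Query-on-Batch via `hl.init(backend='batch')`; the `USE_LOCAL_BACKEND` toggle is a hypothetical flag for this example, not something in the codebase.

```python
import hail as hl

USE_LOCAL_BACKEND = True  # hypothetical toggle, for illustration only

if USE_LOCAL_BACKEND:
    # Local Spark backend: with no cluster master configured, Hail falls
    # back to an in-process Spark master; pin the thread count explicitly.
    hl.init(master='local[4]')
else:
    # Current behaviour: distributed Query-on-Batch runtime
    hl.init(backend='batch')
```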
Issue 1(?)
To run a Hail method in local mode, I first have to localise the input data (the MatrixTable) so the analysis runs against local files. With our data in GCP, this requires a Google storage utility (gsutil / gcloud storage) to make the data available in the job container; I've remedied this by moving back to the Driver image as the base for the AIP image.
Hail Batch can't do this itself: the read_input/read_input_group methods aren't useful for copying whole directories. Even if they could localise the whole MT directory, files can't be lifted into a task VM (in this case the AIP runtime) unless each is individually named/addressed, which is not realistic for Hail data (a MatrixTable is a directory tree of many anonymous part files). A sketch of the copy-based workaround follows.
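A hedged sketch of that workaround: shelling out to gcloud storage from inside the job container, then reading the copy with a local Hail session. The bucket path and scratch directory here are hypothetical.

```python
import subprocess

import hail as hl

MT_REMOTE = 'gs://my-bucket/my-dataset.mt'  # hypothetical input path
LOCAL_DIR = '/io'                           # hypothetical scratch directory

# A MatrixTable is a directory of many part files, so we need a recursive
# copy -- exactly what batch.read_input() can't express for one "file".
# Assumes the gcloud CLI is present in the image (hence the Driver base).
subprocess.run(
    ['gcloud', 'storage', 'cp', '-r', MT_REMOTE, LOCAL_DIR],
    check=True,
)

hl.init(master='local[4]')  # local Spark backend, as sketched above
mt = hl.read_matrix_table(f'{LOCAL_DIR}/my-dataset.mt')
```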
Try running Hail Query in local mode
See the branch: https://github.com/populationgenomics/automated-interpretation-pipeline/tree/experiment_with_local_hail