MT to ES outside Dataproc

MattWellie commented 4 months ago

Closes #800

~Untested~

We are currently locked into using the seqr-loading-pipelines code as a submodule, and Dataproc as an execution environment, when building Elasticsearch indexes for seqr.
This change takes the methods and processes that we use from seqr-loading-pipelines and skips all the content that is never triggered when we run an MTtoES transition. Once we remove the code paths we never intend to execute, what's left is a fairly thin wrapper around the elasticsearch library and a few Hail methods.

Process:

Copy the target MT into the VM
Start a local Hail instance with a Spark backend
Generate the ES password from secrets in a config file; this removes the need to pass a secret in plain text
Use the same MT -> flattened HT method as the existing script
Create an ElasticSearchClient, modelled on the Hail version. It contains all the method calls we previously executed, rather than importing those methods from seqr-loading-pipelines
Remove the ES index if it already exists (this is not expected to happen)
Push a new index by name to the ES instance, clean up, and write a 'DONE' file
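
The last two steps (drop any stale index, push, then mark completion) could be sketched roughly as below. This is a minimal sketch and not the code in this PR: `finalise_index`, the three callables, and the local `done_path` are all hypothetical names. In the real script the callables would be backed by the ES client and the Hail export.

```python
from pathlib import Path
from typing import Callable


def finalise_index(
    index_name: str,
    index_exists: Callable[[str], bool],
    delete_index: Callable[[str], None],
    export_index: Callable[[str], None],
    done_path: Path,
) -> bool:
    """Replace any existing index, export the new one, then write a DONE marker.

    Returns True if a pre-existing index had to be removed.
    All names are hypothetical; the callables stand in for ES client methods.
    """
    removed = False
    if index_exists(index_name):
        # Not expected in normal runs; remove it so the push starts clean
        delete_index(index_name)
        removed = True
    export_index(index_name)
    # Written only after the export returns, so the file implies completion
    done_path.write_text("DONE\n")
    return removed
```

Passing the steps in as callables keeps the ordering testable without a live ES instance; if the export raises, no 'DONE' file is written.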

This is complete theft:

where we were importing/inheriting from the HailES Client, I've created an equivalent client with the same functionality (limited to only the methods/calls we actually execute)
where we referenced constants in the SLP repo, or called methods which require constants, I've copied them in
where SLP methods called out to Hail methods, we're using those Hail methods directly
there's no doubt a need to more explicitly credit the client/code I'm ripping off here
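
For illustration, the "equivalent client limited to the methods we actually execute" idea might look something like this. It is a sketch under assumptions: `ThinESClient` is a hypothetical name, and `es` is assumed to be an `elasticsearch.Elasticsearch` instance, of which only `indices.exists` and `indices.delete` are used. It is not the client from this PR or from SLP.

```python
class ThinESClient:
    """Hypothetical minimal wrapper exposing only the index calls we need."""

    def __init__(self, es):
        # `es` is assumed to be an elasticsearch.Elasticsearch instance,
        # injected so that nothing here inherits from SLP classes
        self._es = es

    def index_exists(self, name: str) -> bool:
        # bool() covers clients that return a response object instead of a bool
        return bool(self._es.indices.exists(index=name))

    def delete_index(self, name: str) -> None:
        self._es.indices.delete(index=name)

    def delete_index_if_exists(self, name: str) -> bool:
        """Delete the index if present; return True if something was deleted."""
        if self.index_exists(name):
            self.delete_index(name)
            return True
        return False
```

Composition rather than inheritance keeps the surface area explicit: anything not listed on the class is simply not available to the script.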

MattWellie commented 4 months ago

This run uses a different approach:

make a big VM (configurable)
gcloud cp the MatrixTable into the VM
process the localised MT into an ES index using a local Hail instance with a Spark backend
?????
PROFIT!

This test run was successful, but it was successful with a teeny weeny baby MT (~2MB total). Just a proof of concept, but a success.
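
The copy and local-Hail steps of this approach could be sketched as below. Assumptions, loudly: the helper names are hypothetical, the copy uses `gcloud storage cp --recursive` (the actual run may have used a different command), and `hl.init(backend="spark", local=...)` assumes a Hail version whose `init` accepts a `backend` argument. Hail is imported lazily so the helpers can be exercised without it installed.

```python
import subprocess
from typing import List, Optional


def build_copy_command(source: str, dest: str) -> List[str]:
    """Command to copy a MatrixTable directory from a bucket to local disk."""
    # A MatrixTable is a directory tree, so the copy has to be recursive
    return ["gcloud", "storage", "cp", "--recursive", source, dest]


def spark_master(cores: Optional[int] = None) -> str:
    """Spark master string for local mode; all available cores by default."""
    return f"local[{cores}]" if cores else "local[*]"


def localise_mt(source: str, dest: str) -> None:
    """Run the copy, raising CalledProcessError if it fails."""
    subprocess.run(build_copy_command(source, dest), check=True)


def read_local_mt(path: str, cores: Optional[int] = None):
    """Start Hail on a local Spark backend and open the localised MT."""
    import hail as hl  # lazy import: only needed on the VM itself

    # Assumes hl.init supports the `backend` keyword (recent Hail releases)
    hl.init(backend="spark", local=spark_master(cores))
    return hl.read_matrix_table(path)
```

Usage would be along the lines of `localise_mt("gs://<bucket>/<name>.mt", "/data/")` followed by `read_local_mt("/data/<name>.mt")`, with the bucket and paths filled in for the run.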

MattWellie commented 4 months ago

Closing as superseded by #829