googlegenomics / variant-annotation

Use cloud technology to annotate human sequence variants in parallel.
Apache License 2.0

Annotation as a Service #5

Open: Jessime opened this issue 6 years ago

Jessime commented 6 years ago

Currently, we parallelize VEP only at the file level: we bring up one VM per VCF file, run VEP on that file, and store the newly annotated file back on disk. A finer-grained option is to host instances of a server that listens for annotation requests. Several components of the stack are interchangeable, but it might look something like:

App Engine → Docker → Flask server → VEP
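A minimal sketch of the "Flask server → VEP" layer, assuming VEP is installed in the container with a local offline cache. To stay dependency-free it uses Python's standard-library `http.server` in place of Flask. The `/annotate` endpoint, the exact `VEP_CMD` flags, and the helper names are hypothetical. The server pipes the POSTed VCF text into VEP on stdin and returns VEP's stdout.

```python
import subprocess
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical VEP command: read VCF from stdin (no --input_file), write the
# annotated VCF to stdout, and use a local cache so no database is contacted.
# Adjust the flags to match your VEP installation.
VEP_CMD = ["vep", "--format", "vcf", "--vcf", "--output_file", "STDOUT",
           "--cache", "--offline", "--no_stats"]


def run_annotator(vcf_text: str, cmd: list[str]) -> str:
    """Pipe VCF text through the annotator process and return its stdout."""
    result = subprocess.run(cmd, input=vcf_text, capture_output=True,
                            text=True, check=True)
    return result.stdout


def make_handler(cmd: list[str]) -> type[BaseHTTPRequestHandler]:
    class AnnotateHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path != "/annotate":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            vcf_text = self.rfile.read(length).decode("utf-8")
            try:
                body = run_annotator(vcf_text, cmd).encode("utf-8")
            except (OSError, subprocess.CalledProcessError) as err:
                # Missing binary or non-zero exit: report a server error.
                self.send_error(500, explain=str(err))
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # silence per-request logging

    return AnnotateHandler


def serve(port: int = 8080, cmd: list[str] = VEP_CMD,
          host: str = "0.0.0.0") -> ThreadingHTTPServer:
    """Start the server on a background thread and return it."""
    server = ThreadingHTTPServer((host, port), make_handler(cmd))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def annotate_remote(url: str, vcf_text: str) -> str:
    """Client side: POST VCF text to the server and return the annotated text."""
    req = urllib.request.Request(url, data=vcf_text.encode("utf-8"),
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    serve()
    threading.Event().wait()  # keep the main thread alive
```

Each request starts a fresh VEP process. This keeps the server stateless and easy to replicate behind App Engine, at the cost of paying VEP's start-up time on every call.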

This parallelization option has several advantages:

  1. It is relatively straightforward: little new code needs to be written.
  2. Because the server annotates each variant individually, it can be integrated into the Dataflow pipeline of Variant Transforms (VT).
  3. In the long term, the Docker container could bundle multiple annotation programs, and users could choose which of them to run as part of their request to the Flask server.
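Items 2 and 3 suggest a request that names the annotators to run and carries a batch of variants. The sketch below assumes a JSON body of the form `{"annotators": [...], "variants": [...]}` and shows how the server might validate the selection against a registry and chain the chosen programs, plus a helper that a per-variant pipeline step could use to group records before calling the server. The request format, the `REGISTRY` entries, and all function names are hypothetical; the registered annotators are stubs that only tag the INFO column.

```python
import json
from typing import Callable, Iterable, Iterator

# An annotator takes a batch of VCF data lines and returns annotated lines.
Annotator = Callable[[list[str]], list[str]]


def info_tagger(tag: str) -> Annotator:
    """Placeholder annotator that appends `tag` to the INFO column (8th field)."""
    def run(lines: list[str]) -> list[str]:
        out = []
        for line in lines:
            fields = line.split("\t")
            fields[7] = tag if fields[7] == "." else f"{fields[7]};{tag}"
            out.append("\t".join(fields))
        return out
    return run


# Hypothetical registry of the programs bundled in the container.
REGISTRY: dict[str, Annotator] = {
    "vep": info_tagger("VEP=stub"),
    "other": info_tagger("OTHER=stub"),
}


def parse_request(body: bytes,
                  registry: dict[str, Annotator]) -> tuple[list[str], list[str]]:
    """Parse {"annotators": [...], "variants": [...]}; default to VEP only."""
    payload = json.loads(body)
    names = payload.get("annotators") or ["vep"]
    unknown = [n for n in names if n not in registry]
    if unknown:
        raise ValueError(f"unknown annotators: {unknown}")
    return names, list(payload["variants"])


def run_selected(names: list[str], variants: list[str],
                 registry: dict[str, Annotator]) -> list[str]:
    """Run the requested annotators in order, each adding to the previous output."""
    for name in names:
        variants = registry[name](variants)
    return variants


def batched(records: Iterable[str], size: int) -> Iterator[list[str]]:
    """Group per-variant records so a pipeline worker sends one request per batch."""
    batch: list[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Batching is an assumption rather than part of the proposal: if each request launches a fresh annotator process, grouping variants amortizes that start-up cost while still letting a Dataflow step emit variants one at a time. Chaining annotators in request order is likewise one design choice; running them independently and merging their INFO fields would also work.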
bashir2 commented 5 years ago

A prototype of this idea is implemented in PR #7.