Configuration template for stratocumulus & more
The `gcloud` command-line tool will generate and use SSH keys for you. Click to let the magic happen.
To SSH into your machine you can use `ssh username@instance-ip`, where *instance-ip* is found on the information page for your instance after it boots up. You can also set up a `Host` entry in your `~/.ssh/config` file; let's call it `GClouder` in the following.
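For reference, such a `Host` entry might look like this (a sketch: the `HostName`, `User`, and `IdentityFile` values are placeholders to replace with your instance's actual details):

```
Host GClouder
    HostName instance-ip                     # the external IP from the instance's page
    User username
    IdentityFile ~/.ssh/google_compute_engine
```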
Get to the GClouder:

```
ssh GClouder
```
Install a few more things:

```
sudo apt-get update
sudo apt-get install -y unzip build-essential git
```
The VM comes with `gcloud` installed, but this deployment does not have full capabilities; we need to get a fresh one (cf. also https://code.google.com/p/google-cloud-sdk/issues/detail?id=336):

```
curl https://sdk.cloud.google.com | bash
```

… saying Yes to everything … and then you need to reload your `.bashrc` (just type `bash`).
For some reason the zone has to be configured again (replace `us-east1-c` with your favorite part of the world):

```
gcloud config set compute/zone us-east1-c
```
Create an SSH key-pair for `gcloud` itself:

```
gcloud compute ssh $(hostname) ls
```

and accept the prompts (with an empty password).
We use Git so that you, the user, can save your configuration:

```
git clone https://github.com/smondet/stratotemplate.git
cd stratotemplate
```
This section creates a functional Ketrew server with Google Container Engine.
Edit the file `configuration.env`; make sure you're happy with the `$PREFIX` and `$TOKEN` values.
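As a sketch, a minimal `configuration.env` could look like the following. The variable names come from this document; the values are placeholders, and the real file in the repository contains more settings and comments:

```shell
# Hypothetical minimal configuration.env; real values and additional
# variables live in the stratotemplate repository's copy of the file.
cat > configuration.env <<'EOF'
export PREFIX=mystrato              # placeholder: names your GCloud resources
export TOKEN=change-me-long-random  # placeholder: authentication token
EOF

# The deployment scripts expect it sourced into the current shell:
. ./configuration.env
echo "Prefix: $PREFIX"
```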
Get the script, and run it:

```
wget https://raw.githubusercontent.com/hammerlab/stratocumulus/master/tools/gcpketrew.sh -O gcpketrew.sh
. configuration.env
sh gcpketrew.sh up
# The first time this may prompt for a `[Y/n]` question.
```
When the command returns, the deployment is only partially ready; one needs to ask for the status a few times before the “External IP” is available:

```
sh gcpketrew.sh status
```
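The polling can be scripted with a small retry helper. This is an illustrative sketch, not part of stratotemplate, and the `retry_until` name is made up here:

```shell
# Run a command up to N times, pausing between attempts, until it succeeds.
retry_until () {
  tries=$1; shift
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$@"; then return 0; fi
    sleep 1   # for gcloud deployments, a longer pause (e.g. 60s) is kinder
    i=$((i + 1))
  done
  return 1
}

# In practice you would poll until the output mentions an external IP, e.g.:
#   retry_until 10 sh -c 'sh gcpketrew.sh status | grep -q "External IP"'
# Demo with a stand-in command that always succeeds:
retry_until 3 true && echo "succeeded"
```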
When it's ready, a little more configuration is required (if this command fails, wait a minute or so and try again until it succeeds; the container engine may be slow at creating “pods”):

```
sh gcpketrew.sh configure+local
```

(Warning: the `+local` part will append a line to the `~/.ssh/authorized_keys` file; use `configure` if you don't want that.)
At any time, the `status` command will give you the URL of the Ketrew server's WebUI.
Of course, you can save your changes to the `stratotemplate` repository like any other Git repo.
When you want to take the server down (and delete everything related to it):

```
sh gcpketrew.sh down
```
We're going to use Docker to get a fully functional OCaml/Opam/Stratocumulus environment.

Get Docker:

```
sudo apt-get install -y docker.io
```

Get the image:

```
sudo docker pull smondet/stratocumulus
```

Make `$PWD` accessible by the container:

```
chmod -R a+rw .
```

Get in:

```
sudo docker run -it -v $PWD:/hostuff/ smondet/stratocumulus bash
```

Now you're in the right environment to submit stratocumulus deployment jobs.

```
cd /hostuff
```
Edit `configuration.env` further to set `GCLOUD_HOST`, `CLUSTER_NODES`, etc.; cf. the comments in the file.

```
. configuration.env
```

Use the URL provided above by `sh gcpketrew.sh status` to create a Ketrew configuration:

```
ketrew init --conf ./ketrewdocker/ --just-client $(cat $KETREW_URL)
```
Create an NFS server with storage:

```
KETREW_CONFIG=./ketrewdocker/configuration.ml ocaml nfs_server.ml up submit
```
If you'd like this NFS pool mounted on the cluster you're about to create, you should edit your `configuration.env` to add it to the `CLUSTER_NFS_MOUNT` list; stratotemplate does not do this automatically for you. Storage is mounted at `/nfs-pool`, and the witness file is `.stratowitness` on the newly created servers. You can find the NFS VM name through the GCloud instance list; it will be prefixed with the `$PREFIX` in your configuration.
Create a compute cluster:

```
KETREW_CONFIG=./ketrewdocker/configuration.ml ocaml cluster.ml up submit
```
The two commands above submit workflows to the Ketrew server; you can monitor them with the WebUI (see `cat $KETREW_URL`).

Replace `up` with `down` to take the deployments down ☺
Stratotemplate provides a basic `biokepi_machine` for easy `Biokepi.Edsl.Machine.t` creation. You can `#use` it in a script to get a `machine`, which is required for most Biokepi workflows.
A few environment variables need to be set in order for it to work:

- `PREFIX`: set already in `configuration.env`.
- `BIOKEPI_WORK_DIR`
- `GATK_JAR_URL` and `MUTECT_JAR_URL`: URLs to the GATK and MuTect (1) JARs; Biokepi can't automatically download these because of the restrictive licenses on them.
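A quick way to fail early when one of these variables is missing is a small check like the following (a sketch: `check_env` is a name invented here, and the exported values are placeholders for illustration only):

```shell
# Report every missing variable from the list; fail if any is unset or empty.
check_env () {
  missing=0
  for v in "$@"; do
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "Missing required variable: $v" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Placeholder values for illustration only:
export PREFIX=demo
export BIOKEPI_WORK_DIR=/tmp/biokepi-work
export GATK_JAR_URL='https://example.com/private/gatk.jar'
export MUTECT_JAR_URL='https://example.com/private/mutect.jar'
check_env PREFIX BIOKEPI_WORK_DIR GATK_JAR_URL MUTECT_JAR_URL && echo "all set"
```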