Simple Parallelization Strategy

rmcewan commented 6 years ago

Look into scale-out approaches to increase throughput. Options include: multiple machines; UIMA-AS (probably not, because MetaMap and CLAMP are too opaque); batching documents and allocating the batches to multiple machines.
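A minimal sketch of the batching option, assuming notes are plain .txt files in one directory and that each machine would later be handed one manifest file. The variable names (NOTES_DIR, WORKERS) and the demo data are hypothetical, not part of the project.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical inputs: a directory of notes and the number of machines.
notes_dir="${NOTES_DIR:-$(mktemp -d)}"
workers="${WORKERS:-3}"
batch_dir="$(mktemp -d)"

# Demo data so the sketch runs on its own; skipped when NOTES_DIR is set.
if [ -z "${NOTES_DIR:-}" ]; then
  for n in 1 2 3 4 5 6 7; do echo "note $n" > "$notes_dir/note_$n.txt"; done
fi

# Round-robin the sorted file list into one manifest per machine.
i=0
find "$notes_dir" -type f -name '*.txt' | sort | while read -r f; do
  echo "$f" >> "$batch_dir/batch_$(( i % workers )).list"
  i=$(( i + 1 ))
done

# Show how many notes each machine would receive.
wc -l "$batch_dir"/batch_*.list
```

Round-robin keeps batch sizes within one note of each other; it does not account for note length, so a size-aware split may balance load better.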

GregSilverman commented 6 years ago

This is dependent on the outcome of #4 ...

GregSilverman commented 6 years ago

@rmcewan, this project looks like it will meet the requirements from our conversation about parallelization strategies: https://github.com/docker/swarm and the tutorial at https://docs.docker.com/engine/swarm/swarm-tutorial/

rmcewan commented 6 years ago

@GregSilverman Looks promising - we should meet to flesh out compatibility between the direction of this effort and the existing/new NLP-PIER architecture extensions.

GregSilverman commented 6 years ago

@rmcewan: the first step with Swarm is to look into job control flow. In particular, we need to make container creation depend on the completion of a particular job; e.g., once the annotator engines are done processing notes, ES container creation and indexing can begin.
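The dependency described here can be sketched in plain shell. This is only an illustration of the control flow: the annotate and index functions are hypothetical stand-ins for the real container launches, which with plain Docker could be a docker run followed by docker wait on the container.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stage commands; echo stands in for launching real containers.
annotate() { echo "annotating notes"; }
index()    { echo "indexing into ES"; }

log="$(mktemp)"

# Run one stage and record whether it finished successfully.
run_stage() {
  local name="$1"; shift
  if "$@"; then
    echo "$name: done" >> "$log"
  else
    echo "$name: failed" >> "$log"
    return 1
  fi
}

# The indexing stage starts only after annotation has exited successfully.
run_stage annotate annotate && run_stage index index
cat "$log"
```

A workflow engine adds retries, parallel branches, and visibility on top of this basic "start B only when A succeeded" rule.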

GregSilverman commented 6 years ago

@rmcewan: We may need to use Kubernetes for clustering since, unlike Docker Swarm, it has a well-maintained job control project available for it. See https://github.com/argoproj/argo for details. The alternative for Swarm would be http://dray.it or Jenkins to manage the workflow. http://dray.it does not seem to be well maintained, and Jenkins is not an optimal solution for this. However, Kubernetes can reuse all the work I've done in Docker through the kompose tool (https://github.com/kubernetes/kompose). Along similar lines, it seems that Docker may soon become Kubernetes, especially since the bleeding-edge OS X version now supports it: https://docs.docker.com/docker-for-mac/kubernetes/

Please take a look at these resources. I will explore this further and may try to create a simple workflow between an annotator engine and Elasticsearch (that is, once the annotator engine has generated its output, I will kick off the Elasticsearch container process).
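A rough sketch of the kompose route mentioned above, assuming a single-service compose file. The service name es, the placeholder image_name, and port 9200 (the default Elasticsearch HTTP port) are assumptions for illustration; the conversion runs only if kompose is installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical compose file; image_name is a placeholder, not a real image.
cat > docker-compose.yml <<'EOF'
version: "3"
services:
  es:
    image: image_name
    ports:
      - "9200:9200"
EOF

# kompose convert writes Kubernetes manifests into the current directory.
if command -v kompose >/dev/null 2>&1; then
  kompose convert -f docker-compose.yml
  ls "$workdir"
else
  echo "kompose not installed; wrote $workdir/docker-compose.yml only"
fi
```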

GregSilverman commented 6 years ago

@rmcewan: After an initial struggle, I got a local minikube cluster working with Elasticsearch and NLP-TAB. Next I will work on adding Amicus and BioMedICUS as nodes to the cluster. Once this is done, I will add container job scheduling using the https://github.com/argoproj/argo project.

In summary: Kubernetes is the clustering solution we will deploy with our Docker images.

GregSilverman commented 6 years ago

Brain dump on setting up a local minikube cluster configured to share a local host directory:

This assumes that the Docker images are already available either locally or through the docker.io registry. NB: to connect to docker.io, set the environment variable with export DOCKER_CONFIG=~/.docker/. On Mac OS there is a workaround to get this working: modify config.json with your credentials, copy it to ~/.docker/config.json, and then run docker login from the CLI. You can then run kompose up --build local to build the image and push it to the registry, or build locally with docker build -t image_name . followed by docker push image_name if you want to push to the registry. Otherwise, use kompose up --build none to pick up the image that you just built with the docker command.

Issue the following commands (we do not suggest using the kompose tool; the above is just one method of creating Docker images, included for illustrative purposes):

Build Docker image

docker build -t image_name .

Start minikube VM

minikube start

Mount local drive to VM

minikube mount $HOME/development/test_data:/data

In new terminal:

Create pod deployment

kubectl create -f test.yml

Expose as service

kubectl expose deployment es --type=NodePort

Get service details:

kubectl describe service es

Get service endpoint in browser

minikube service es

Start Minikube dashboard

minikube dashboard

SSH to minikube

minikube ssh

Open a shell in the deployed pod (substitute the pod name reported by kubectl get pods)

kubectl exec -it es-75457dd555-nc5vk -- /bin/bash
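The contents of test.yml are not shown in this thread. A minimal sketch of what such a manifest might look like, assuming a single-replica Deployment named es (to match the kubectl expose step) that mounts the /data directory shared through minikube mount. The image name, label, and port are placeholders or assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the hypothetical manifest to a scratch location unless MANIFEST is set.
manifest="${MANIFEST:-$(mktemp -d)/test.yml}"

cat > "$manifest" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: es
spec:
  replicas: 1
  selector:
    matchLabels:
      app: es
  template:
    metadata:
      labels:
        app: es
    spec:
      containers:
        - name: es
          image: image_name        # placeholder from the build step
          ports:
            - containerPort: 9200  # assumed: default Elasticsearch HTTP port
          volumeMounts:
            - name: shared-data
              mountPath: /data
      volumes:
        - name: shared-data
          hostPath:
            path: /data            # target of the minikube mount step
EOF

cat "$manifest"
```

Declaring containerPort lets kubectl expose deployment es --type=NodePort pick up the port without an explicit --port flag.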

Sources:

Basics on minikube and kompose

https://kubernetes.io/docs/tools/kompose/user-guide/
https://medium.com/@claudiopro/getting-started-with-kubernetes-via-minikube-ada8c7a29620
https://kubernetes.io/docs/tutorials/stateless-application/hello-minikube/

Exposing a pod/deployment as a service

https://stackoverflow.com/questions/45523220/minikube-service-servicename-url-return-nothing
https://kubernetes.io/docs/tasks/access-application-cluster/connecting-frontend-backend/#creating-the-backend-using-a-deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
https://www.stratoscale.com/blog/kubernetes/kubernetes-exposing-pods-service/
http://rafabene.com/2015/11/11/how-expose-kubernetes-services/
https://kubernetes-v1-4.github.io/docs/user-guide/kubectl/kubectl_expose/

Persistent volumes and directory sharing

https://www.stratoscale.com/blog/kubernetes/kubernetes-how-to-share-disk-storage-between-containers-in-a-pod/
http://suraj.pro/post/hostmount-minikube/
https://kubernetes.io/docs/concepts/storage/volumes/#local
https://kubernetes.io/docs/tasks/configure-pod-container/configure-volume-storage/#configure-a-volume-for-a-pod
https://kubernetes.io/docs/concepts/storage/persistent-volumes/
https://stackoverflow.com/questions/42456159/minikube-volumes?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

General information

https://linchpiner.github.io/k8s-multi-container-pods.html

GregSilverman commented 6 years ago

Create Kubernetes cluster on thalia0...3 using kubeadm.
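A dry-run sketch of the usual kubeadm sequence, assuming thalia0 acts as the control-plane node and thalia1 through thalia3 join as workers (the thread does not say which host plays which role). The script only prints the commands; the token and hash placeholders come from the output of kubeadm on the control plane.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: thalia0 is the control plane, the rest are workers.
control="thalia0"
workers="thalia1 thalia2 thalia3"

plan="$(mktemp)"
{
  # Initialize the control plane.
  echo "ssh $control sudo kubeadm init"
  # kubeadm can print a ready-made join command with token and CA hash.
  echo "ssh $control sudo kubeadm token create --print-join-command"
  # Each worker then runs the printed join command (6443 is the default API port).
  for w in $workers; do
    echo "ssh $w sudo kubeadm join $control:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>"
  done
} > "$plan"

cat "$plan"
```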

GregSilverman commented 6 years ago

Get Argo workflow engine functioning.
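A minimal sketch of a two-step Argo workflow for the annotate-then-index dependency discussed earlier in this thread. The template names and the placeholder image_name are hypothetical, and the echo commands stand in for the real annotator and indexing containers.

```shell
#!/usr/bin/env bash
set -euo pipefail

workflow="${WORKFLOW:-$(mktemp -d)/nlp-pipeline.yml}"

# In an Argo "steps" template, each outer list item runs after the previous one.
cat > "$workflow" <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: nlp-pipeline-
spec:
  entrypoint: pipeline
  templates:
    - name: pipeline
      steps:
        - - name: annotate
            template: annotate
        - - name: index
            template: index
    - name: annotate
      container:
        image: image_name   # placeholder for an annotator engine image
        command: [sh, -c]
        args: ["echo annotating notes"]
    - name: index
      container:
        image: image_name   # placeholder for the indexing image
        command: [sh, -c]
        args: ["echo indexing into Elasticsearch"]
EOF

cat "$workflow"
```

With the Argo CLI and controller installed, a file like this would typically be submitted with argo submit; the placeholder images would need to be replaced first.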

GregSilverman commented 6 years ago

@rmcewan

DONE:

Made nlp-adapt-kube its own repo.
Cluster created on thalia.
k8s Dashboard working via ssh tunnel to thalia.
Implemented solution for hosting ALL docker images with strict licensing.

TODO:

Test new Docker images in k8s that implement solutions for handling UMLS auth environment variables and configuration files (viz., cTAKES and CLAMP).
Test workflow on thalia.
Update wiki in repo.

Once the above is done, we can officially release this to the hounds. Should we wait to make the announcement, given that we could technically release the minikube/single-node version right now?
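For the first TODO item, one common Kubernetes pattern is to keep credentials in a Secret and inject them as environment variables. This is a sketch of that general pattern, not necessarily what nlp-adapt-kube implements; the secret name, the variable names UMLS_USER and UMLS_PASSWORD, and the image are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

secret_manifest="${SECRET_MANIFEST:-$(mktemp -d)/umls-auth.yml}"

cat > "$secret_manifest" <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: umls-auth
type: Opaque
stringData:
  username: CHANGE_ME   # never commit real credentials
  password: CHANGE_ME
---
apiVersion: v1
kind: Pod
metadata:
  name: ctakes
spec:
  containers:
    - name: ctakes
      image: image_name   # placeholder image
      env:
        - name: UMLS_USER          # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: umls-auth
              key: username
        - name: UMLS_PASSWORD      # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: umls-auth
              key: password
EOF

cat "$secret_manifest"
```

Configuration files could be handled along the same lines with a ConfigMap mounted as a volume.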

GregSilverman commented 6 years ago

superseded by https://github.com/nlpie/nlp-adapt-kube/issues/3

nlpie / nlp-adapt

Simple Parallelization Strategy #7