Closed rmcewan closed 6 years ago
This is dependent on the outcome of #4 ...
@rmcewan, this project looks like it will meet requirements as per our conversation about parallelization strategies: https://github.com/docker/swarm also see https://docs.docker.com/engine/swarm/swarm-tutorial/
@GregSilverman Looks promising - we should meet to flesh out compatibility between direction of this effort and existing/new NLP-PIER architecture extensions.
@rmcewan: The first step with Swarm is to look into job-control flow. In particular, we need to control container creation as a dependency on job completion; e.g., once the annotator engines are done processing notes, ES container creation and indexing can begin, etc.
@rmcewan: We may need to use Kubernetes for clustering, since, unlike Docker Swarm, it has a well-maintained job-control module; see https://github.com/argoproj/argo for details. The alternative would be to use http://dray.it or Jenkins to manage the workflow for Swarm, but http://dray.it does not seem to be well maintained, and Jenkins is not an optimal solution for this. However, Kubernetes can reuse all the work I've done in Docker through this tool: https://github.com/kubernetes/kompose... Along similar lines, it seems that Docker may soon converge with Kubernetes, especially since the bleeding-edge OS X version now supports it: https://docs.docker.com/docker-for-mac/kubernetes/
Please take a look at these resources. I will explore this further and may try to create a simple workflow between an annotator engine and Elasticsearch (that is, once the annotator engine has generated its output, I will kick off the Elasticsearch container process).
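To make the dependency idea concrete, here is a minimal sketch of what such a workflow could look like in Argo, where the Elasticsearch indexing step only starts after the annotator step completes. The image names and commands are hypothetical placeholders, not our actual images:

```yaml
# Hypothetical two-step Argo Workflow: annotate, then index into Elasticsearch.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: annotate-then-index-
spec:
  entrypoint: pipeline
  templates:
  - name: pipeline
    steps:
    # Each inner list is a sequential step; the second step does not
    # start until the first finishes successfully.
    - - name: annotate
        template: annotator
    - - name: index
        template: es-index
  - name: annotator
    container:
      image: annotator-image        # placeholder for an annotator engine image
      command: [sh, -c, "run-annotator /data/notes"]   # placeholder command
  - name: es-index
    container:
      image: es-indexer-image       # placeholder for an ES indexing image
      command: [sh, -c, "index-notes /data/annotated"] # placeholder command
```

Argo's sequential `steps` give exactly the "container B starts when job A is done" semantics described above.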
@rmcewan: After an initial struggle, I got a local minikube cluster working with Elasticsearch and NLP-TAB. Next I will work on adding Amicus and BioMedICUS as nodes to the cluster. Once this is done, I will add container job scheduling per the https://github.com/argoproj/argo project.
In summary: Kubernetes is the clustering solution we will deploy with our Docker images.
Brain dump of setting up a minikube local-cluster config for sharing a local host directory:
This assumes that Docker images are either already available locally or through the Docker.io registry. (NB: to connect to docker.io, set the environment variable `export DOCKER_CONFIG=~/.docker/`. There is a workaround to get this functioning on Mac OS: modify `config.json` with your credentials, copy it to `~/.docker/config.json`, and then run `docker login` from the CLI. You can then run `kompose up --build local` to build the image and push it to the registry; alternatively, build locally using `docker build -t image_name .` followed by `docker push image_name` if you want to push to the registry. Otherwise, you can just use `kompose up --build none` to grab the image that you built with the `docker` command.)
Issue the following commands (we do not suggest using the `kompose` tool; the above is just one method to create Docker images, shown for illustrative purposes):
docker build -t image_name .
minikube start
minikube mount $HOME/development/test_data:/data
In new terminal:
kubectl create -f test.yml
kubectl expose deployment es --type=NodePort
kubectl describe service es
minikube service es
minikube dashboard
minikube ssh
kubectl exec -it es-75457dd555-nc5vk -- /bin/bash
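For reference, `test.yml` above could be a minimal sketch along these lines: a single-replica Elasticsearch Deployment named `es` (matching the `kubectl expose deployment es` step) that mounts the host directory shared via `minikube mount`. The image tag and mount paths here are assumptions for illustration, not the exact manifest we used:

```yaml
# Hypothetical minimal test.yml: one-replica "es" Deployment with a
# hostPath volume pointing at the directory mounted via `minikube mount`.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: es
spec:
  replicas: 1
  selector:
    matchLabels:
      app: es
  template:
    metadata:
      labels:
        app: es
    spec:
      containers:
      - name: es
        image: docker.elastic.co/elasticsearch/elasticsearch:6.2.2  # assumed tag
        ports:
        - containerPort: 9200
        volumeMounts:
        - name: test-data
          mountPath: /usr/share/elasticsearch/data  # assumed container path
      volumes:
      - name: test-data
        hostPath:
          path: /data   # target of `minikube mount $HOME/development/test_data:/data`
```

With this applied, `kubectl expose deployment es --type=NodePort` and `minikube service es` work as shown above.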
Sources:
https://kubernetes.io/docs/tools/kompose/user-guide/
https://medium.com/@claudiopro/getting-started-with-kubernetes-via-minikube-ada8c7a29620
https://kubernetes.io/docs/tutorials/stateless-application/hello-minikube/
https://stackoverflow.com/questions/45523220/minikube-service-servicename-url-return-nothing
https://kubernetes.io/docs/tasks/access-application-cluster/connecting-frontend-backend/#creating-the-backend-using-a-deployment
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
https://www.stratoscale.com/blog/kubernetes/kubernetes-exposing-pods-service/
http://rafabene.com/2015/11/11/how-expose-kubernetes-services/
https://kubernetes-v1-4.github.io/docs/user-guide/kubectl/kubectl_expose/
https://www.stratoscale.com/blog/kubernetes/kubernetes-how-to-share-disk-storage-between-containers-in-a-pod/
http://suraj.pro/post/hostmount-minikube/
https://kubernetes.io/docs/concepts/storage/volumes/#local
https://kubernetes.io/docs/tasks/configure-pod-container/configure-volume-storage/#configure-a-volume-for-a-pod
https://kubernetes.io/docs/concepts/storage/persistent-volumes/
https://stackoverflow.com/questions/42456159/minikube-volumes
Create a Kubernetes cluster on thalia0...3 using kubeadm.
Get Argo workflow engine functioning.
@rmcewan
DONE:
TODO: cTAKES and CLAMP.
Once the above is done, we can officially release this to the hounds. Should we wait to make the announcement, since we could technically release the minikube/single-node version right now?
superseded by https://github.com/nlpie/nlp-adapt-kube/issues/3
Look into scale-out approaches to increase throughput. Options include: multiple machines; UIMA-AS (probably not, because MetaMap and CLAMP are too opaque); batching documents and allocating the batches to multiple machines.
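The batching option above can be sketched simply: split the corpus into fixed-size batches and assign them round-robin to machines. This is an illustrative sketch only; the machine names (`thalia0`, `thalia1`) and batch size are placeholders:

```python
# Sketch: split a corpus of notes into batches and assign them
# round-robin to worker machines for parallel annotation.

def make_batches(docs, batch_size):
    """Split docs into consecutive batches of at most batch_size items."""
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]

def allocate(batches, machines):
    """Assign batches to machine names in round-robin order."""
    assignment = {m: [] for m in machines}
    for i, batch in enumerate(batches):
        assignment[machines[i % len(machines)]].append(batch)
    return assignment

docs = [f"note_{n}.txt" for n in range(10)]
batches = make_batches(docs, 3)                    # 4 batches: 3+3+3+1
plan = allocate(batches, ["thalia0", "thalia1"])   # 2 batches per machine
```

Each machine would then run its assigned batches through the annotator pipeline independently, which sidesteps the opacity of MetaMap and CLAMP internals entirely.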