EC4Docker is a simple Elastic Cluster whose nodes are containers. Features of the cluster: a front-end that can be accessed by SSH, and internal working nodes that are powered on or off according to the needs (if a node is not used for a while, it is powered off, and it is powered on again when it is needed).
EC4Docker may seem of limited use because it is currently deployed on a single Docker host, but consider its integration with Docker Swarm and you'll have an Elastic Cluster that is deployed over a multi-node infrastructure.
First, you need to choose the cluster manager middleware. Torque and SLURM are currently available, but you can create your own Dockerfiles according to your specific middleware.
Once selected, you need to build the front-end and working node base images by issuing the following commands:
docker build -f frontend/Dockerfile.clues -t ec4docker:frontend ./frontend/
docker build -f wn/Dockerfile -t ec4docker:wn wn/
Then you need to create the images that correspond to the middleware:
For the case of Torque, you can use the following commands:
docker build -f frontend/Dockerfile.torque -t ec4dtorque:frontend ./frontend/
docker build -f wn/Dockerfile.torque -t ec4dtorque:wn wn/
For the case of SLURM, you can use the following commands:
docker build -f frontend/Dockerfile.slurm -t ec4dslurm:frontend ./frontend/
docker build -f wn/Dockerfile.slurm -t ec4dslurm:wn wn/
The images will be built and stored in your local Docker image cache.
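You can check that the images are available locally with the standard Docker CLI (the grep pattern simply matches the image names used above):
$ docker images | grep ec4d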
Alternatively, you can build the non-elastic version by not installing CLUES in the front-end. To do so, you can create the base images by issuing the following commands:
docker build -f frontend/Dockerfile.static -t ec4docker:frontend ./frontend/
docker build -f wn/Dockerfile -t ec4docker:wn wn/
In this case you need to power the nodes on or off by hand, using the scripts provided in the folder /opt/ec4docker (see the example near the end of this section).
NOTE: you are advised to modify the Dockerfiles in order to include your libraries, applications, etc. to customize your cluster. Another option is to build the provided Dockerfiles and then create your own images that start from the created ones (you can check the FROM clause in the Dockerfile).
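As an illustration, a minimal sketch of a customized working node Dockerfile that starts from the Torque image built above (assuming the base images are Ubuntu-based; the package installed is just an example, not required by EC4Docker):

FROM ec4dtorque:wn
# Add your own libraries and applications on top of the base working node image
RUN apt-get update && apt-get install -y --no-install-recommends python3 && rm -rf /var/lib/apt/lists/*

You would build it with a command analogous to the previous ones, e.g. docker build -f Dockerfile.custom -t mycluster:wn .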
You should create a config file (ec4docker.config) to set the name of your cluster (this name will be used for the front-end node in Docker), the base name for the working nodes (they will be named basename1, basename2, etc.) and the maximum number of computing nodes. You must also set the names of the Docker images according to the previous step.
Two examples are provided:
The file ec4docker-torque.config for the case of Torque:
EC4DOCK_SERVERNAME=ec4docker
EC4DOCK_MAXNODES=4
EC4DOCK_FRONTEND_IMAGENAME=ec4dtorque:frontend
EC4DOCK_WN_IMAGENAME=ec4dtorque:wn
EC4DOCK_NODEBASENAME=ec4dockernode
And the file ec4docker-slurm.config for the case of SLURM:
EC4DOCK_SERVERNAME=ec4docker
EC4DOCK_MAXNODES=4
EC4DOCK_FRONTEND_IMAGENAME=ec4dslurm:frontend
EC4DOCK_WN_IMAGENAME=ec4dslurm:wn
EC4DOCK_NODEBASENAME=ec4dockernode
NOTE: With these files the cluster will be named ec4docker and the maximum number of working nodes is set to 4. You are advised to change the name of your front-end and the number of working nodes that will be available.
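For instance, a customized Torque configuration could look like the following (the cluster name, node base name and node count are illustrative):

EC4DOCK_SERVERNAME=mycluster
EC4DOCK_MAXNODES=8
EC4DOCK_FRONTEND_IMAGENAME=ec4dtorque:frontend
EC4DOCK_WN_IMAGENAME=ec4dtorque:wn
EC4DOCK_NODEBASENAME=myclusternode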
You can use the script setup-cluster to create the front-end of the cluster from the corresponding Docker image. If the cluster already exists, this script will ask you whether to kill it.
IMPORTANT: In order to be able to use the NFS shared filesystem, you MUST enable the nfsd module in the kernel of the Docker hosts that run the containers:
$ modprobe nfsd
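You can verify that the module has been loaded with standard Linux tooling:
$ lsmod | grep nfsd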
In order to create your cluster, as defined in the ec4docker-torque.config file, you can issue the following command:
$ ./ec4docker -ct -f ec4docker-torque.config
NOTE: The settings of the cluster are those set in the ec4docker-torque.config file. Take note of those settings because you will need them in order to access the cluster, in particular the name of the cluster, which is set in EC4DOCK_SERVERNAME.
WARNING: The cluster is created using a Docker-alongside-Docker approach. That means that the front-end will issue docker calls to create and destroy the Docker containers that will serve as working nodes of the cluster, but these containers will be created on the Docker host that started the front-end. In order to use this approach, the Docker communication socket and the docker binary from the host are shared with the container.
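After creating the cluster you can check, from the host, that the front-end container is up, using the standard Docker CLI (the container name is the value of EC4DOCK_SERVERNAME):
$ docker ps -f name=ec4docker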
Once the front-end has been created you can enter the front-end container and su to the ubuntu user (which is the only user created in the cluster). An example command line is provided next (the name of the container depends on your configuration, i.e. the ec4docker.config file):
$ docker exec -it ec4docker /bin/bash
root@ec4docker:/$ su - ubuntu
Alternatively, you can SSH into the front-end. The SSH port is exposed when the front-end is created, so you can find out the port on which the front-end listens by using the docker port command:
$ docker port ec4docker
22/tcp -> 0.0.0.0:32770
In this example, you can SSH to ubuntu@localhost at port 32770 with a command like the following (the default password is "ubuntu", and it is set in the Dockerfile):
$ ssh -p 32770 ubuntu@localhost
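If you prefer not to look up the port by hand, a possible one-liner using the same docker port command is the following (head -n1 guards against Docker versions that also print an IPv6 mapping):
$ ssh -p $(docker port ec4docker 22 | head -n1 | cut -d: -f2) ubuntu@localhost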
Now you can submit jobs to the queue, and CLUES will intercept the call and power on working nodes in the cluster as needed.
An example follows:
$ echo "hostname && sleep 10" | qsub
1.ec4docker
$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1.ec4docker               STDIN            ubuntu                 0 R batch
$ ls -l
total 4
-rw------- 1 ubuntu ubuntu 0 Feb 12 11:15 STDIN.e1
-rw------- 1 ubuntu ubuntu 13 Feb 12 11:15 STDIN.o1
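If you built the SLURM flavour of the cluster instead, an analogous test can be run with the standard SLURM client commands (job IDs and output file names will differ):
$ sbatch --wrap "hostname && sleep 10"
$ squeue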
NOTE: For the non-elastic version, you can power on some nodes by hand from inside the front-end, by issuing commands like the following:
$ /opt/ec4docker/poweron ec4dockernode1
$ /opt/ec4docker/poweron ec4dockernode2
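Assuming that a matching poweroff script is provided alongside poweron in /opt/ec4docker (this is an assumption; check the contents of that folder), powering a node off again would look like:
$ /opt/ec4docker/poweroff ec4dockernode1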
If any of the Docker containers fails (for any reason), please check the output of the command docker logs <container>.
Some common issues are: