openworm / sibernetic

This is a C++/OpenCL implementation of the PCISPH algorithm, supplemented with a set of biomechanics-related features and applied to C. elegans locomotion.

Create a Tutum Stackfile that will enable multiple runs of Sibernetic #90

Closed slarson closed 8 years ago

slarson commented 8 years ago

Output needs to be mapped to a central container that can store all the output.

First MVP of this is one container running Sibernetic linked to one container that receives the output.

tnaoi commented 8 years ago

Trying to figure out a more elegant way to run a data-only container (using docker-compose and yaml files), but a quick and dirty way to achieve the MVP is:

1) Run sibernetic in a docker container following these directions: https://gist.github.com/jamiemori/4d437587f82acb50a459

_NOTES_

#!/bin/bash

docker run --privileged -ti \
    --name sibernetic \
    -v /data \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    --device /dev/nvidia0:/dev/nvidia0 \
    --device /dev/nvidiactl:/dev/nvidiactl \
    -e DISPLAY=unix$DISPLAY \
    -e LIBRARY_PATH=/opt/AMDAPPSDK-3.0/lib/x86_64 \
    -e LD_LIBRARY_PATH=/opt/AMDAPPSDK-3.0/lib/x86_64 \
    -e OPENCL_VENDOR_PATH=/etc/OpenCL/vendors \
    openworm/openworm_test /bin/bash

This will mount the volume /data, which Docker stores on the host under /var/lib/docker/volumes/... Then, inside the container, run:

./Release/Sibernetic -f worm -l_to

The -l_to flag saves the simulation output into /sibernetic/buffers. You can also git pull the latest version of Sibernetic, which supports the lpath flag, and specify /data as the output directory for the simulation (I haven't tested this).

2) Exit the container. There are several ways to detach from the container, such as pressing Ctrl-P followed by Ctrl-Q, or running another terminal window using something like tmux. Then, in the host shell, run:

docker run -ti --volumes-from sibernetic --name data openworm/openworm_test

Now the sibernetic and data containers will share the /data directory. If you saved the simulation output in /sibernetic/buffers, you have to move the files into /data yourself. If you used lpath, the files should already be there.
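That manual move can be scripted. A minimal sketch, run from inside the data container (the helper name `sweep_buffers` and its flat-copy behavior are assumptions for illustration; the paths are the ones mentioned above):

```shell
# Sketch: sweep simulation output from /sibernetic/buffers (where -l_to
# writes) into the shared /data volume. The function name and the flat
# destination layout are assumptions, not part of Sibernetic itself.
sweep_buffers() {
    src="$1"
    dst="$2"
    mkdir -p "$dst"
    for f in "$src"/*; do
        # skip the unexpanded glob when the source directory is empty
        if [ -e "$f" ]; then
            mv "$f" "$dst"/
        fi
    done
}

# usage inside the data container:
# sweep_buffers /sibernetic/buffers /data
```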

3) To remove the containers just use:

docker rm -v <container name>

The -v flag also removes the container's volumes, preventing dangling volumes from accumulating.
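For comparison, the "more elegant" docker-compose route mentioned at the top of this comment might look roughly like this (an untested sketch; the image name comes from the commands above, everything else is an assumption):

```yaml
# docker-compose.yml (v1 syntax) -- sketch of the same two-container MVP
sibernetic:
  image: openworm/openworm_test
  privileged: true
  volumes:
    - /data                 # anonymous volume, shared with the data service
  command: /bin/bash
data:
  image: openworm/openworm_test
  volumes_from:
    - sibernetic            # sees the same /data as the sibernetic service
```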

slarson commented 8 years ago

@jamiemori Hugely helpful!

slarson commented 8 years ago

@jamiemori I've made some commentary over on gitter chat: https://gitter.im/openworm/sibernetic?at=568b412217eed9cd1bbbf75c

slarson commented 8 years ago

OK, with my latest startup script (https://github.com/openworm/sibernetic/commit/966664f97f1e0aa3205456c591335e8b79c20ae1) I think we can potentially reduce the run command to:

docker run -ti \
    --name sibernetic \
    -v /data \
    --entrypoint /sibernetic/bin/entrypoint.sh \
    openworm/openworm_test

The environment variables that were previously set on the command line are now set within the entrypoint script.

This would run in a headless configuration, so I'm leaving out all the X stuff. And hopefully it could run in CPU mode only, so it leaves out all the nvidia device stuff. But that is an open question.
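For the record, a minimal sketch of what such an entrypoint script might contain, based on the environment variables from the earlier run command (the binary path and the exact flag spellings are assumptions, not the actual script in the repo):

```shell
#!/bin/bash
# entrypoint.sh (sketch) -- set the OpenCL environment that was previously
# passed with -e flags, then launch Sibernetic headless with output going
# to the shared /data volume. Paths and flags are assumptions.
export LIBRARY_PATH=/opt/AMDAPPSDK-3.0/lib/x86_64
export LD_LIBRARY_PATH=/opt/AMDAPPSDK-3.0/lib/x86_64
export OPENCL_VENDOR_PATH=/etc/OpenCL/vendors

SIBERNETIC=/sibernetic/Release/Sibernetic
if [ -x "$SIBERNETIC" ]; then
    cd /sibernetic
    # -l_to saves output; lpath (in newer versions) redirects it to /data
    exec "$SIBERNETIC" -f worm -l_to lpath=/data
fi
```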

Right now the latest version isn't building (#92) but when that is addressed we can test out this entrypoint script.

slarson commented 8 years ago

The entry point script has just been updated -- I need to test it out on the Tutum environment. This also requires implementing the data linkage container so that we can read out the data.

slarson commented 8 years ago

First draft of such a stack file:

data:
  image: 'openworm/openworm_test:latest'
  command: '/bin/sh -c "while true; do echo hello world; sleep 1; done"'
  tags:
    - big
  volumes_from:
    - sibernetic-1
    - sibernetic-2
sibernetic-1:
  image: 'openworm/openworm_test:latest'
  entrypoint: /entrypoint.sh
  environment:
    - BRANCH_SIBERNETIC=development
    - CONF=worm_deep_water
    - OCL_FILE=sphFluid_crawling.cl
    - REBUILD_SIBERNETIC=true
    - SIBERNETIC_NAME=sibernetic-1
    - TIMELIMIT=2
  tags:
    - big
  volumes:
    - /data-sibernetic-1
sibernetic-2:
  image: 'openworm/openworm_test:latest'
  entrypoint: /entrypoint.sh
  environment:
    - BRANCH_SIBERNETIC=development
    - CONF=worm_deep_water
    - REBUILD_SIBERNETIC=true
    - SIBERNETIC_NAME=sibernetic-2
    - TIMELIMIT=2
  tags:
    - big
  volumes:
    - /data-sibernetic-2

Next step -- test, log in to the data service, see if data volume is actually mapped and if results are being stored there.

slarson commented 8 years ago

OK, after several modifications to the stack file and the entrypoint script, it looks like we got the mapping to the data node to work!

Next steps: set up the data node not to run the "hello world" loop, but instead to zip up and upload to Dropbox the output of the other nodes. This means the data node doesn't really need to be running all the time, just at the end, to sweep up the data from the other nodes.

We probably need to investigate a way to indicate when the simulation actually "ends", so we know when to do that.
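One simple convention would be a sentinel file: each sibernetic node touches a marker in its volume when its entrypoint exits, and the data node polls for those markers before sweeping up the results. A sketch (the marker name, function name, and archive path are all assumptions):

```shell
#!/bin/bash
# Sketch: the data node polls each shared volume for a DONE marker that
# the sibernetic entrypoint would touch on exit, then archives all the
# results into one tarball. Names and paths are assumptions.
wait_and_archive() {
    archive="$1"; shift         # output .tar.gz, then one dir per node
    for dir in "$@"; do
        # poll until this node drops its marker file
        while [ ! -f "$dir/DONE" ]; do
            sleep 1
        done
    done
    tar -czf "$archive" "$@"    # sweep everything into one archive
}

# usage on the data node (volume paths from the stackfile above):
# wait_and_archive /results.tar.gz /data-sibernetic-1 /data-sibernetic-2
```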

tnaoi commented 8 years ago

This is great! Was wondering what the progress was on this project. Is there anything I could help with? Cheers from New York

slarson commented 8 years ago

@jamiemori yes! Actually we could use your help in a different part of the project but still working with the docker image. Check out this chat on gitter for that. Cheers to you!

slarson commented 8 years ago

OK, we've gotten this working (even though Tutum has now switched to Docker Cloud). The most recent version of the stackfile has been updated above.