vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License
274 stars 45 forks source link

Container interface improvements #1021

Closed gaow closed 6 years ago

gaow commented 6 years ago

Separated from an earlier offline discussion: I think perhaps

[global]
parameter: container_type = "docker"

[default]
bash: container = f'{container_type}:gaow/debian-ngs'

is more general than the current interface. Reasons:

  1. If we later support singularity we need a unified interface
  2. With this, we can have container_type = "local" or something like it to disable the use of containers easily for a given run.
BoPeng commented 6 years ago

Good suggestion, we have docker_image because this is the only container we support right now and have not really thought of how to support other types of containers. A generally interface like this makes sense although I am not sure if we should make the change right now for the JupyterCon presentation.

gaow commented 6 years ago

A generally interface like this makes sense although I am not sure if we should make the change right now for the JupyterCon presentation.

I think when you introduce docker_image people might start questioning its limitations. It makes sense to me to make a change before introducing the feature publicly, but I guess the biggest limitation is lack of time to change the code, documentation and make sure it is backwards compatible at least for the examples we submitted to zenodo ..?

BoPeng commented 6 years ago

The code change should be simple if we do a simple translation internally and allows both options.

gaow commented 6 years ago

Good idea -- we should make a change now to allow for the proposed interface, and use it in the slides. But we can change the documentation / more careful implementation / refined interface later when we bring in singularity. I might have a need for it due to constraints on UChicago cluster. But not in the next 2 months.

gaow commented 6 years ago

Singularity image specification is a bit different from docker:

https://singularity-hub.org/collections/1021

to use one of them:

shub://anushree85/singularity_imgs:wilson_caffe_0.15_dann_cuda7.5_cudnn5

it is still account/image:tag format at the face value.

gaow commented 6 years ago

And there are also dockers provided from other registries, if we consider unified interface:

https://cloud.google.com/container-registry/docs/pushing-and-pulling

sorry things seem to get complicated even for simple changes.

BoPeng commented 6 years ago

My question is if there is an established convention for container specification, used in AWS or other multi-container environment so that we do not have to invent something.

gaow commented 6 years ago

Here are some conventions from nextflow doc:

https://www.nextflow.io/docs/latest/singularity.html#singularity-docker-hub

Here is one example now it is used in nextflow:

https://github.com/NBISweden/K9-WGS-Pipeline/tree/master/conf

BoPeng commented 6 years ago

We need a single naming scheme for thing like:

docker:fedora/httpd:version1.0
singularity:shub://vsoch/hello-world

and you can see having two : and // does not look right. I mean, singularity:shub looks like the scheme of the URL.

So I would rather have something either a bit more rigorous, such as

docker::fedora/httpd:version1.0
singularity::shub://vsoch/hello-world

or use some convention such as

shub://...

for singularity.

There is also the problem with option docker_file. We currently just load the file without knowing the tag, and the usage is usually

action: docker_file='filename', docker_image='tag_loaded_from_image'
...

and I am not sure if we can handle docker_file also in this schema, something like

container='docker::tag_loaded?docker_file='

I mean, we can do all the crazy things but it is not easy to come up with some universally acceptable name convention.

BoPeng commented 6 years ago

Note that singularity uses command

sudo singularity --verbose import tensorflow.img docker://tensorflow/tensorflow:latest

to specify docker image tensorflow/tensorflow:latest, so we can do something like

tensorflow/tensorflow:latest

is the default docker image, and is a shortcut for

docker://tensorflow/tensorflow:latest

Then we can say

shub://tensorflow/tensorflow:latest

is for singularity.

gaow commented 6 years ago

Yes indeed, we can leverage the fact that sigularity runs docker images, thus making it conceptually simpler. But there might be issues down the line for more other types of containers (we hope not!). So keeping an interface as simple as possible (not too crazy) should be the way to go.

and I am not sure if we can handle docker_file

I think there can certainly be implicity convension that when docker_file presents we will assume the container is docker.

BoPeng commented 6 years ago

Done, the naming convention is now

tag
docker://tag
local://tag

with room for new schemes later.

Document updated: https://vatlab.github.io/sos-docs/doc/tutorials/SoS_Docker_Guide.html https://vatlab.github.io/sos-docs/doc/documentation/Targets_and_Actions.html#container

BoPeng commented 6 years ago

Note that option docker_file is marked deprecated but remains. This option loads the image with unknown tag and I think it better to be an action, but on the other hand

sh:
   docker load filename

is enough.