doc: Update README for Docker image usage

dominikburri commented 3 years ago

The following procedure should be added to the documentation for running execution workflows:

Check whether the tool is already available as a Docker container, e.g., on
- https://hub.docker.com/
- https://biocontainers.pro/registry

or just Google for "{tool name} Docker" or "{tool name} Dockerfile".
If the tool is not available:
- Build a Docker image from a Dockerfile. For how to do this, you could follow this. For an example Dockerfile see: docs/templates/nextflow_dsl1/Dockerfile or docs/templates/snakemake/workflow/envs/[METHOD].Dockerfile.
- Push the image to the APAeval Docker Hub: https://hub.docker.com/u/apaeval
  
  Note: only the tool itself, without pre- or post-processing steps, should be included in the image so that the image stays flexible.
If the pre- or post-processing steps require specific bioinformatics tools, please check BioContainers or create and publish an additional image.

Ultimately, the Docker images should be specified in the execution workflows:

For Nextflow, the individual containers can be specified in the processes: https://www.nextflow.io/docs/latest/docker.html#multiple-containers.
For Snakemake, the individual containers can be specified per rule: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#running-jobs-in-containers

Please correct anything that is misleading or wrong.

uniqueg commented 3 years ago

I think it's okay here (since we are not producing production workflows) to package scripts/tools required for pre- and post-processing into the same Docker image, i.e., create only a single Dockerfile per execution workflow. Of course, if Docker images for pre- and/or post-processing are already available, or if a specific pre- and/or post-processing step (or set of steps) has to be used for multiple execution workflows, it might make more sense to split them up and reuse them. In any case, I would leave this up to the people implementing the execution workflows: they can either put everything in one image or split it up by tool/workflow step.

uniqueg commented 3 years ago

I would also add a recommended name for the Docker images. If the Docker image only contains the tool to be benchmarked, I would recommend naming it apaeval/{tool_name}:{tool_version}, e.g., apaeval/my_tool:v1.0.0. If the Docker image contains tools/scripts to run pre- and post-processing as well, I would recommend naming it apaeval/exwf_{tool_name}:{commit_hash}, where commit_hash is the short SHA of the Git commit in the APAeval repo that last modified the corresponding Dockerfile, e.g., 65132f2.
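
To make the two naming patterns concrete, here is a minimal Python sketch. The helper `image_name` and its parameters are hypothetical and not part of the APAeval repo; only the `apaeval/{tool_name}:{tool_version}` and `apaeval/exwf_{tool_name}:{commit_hash}` patterns come from the comment above.

```python
from typing import Optional


def image_name(tool_name: str,
               tool_version: Optional[str] = None,
               commit_hash: Optional[str] = None) -> str:
    """Build an image reference following the naming scheme proposed above.

    Hypothetical helper: pass tool_version for a tool-only image, or
    commit_hash for an image that also bundles pre-/post-processing.
    """
    # Exactly one of the two tags must be given.
    if (tool_version is None) == (commit_hash is None):
        raise ValueError("pass exactly one of tool_version or commit_hash")
    if tool_version is not None:
        # Image contains only the tool to be benchmarked.
        return f"apaeval/{tool_name}:{tool_version}"
    # Image contains the tool plus pre-/post-processing scripts.
    return f"apaeval/exwf_{tool_name}:{commit_hash}"


print(image_name("my_tool", tool_version="v1.0.0"))  # apaeval/my_tool:v1.0.0
print(image_name("my_tool", commit_hash="65132f2"))  # apaeval/exwf_my_tool:65132f2
```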

However, pushing Docker images is restricted to 3 people at a time, so I need to set/change permissions whenever someone needs to push. People should let Yuk Kei or me know when they want to push so that we can grant them the permissions. We can also push the image for them, provided they have already tested that the Docker image builds locally from the latest version of the Dockerfile, committed that latest Dockerfile to a branch, and pushed the branch to the remote repo (so that we can check out that branch, build the image and push it).
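
The "check out that branch, build and push" step could look roughly like the following sketch. It only assembles and prints the commands rather than running them. The function name, the branch name, the Dockerfile path and the use of the current directory as build context are all assumptions for illustration; `docker build -t/-f` and `docker push` are standard Docker CLI usage.

```python
import shlex
from typing import List


def build_and_push_commands(branch: str, dockerfile: str, image: str) -> List[List[str]]:
    """Return the commands a maintainer might run to publish an image.

    Hypothetical helper; assumes it is run from the root of a clone of
    the repo and that the build context is the current directory.
    """
    return [
        ["git", "fetch", "origin"],       # get the contributor's branch
        ["git", "checkout", branch],      # switch to it
        ["docker", "build", "-t", image, "-f", dockerfile, "."],  # build and tag
        ["docker", "push", image],        # publish to Docker Hub
    ]


# Placeholder values, not real branch or file names.
for cmd in build_and_push_commands("my-branch", "path/to/Dockerfile",
                                   "apaeval/my_tool:v1.0.0"):
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before executing anything that requires push permissions.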

^^ Would be good to add comments on these two points to the instructions

iRNA-COSI / APAeval

doc: Update README for Docker image usage #158