inab / WfExS-backend

Workflow Execution Service Backend
Apache License 2.0

Can't execute workflows using podman #54

Open dcl10 opened 1 year ago

dcl10 commented 1 year ago

Description

Using stage, I can stage a workflow with podman. However, when I run the workflow with staged-workdir offline-exec, I get the following error:

ERROR Workflow error:
Docker is not available for this tool, try --no-container to disable Docker, or install a user space Docker replacement like uDocker with --user-space-docker-cmd.: Docker image hutchstack/rquest-omop-worker:next not found

Fiddling with the code on a fork, I found that adding --no-container or --user-space-docker-cmd isn't compatible with --podman.

In cwl_engine.py, I found that commenting out the --disable-pull line seemed to fix the problem, and the workflow runs as expected. However, I guess --disable-pull is there for a good reason. Could something be preventing WfExS from looking where the staged podman image is saved?

jmfernandez commented 1 year ago

Hi @dcl10. With the information you have provided, I have been digging into WfExS, and then into cwltool. I found a couple of issues related to cwltool's support for newer versions of podman (see https://github.com/common-workflow-language/cwltool/issues/1884 and https://github.com/common-workflow-language/cwltool/issues/1883).

BTW, which version of podman are you using?

dcl10 commented 1 year ago

Hi @jmfernandez, sorry for the late reply. We have used podman 3.x, which is installed by default with apt install on Ubuntu. We've also tried 4.x, which has a slightly more complicated install that I can't remember. Either way, same result.

jmfernandez commented 11 months ago

Hi again. Over the past weeks I created a couple of pull requests to cwltool in order to fix its issues with podman, and both of them were accepted. Until a cwltool release containing the fixes is published, the latest commits on the WfExS side install a development version of cwltool when the workflow is instantiated.

Also, I have pushed changes to the WfExS code related to podman container management, so now each working directory holds its own podman registry, along with the compressed container images needed to restore it. Previous implementations used a shared podman registry located in the shared WfExS caching directory, which is a problem if the cache is cleared or some file is tainted.

But the key part is that the compressed container images in the working directory, along with their metadata, govern the contents of the working directory's podman registry. When the working directory is transferred, most of the files and directories of the unpacked images in the podman registry cannot be copied, because of the way podman works: they are owned by other uids/gids. So, when a workflow is run, the integrity of the working directory's podman registry is now checked, so that it can be re-populated from the compressed container images.
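To make the check-then-restore idea concrete, here is a minimal Python sketch. It is not the WfExS implementation: the function names (`podman_cmd`, `ensure_image`), the directory layout, and the assumption that the saved image is an archive that `podman load` accepts are all illustrative. It relies only on the podman global `--root` option, `podman image exists` (exit status 0 when the image is present), and `podman load -i`.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def podman_cmd(storage_root: Path, *args: str) -> List[str]:
    """Build a podman command line that targets one specific storage root."""
    return ["podman", "--root", str(storage_root), *args]


def ensure_image(
    storage_root: Path,
    image: str,
    archive: Path,
    run: Callable = subprocess.run,
) -> bool:
    """Reload `image` from `archive` when it is missing from `storage_root`.

    Returns True if a reload was performed, False if the image was already there.
    `run` is injectable so the logic can be exercised without podman installed.
    """
    # `podman image exists` exits with 0 only when the image is in the store.
    check = run(podman_cmd(storage_root, "image", "exists", image))
    if check.returncode == 0:
        return False
    # Hypothetical layout: the saved image archive lives next to the store.
    run(podman_cmd(storage_root, "load", "-i", str(archive)), check=True)
    return True
```

A call such as `ensure_image(Path("workdir/podman"), "hutchstack/rquest-omop-worker:next", Path("workdir/containers/image.tar"))` would then repopulate the per-directory store after the working directory has been moved; the paths shown here are made up for the example.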

As a side note, I have also discovered that on Ubuntu 22.04 the installation of podman requires some tweaks, due to "interferences" with systemd-homed (see https://wiki.archlinux.org/title/Podman#Set_subuid_and_subgid and https://github.com/systemd/systemd/issues/21952 )
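For anyone checking their own setup: the first link concerns the subordinate uid/gid ranges that rootless podman needs. The following is a small Python sketch, not part of WfExS, that lists the ranges assigned to a user in `/etc/subuid` or `/etc/subgid`. It assumes the usual `name:start:count` line format; the function names are made up for the example.

```python
import getpass
from pathlib import Path
from typing import List, Tuple


def subid_ranges(content: str, user: str) -> List[Tuple[int, int]]:
    """Return the (start, count) ranges assigned to `user` in subuid/subgid text."""
    ranges = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(":")
        if len(parts) != 3:
            continue  # skip lines that do not follow name:start:count
        name, start, count = parts
        if name == user:
            ranges.append((int(start), int(count)))
    return ranges


def current_user_ranges(path: str = "/etc/subuid") -> List[Tuple[int, int]]:
    """Ranges for the current user; an empty list if the file is missing."""
    subid_file = Path(path)
    if not subid_file.exists():
        return []
    return subid_ranges(subid_file.read_text(), getpass.getuser())
```

If `current_user_ranges("/etc/subuid")` or `current_user_ranges("/etc/subgid")` returns an empty list, the user has no subordinate range in that file, which is the situation the Arch wiki section describes how to fix.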