antoinetran opened 2 days ago
The InterLink Slurm plugin requires `/bin/sh`, I guess because of `SingularityPrefix`, a feature documented in the InterLink Slurm plugin that allows injecting a command before the main command. However, this prefix is only documented, not actually implemented.

I propose to remove this hard-coded `/bin/sh` and the `SingularityPrefix`, which is unused anyway today, so that Singularity calls are made directly with the main command.
`/bin/sh` is used to build the command executed by Singularity (which does not know the entrypoints of the containers). What exactly is your proposal, then?
I will illustrate my proposal with examples using a Docker image that does not contain `/bin/sh` and that has an entrypoint but no command:
```
$ podman history quay.io/argoproj/argoexec:v3.5.4
ID            CREATED       CREATED BY               SIZE   COMMENT
335e6412fad1  9 months ago  ENTRYPOINT ["argoexec"]  0 B    buildkit.dockerfile.v0
```
It is both simple and complex. A Docker image can contain an entrypoint and a command, only an entrypoint, only a command, or neither. A Kubernetes yaml describing a container can override the entrypoint (with the `command` field, which is a bit confusing), the command (with the `args` field), both, or neither.
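For reference, the Kubernetes documentation summarizes the resulting behavior with placeholders like this:

| Image Entrypoint | Image Cmd | Container `command` | Container `args` | Command executed |
|---|---|---|---|---|
| `[ep-1]` | `[cmd-1]` | (not set) | (not set) | `[ep-1 cmd-1]` |
| `[ep-1]` | `[cmd-1]` | `[cmd-ep-1]` | (not set) | `[cmd-ep-1]` |
| `[ep-1]` | `[cmd-1]` | (not set) | `[args-1]` | `[ep-1 args-1]` |
| `[ep-1]` | `[cmd-1]` | `[cmd-ep-1]` | `[args-1]` | `[cmd-ep-1 args-1]` |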
Yaml:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: helloworld
spec:
  template: # create pods using pod definition in this template
    spec:
      containers:
      - name: helloworld
        image: quay.io/argoproj/argoexec:v3.5.4
        #command: [ "sleep", "10" ]
```
```
$ podman run --rm quay.io/argoproj/argoexec:v3.5.4
argoexec is the executor sidecar to workflow containers
```
OCI containers are converted to Singularity SIF with the entrypoint/command translated into the runscript field: https://docs.sylabs.io/guides/3.6/user-guide/definition_files.html#runscript. `singularity run` will then execute the runscript if present. So, to reproduce the same output as podman above:
=> this is what InterLink Slurm should run:

```
$ singularity run docker://quay.io/argoproj/argoexec:v3.5.4
INFO: Using cached SIF image
argoexec is the executor sidecar to workflow containers
```
Yaml:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: helloworld
spec:
  template: # create pods using pod definition in this template
    spec:
      containers:
      - name: helloworld
        image: quay.io/argoproj/argoexec:v3.5.4
        command: [ "/etc/passwd" ]
```
=> InterLink Slurm should run:

```
$ singularity exec docker://quay.io/argoproj/argoexec:v3.5.4 /etc/passwd
INFO: Using cached SIF image
FATAL: permission denied
```
I run `/etc/passwd`, which is not an executable, on purpose, to show that we can run something that is neither `/bin/sh` nor the default command. The "permission denied" is the expected stderr here.
Yaml:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: helloworld
spec:
  template: # create pods using pod definition in this template
    spec:
      containers:
      - name: helloworld
        image: quay.io/argoproj/argoexec:v3.5.4
        #command: [ "argoexec" ]
        args: [ "--log-format", "json", "version" ]
```
=> InterLink Slurm should run:

```
$ singularity run docker://quay.io/argoproj/argoexec:v3.5.4 argoexec --log-format json version
INFO: Using cached SIF image
argoexec: v3.5.4
```
Yaml:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: helloworld
spec:
  template: # create pods using pod definition in this template
    spec:
      containers:
      - name: helloworld
        image: quay.io/argoproj/argoexec:v3.5.4
        command: [ "argoexec" ]
        args: [ "--log-format", "json", "version" ]
```
=> InterLink Slurm should run:

```
$ singularity exec docker://quay.io/argoproj/argoexec:v3.5.4 argoexec --log-format json version
INFO: Using cached SIF image
argoexec: v3.5.4
```
Yaml:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: helloworld
spec:
  template: # create pods using pod definition in this template
    spec:
      restartPolicy: Never
      containers:
      - name: helloworld
        image: alpine
        command:
        - /bin/sh
        # This "-x" quote is interpreted by yaml, not by bash
        - "-x"
        - "-c"
        args:
        - |
          echo "multiline \"script line1"
          echo multiline script line2
```
If we run this in Kubernetes, we get:

```
multiline "script line1
multiline script line2
+ echo 'multiline "script line1'
+ echo multiline script line2
```
=> InterLink Slurm should run:

```
$ singularity exec docker://alpine /bin/sh -x -c "echo \"multiline \\\"script line1\";echo multiline script line2"
INFO: Using cached SIF image
+ echo 'multiline "script line1'
multiline "script line1
+ echo multiline script line2
multiline script line2
```
I also tried with the https://hub.docker.com/r/komljen/ssg image, which contains only a command but no entrypoint. If we reproduce case 3 ("only command is specified in the Kubernetes yaml"), `singularity run` handles well the fact that the image defines only a command and no entrypoint:

```
$ singularity run docker://docker.io/komljen/ssg sh
INFO: Using cached SIF image
Apptainer>
```
Let me see how to implement all cases, maybe I will do a simple implementation first ^^
My implementation suggestion:

- if the Kubernetes yaml container specifies a `command` field (equivalent to the entrypoint in Docker), then we run `singularity exec` and append any `args` if they exist;
- in any other case, we run `singularity run` and append any `args` if they exist.

Note for robustness: any command/args strings must be shell-escaped, so I will use the https://pkg.go.dev/github.com/alessio/shellescape Go package, which I already used in interLink, so that any quote or space is interpreted correctly by the Singularity shell. See the sketch below.
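A minimal sketch of that rule, assuming a hypothetical helper (`buildSingularityCmd` and its signature are illustrative, not the actual plugin code; only the exec/run decision and the use of `shellescape` come from the proposal above):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/alessio/shellescape"
)

// buildSingularityCmd sketches the proposed rule: a Kubernetes "command"
// (i.e. a Docker ENTRYPOINT override) selects "singularity exec"; otherwise
// "singularity run" lets the SIF runscript honor the image entrypoint.
// Every user-supplied token is shell-escaped so quotes and spaces survive
// in the generated job.sh.
func buildSingularityCmd(image string, command, args []string) string {
	parts := []string{"singularity"}
	if len(command) > 0 {
		tokens := append(append([]string{}, command...), args...)
		parts = append(parts, "exec", "docker://"+image, shellescape.QuoteCommand(tokens))
	} else {
		parts = append(parts, "run", "docker://"+image)
		if len(args) > 0 {
			parts = append(parts, shellescape.QuoteCommand(args))
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	// Case "command and args both set" from the thread -> singularity exec.
	fmt.Println(buildSingularityCmd("quay.io/argoproj/argoexec:v3.5.4",
		[]string{"argoexec"}, []string{"--log-format", "json", "version"}))
	// Multiline-script case: quoting must survive intact in job.sh.
	fmt.Println(buildSingularityCmd("alpine",
		[]string{"/bin/sh", "-x", "-c"},
		[]string{"echo \"multiline \\\"script line1\"\necho multiline script line2"}))
}
```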
Alright, now it's clear to me, sorry I missed the point before.
That looks great, I'll rename this issue as "ENTRYPOINT support".
@antoinetran let me know if you need anything from me, or any help.
I'm also opening an issue in the tests to add an entrypoint-only test.
Ok, I tested with an Argo step that runs 3 containers: init, wait and main. I can see the change in `job.sh`; it is now:
```bash
#!/bin/bash
#SBATCH --job-name=7cb1283d-85d2-4e49-bd7a-00afe3085967
#SBATCH --output=/home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/job.out
#SBATCH --mem=512
#SBATCH --cpus-per-task=3
singularity exec --containall --nv --bind /home/username/.interlink/workflows_workspace/appname-lsdsharedfs-template-dx5fk:/work --env-file /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/envfile.properties --bind /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/emptyDirs/var-run-argo:/var/run/argo:rw docker://quay.io/argoproj/argoexec:v3.5.4 argoexec init --loglevel info --log-format text &> /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/init.out; echo $? > /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/init.status
singularity exec --containall --nv --bind /home/username/.interlink/workflows_workspace/appname-lsdsharedfs-template-dx5fk:/work --env-file /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/envfile.properties --bind /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/emptyDirs/var-run-argo:/var/run/argo:rw --bind /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/emptyDirs/tmp-dir-argo:/tmp:rw docker://quay.io/argoproj/argoexec:v3.5.4 argoexec wait --loglevel info --log-format text &> /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/wait.out; echo $? > /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/wait.status; sleep 30 &
singularity exec --containall --nv --bind /home/username/.interlink/workflows_workspace/appname-lsdsharedfs-template-dx5fk:/work --env-file /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/envfile.properties --bind /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/emptyDirs/var-run-argo:/var/run/argo:rw docker://[imageXXXXXXXXXXXXXX] /var/run/argo/argoexec emissary --loglevel info --log-format text -- lgfk-cli --workdir=/work --pipeline-run-id=appname-lsdsharedfs-template-dx5fk --servers=nats://my-nats.nats:4222 start-block --input-data=stub1_output_preprocessed_signal.h5 --input-group-configuration-path=/work/stub1_group_configuration.json --block-label=160 --group-label=stub1 --n-steps=1000 --current-iteration=0 --output-checkpoint=/work/stub1_160_mcmc.dump &> /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/main.out; echo $? > /home/username/.interlink/argo-workflows-7cb1283d-85d2-4e49-bd7a-00afe3085967/main.status; sleep 30 &
```
The hard-coded `/bin/sh` is now removed. The containers now run better, but I can see a regression: I think the `envfile.properties` files from issue #27 are colliding. Fixing it now...
Fixed, there is now one `envfile_[containername].properties` per container.
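For illustration, the per-container naming could look like this minimal sketch (`envFileName` is a hypothetical helper, not necessarily the actual fix):

```go
package main

import "fmt"

// envFileName builds a per-container env file name so that containers in
// the same pod no longer collide on one shared envfile.properties.
func envFileName(containerName string) string {
	return fmt.Sprintf("envfile_%s.properties", containerName)
}

func main() {
	// e.g. envfile_wait.properties for the Argo "wait" container
	fmt.Println(envFileName("wait"))
}
```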
### Short Description of the issue

When running a container image without `/bin/sh` inside, the InterLink Slurm plugin runs the container and fails with

### Environment

### Steps to reproduce

Run an Argo helloworld step.

### Logs, stacktrace, or other symptoms

`job.sh`

### Summary of proposed changes