Open marcodelapierre opened 2 years ago
In the Docker lua/tcl module templates (and I replicated in the wrapper templates, too), the alias command is broken in two chunks, "entrypoint" and "args". The former is passed to Docker/Podman through "--entrypoint", whereas the latter goes as the final argument for the Docker command. Can you remind me what is the rationale for this, i.e. why this separation is needed?
Yes! So the first reason is primarily that I'm used to calling "the main executable we hit" the entrypoint, and everything else is args, so when writing the original templates I named them that way to better map them to the familiar docker/singularity commands that I knew. Functionality-wise, the entrypoint matters more for docker/podman than for singularity, because with singularity you can just use exec and pass the entire command without much thought. I'll follow up on this in answering your next questions.
With Docker, I used to take an alternate approach, where I passed everything as arguments to the Docker command, without altering the entrypoint. Do you know of scenarios where this approach would break?
This works as long as the container entrypoint is bash or some shell derivative. Most containers are pretty good these days about custom logic - e.g., "if the user doesn't provide any args, start a python shell; otherwise run the command they gave." As an example with python (and note this didn't always work like this!):
# custom stuff - hand to bash!
$ docker run -it python echo "hello"
hello
# nothing - start python
$ docker run -it python
Python 3.10.4 (main, Mar 24 2022, 23:04:21) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
But this isn't the case for a lot of containers. Here is a very badly designed container by someone named vanessa on Docker Hub:
$ docker run -it vanessa/salad echo "hello"
No help topic for 'echo'
And oops, it's because the entrypoint is an executable, "salad", that prints salad and fork puns.
$ docker run -it vanessa/salad
I can't help with yo' Momma, I'm not that kind of fork.
⎯⎯∈
So taking a command like "echo hello" and splitting it into entrypoint and "the rest" is a more conservative approach that is more likely to work with more containers.
$ docker run -it --entrypoint echo vanessa/salad hello
hello
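As a rough illustration of the split described above, here is a minimal shell sketch. The helper names split_alias and docker_cmd are hypothetical and are not taken from the shpc templates; the sketch only prints the command it would run, and it assumes the alias is a plain space-separated string with no quoting.

```shell
# Hypothetical helper (not shpc code): split an alias into the first word,
# which would go to --entrypoint, and the remainder, which would be passed
# as trailing arguments.
split_alias() {
    alias_cmd=$1
    entrypoint=${alias_cmd%% *}            # everything before the first space
    case $alias_cmd in
        *" "*) args=${alias_cmd#* } ;;     # everything after the first space
        *)     args="" ;;                  # alias had no arguments
    esac
}

# Print (do not run) the docker command that results from the split.
docker_cmd() {
    image=$1
    split_alias "$2"
    if [ -n "$args" ]; then
        echo "docker run -it --entrypoint $entrypoint $image $args"
    else
        echo "docker run -it --entrypoint $entrypoint $image"
    fi
}

docker_cmd vanessa/salad "echo hello"
# prints: docker run -it --entrypoint echo vanessa/salad hello
```

Because the entrypoint is replaced explicitly, the result does not depend on what ENTRYPOINT the image author chose.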
I have recently identified cases where, on the other hand, the current SHPC approach of splitting the alias in two would break. It's a case where the entrypoint is very specific, in particular something like bash -l -c, to make sure Docker sources /etc/profile at container execution. In this scenario, SHPC altering the entrypoint would break the expected behaviour of the container.
Hmm, so this must be an entrypoint defined by a container not edited by shpc? So to separate two things, "entrypoint" in shpc land means that we've taken a container.yaml and parsed an alias with or without arguments into parts, and then we provide the parts to the container. This is different from the definition of an ENTRYPOINT in a docker or podman container, which I agree might have some kind of bug that shpc cannot anticipate, e.g., "it ran bash but didn't source this file." For this bug we'd want people to use the original entrypoint command, meaning the <command>-run with custom arguments, as that would keep the original container entrypoint. The <command>-shell might still be buggy since we specify the entrypoint.
Based on point number 2., I would argue that passing the entire alias as Docker command argument, without changing the entrypoint, would result in a more robust behaviour. It is also consistent with what is done with Singularity, where the whole alias is always passed as a single argument.
I disagree that this is a more robust behavior because of the reasons I outlined above - many entrypoints are not bash. However, I do think it is more than a small edge case, and it's going to vary based on the container. My suggestion for the fix here would be to have a container.yaml boolean that says whether to use the original entrypoint for the commands that require it, or to let shpc decide the shell. We would probably want it to be easy to override by the install admin and/or the user.
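To make the suggestion concrete, here is a minimal sketch of how such a toggle could change the generated command. Everything here is an assumption for illustration: keep_entrypoint stands in for the proposed boolean and is not an existing shpc setting, build_cmd is a hypothetical helper, and example/image and mytool are placeholder names. The sketch only prints the command.

```shell
# Hypothetical sketch: keep_entrypoint stands in for the proposed per-container
# boolean; it is not an existing shpc option.
build_cmd() {
    keep_entrypoint=$1
    image=$2
    alias_cmd=$3
    if [ "$keep_entrypoint" = "true" ]; then
        # Leave the image's ENTRYPOINT untouched; the whole alias is passed
        # as arguments to it.
        echo "docker run -it $image $alias_cmd"
    else
        # Current behaviour as described in this thread: first word becomes
        # the entrypoint, the remainder becomes the arguments.
        entrypoint=${alias_cmd%% *}
        case $alias_cmd in
            *" "*) rest=" ${alias_cmd#* }" ;;
            *)     rest="" ;;
        esac
        echo "docker run -it --entrypoint $entrypoint $image$rest"
    fi
}

build_cmd true  example/image "mytool --version"
# prints: docker run -it example/image mytool --version
build_cmd false vanessa/salad "echo hello"
# prints: docker run -it --entrypoint echo vanessa/salad hello
```

With the toggle on, a container whose ENTRYPOINT does its own setup keeps that setup; with it off, containers with a non-shell ENTRYPOINT still work.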
Thanks for your patience in explaining in so much detail to me!
So, probably the key takeaway for me is that entrypoint in SHPC != ENTRYPOINT in Docker!
Before adding new functionality to SHPC, I am wondering whether I can code my entrypoint + cmd combo right in the alias in a container.yaml. I will give it a go in the next few days, but just as a spoiler for you, this is what it looks like:
ENTRYPOINT [ "/bin/bash", "-l", "-c", "$*", "--" ]
CMD [ "/bin/bash" ]
I love it because it allows sourcing stuff at startup, such as /etc/profile, and at the same time it gives the flexibility of picking any command/executable to run.
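The argument handling in that ENTRYPOINT can be tried locally without Docker. The sketch below is only an illustration: run_like_entrypoint is a hypothetical name, and -l is left out so that no profile files are read on the machine running it. With -c, the first word after the command string ("--") becomes $0 and the remaining words become $1, $2, and so on, which the command string '$*' then expands and executes.

```shell
# Hypothetical local stand-in for the ENTRYPOINT above, without -l.
# "--" fills $0; the caller's words become $1, $2, ... and are run via $*.
run_like_entrypoint() {
    bash -c '$*' -- "$@"
}

run_like_entrypoint echo hello
# prints: hello

# Caveat: the unquoted $* is split again on whitespace, so an argument that
# contains spaces reaches the command as several separate words.
run_like_entrypoint echo 'a   b'
# prints: a b
```

The second call shows a limitation worth keeping in mind: arguments with embedded spaces do not survive intact through this pattern.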
So, I will try and see if an alias like the following will work:
<alias>: /bin/bash -l -c '$*' -- <path/to/executable>
Will keep you posted.
Your idea of a boolean option is great; I am just wondering how often this need arises, and I would not want to add "too many" features and make SHPC too complex.
Oh, and for context on why I need the ENTRYPOINT above in practice: it is for an OpenFoam container (a software package for Computational Fluid Dynamics). OpenFoam ships with a sourceable bashrc, which sets tons of env variables and does other shenanigans that are a nightmare to code in the Dockerfile itself. So, the easiest option is just to ensure that bashrc is sourced when the container starts.
I am not sure how frequently this situation arises in the wider space of containers for scientific computing.
Hi @vsoch ,
I might have touched on this already, but can't find the references, so I am asking again.
In the Docker lua/tcl module templates (and I replicated in the wrapper templates, too), the alias command is broken in two chunks, "entrypoint" and "args". The former is passed to Docker/Podman through "--entrypoint", whereas the latter goes as the final argument for the Docker command. Can you remind me what is the rationale for this, i.e. why this separation is needed?
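A couple of points for context:
1. With Docker, I used to take an alternate approach, where I passed everything as arguments to the Docker command, without altering the entrypoint. Do you know of scenarios where this approach would break?
2. I have recently identified cases where, on the other hand, the current SHPC approach of splitting the alias in two would break. It's a case where the entrypoint is very specific, in particular something like bash -l -c, to make sure Docker sources /etc/profile at container execution. In this scenario, SHPC altering the entrypoint would break the expected behaviour of the container.

Based on point number 2, I would argue that passing the entire alias as the Docker command argument, without changing the entrypoint, would result in more robust behaviour. It is also consistent with what is done with Singularity, where the whole alias is always passed as a single argument.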
This being said, I suspect you had a good reason for your implementation.