jupyterhub / kubespawner

Kubernetes spawner for JupyterHub
https://jupyterhub-kubespawner.readthedocs.io
BSD 3-Clause "New" or "Revised" License
536 stars 301 forks source link

Shouldn't KubeSpawner's cmd/args map to the K8s container's command/args? #493

Open consideRatio opened 3 years ago

consideRatio commented 3 years ago

I saw that c.KubeSpawner.default_url was reported to not have an impact and decided to investigate.

I end up concluding that configuring the command that the container starts with is very complicated. The Spawner, K8s, and the Dockerfile all have various names for similar things that all combine into a k8s containers args field where the command field apparently isn't configurable to my surprise.

Investigation conclusion

  1. I'm confused we opt to set args of the k8s container based on the traitlets cmd, args, notebook_dir, default_url instead of setting command to cmd, and setting args to default_url, notebook_dir and args.
  2. I think the logic to silently ignore the traitlets args, notebook_dir, default_url when cmd is unset is problematic and should lead to a warning at least.
behrica commented 3 years ago

I am trying to launch a rstudio image with this, and I am very much struggling that I can not "completely control" the command and args passed to starting the container

manics commented 3 years ago

Do you think we can resolve this issue before Z2JH 1.0.0 is released? Either by adding docs and warnings, or perhaps by adding a new entrypoint parameter, and making it clear that entrypoint and cmd have their Docker semantics.

Although this is a K8s project I think it's fine to use the Docker terminology, especially as it's also the Open Container Initiative terminology. What's more important is that we make it clear and use consistent names throughout.

consideRatio commented 3 years ago

I agree that we must be consistent, and in this case we must be consistent against the Spawner base class that is k8s unaware, so then adding entrypoint would make sense alongside cmd.

We would then document that cmd and entrypoint refers to the Dockerfile names, and that we will set the k8s Pod's container's args and command section when we set cmd and entrypoint in KubeSpawner.

To conclude: :+1: for adding entrypoint as a KubeSpawner traitlet to configure the containers command.

mapsacosta commented 3 years ago

@consideRatio is there going to be a separate fix for c.KubeSpawner.default_url?

manics commented 3 years ago

@mapsacosta if you set cmd the other args such as default_url should be passed. If it's not working can you show us your full configuration?

consideRatio commented 3 years ago

Update

In the initial post I wrote investigative conclusions, and I want to give an update about those two points.

  1. I'm confused we opt to set args of the k8s container based on the traitlets cmd, args, notebook_dir, default_url instead of setting command to cmd, and setting args to default_url, notebook_dir and args.
  2. I think the logic to silently ignore the traitlets args, notebook_dir, default_url when cmd is unset is problematic and should lead to a warning at least.

About spawner.get_args()

Note that the args considered by get_args() are as defined in JupyterHub's spawner.py:

Configuration Resulting argument
spawner.ip --ip=...
spawner.port --port=...
spawner.default_url --SingleUserNotebookApp.default_url=...
spawner.notebook_dir --notebook-dir=...
spawner.debug --debug
spawner.disable_user_config --disable-user-config
spawner.args <user defined args>

Action points

Related

manics commented 3 years ago

A non-breaking way to achieve (2) could be to treat the empty string as a special value, so cmd = null preserves the existing behaviour, and cmd = "" means add all args?

On the other hand if we go for a breaking change I think either we should keep the ability to not pass additional arguments, or clearly document the command arguments interface that should be supported.

Originally jupyter-notebook was the only client, but now we have jupyter-lab, jupyter-server, and people have written their own alternatives such as https://discourse.jupyter.org/t/new-package-to-run-arbitrary-web-service-in-jupyterhub-jhsingle-native-proxy/3493 (which is also one of the motivations for https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/2138)

consideRatio commented 3 years ago

A non-breaking way to achieve (2) could be to treat the empty string as a special value, so cmd = null preserves the existing behaviour, and cmd = "" means add all args?

Hmmm... I think we should opt for a potentially breaking change to resolve the future needs properly.

  1. Ability to configure ENTRYPOINT and CMD very clearly without interference from get_args()
  2. Ability to opt-in/opt-out of augmenting get_args()

I realize now that we can't augment get_args() to the Dockerfile's defined CMD because it would simply be overridden entirely, and that is probably the logic for now always passing get_args().

So... setting spawner.cmd will override the Dockerfile's CMD, so get_args() won't augment with it and would often not make sense unless something like jupyterhub-singleuser was defined in the ENTRYPOINT.

Also... you may want to configure spawner.cmd without ending up having get_args() augmented.

I think what we must support is opt-in/out-out of get_args(), and this can be in three modes:

I wonder if perhaps the Spawner base class should add some configuration about this though rather than KubeSpawner, and then we make use and adjust around that? The current behavior in KubeSpawner is KubeSpawner specific, and I think the Spawner base class current behavior is to always append get_args() to cmd.

minrk commented 3 years ago

It is extremely confusing that Kubernetes renames Docker's Entrypoint and Command as Command and Args, respectively!

While Kubernetes' names more accurately describe what happens (a command is passed arguments), I think the docker names more accurately describe what they are for (an entrypoint prepares an environment in which to launch a command).

In that sense, cmd and args are used to build a single command list, which we pass to kubernetes container arguments, just like we would with docker run $image command ... and I think we should never interact with the entrypoint at all, other than requiring that it allow arbitrary commands.

This is part of why I think things like docker stacks should implement their environment customization in entrypoint instead of CMD. Overriding CMD shouldn't be hugely consequential, but it is for docker-stacks.

I think what we must support is opt-in/out-out of get_args(), and this can be in three modes:

  • always append if cmd is set (current KubeSpawner behavior)
  • always append to cmd independent if its set or not (Dockerfile CMD will never be considered)
  • never append to cmd independent if its set or not

I'd add that option 2 would also require restoring the default cmd = 'jupyterhub-singleuser' because it doesn't make sense to append args without starting with the command itself. This would also make KubeSpawner consistent with all the non-container-based Spawners.

I wonder if perhaps the Spawner base class should add some configuration about this though rather than KubeSpawner, and then we make use and adjust around that?

I think this is really a KubeSpawner-specific issue, and should probably be a kubespawner option. It is really only the "get the default command from the image with kubespawner" case that is affected, because kubespawner cannot actually retrieve this info from the image. If the command is specified, everything works.

The Spawner API is that cmd ultimately launches jupyterhub-singleuser, and args will be passed to it. It is part of the API that this command accepts the CLI args of jupyterhub-singleuser, even if it may be a wrapper or adapter. Launching a command other than that is not currently supported by JupyterHub. What makes KubeSpawner (and DockerSpawner) unique and require special handling is that they have another source for the default command: the CMD field of the image itself. Dockerspawner deals with this by inspecting the image to extract CMD before appending arguments, if Spawner.cmd is unset. I don't know how feasible that would be in KubeSpawner (we'd have to deal with all kinds of registry and pull secret stuff to get it, I think). So I'm not sure what we can do to support that here. Maybe this means using the image's CMD by default is not something that's feasible in KubeSpawner.

This is related to why I opened https://github.com/jupyterhub/jupyterhub/pull/3381 - it would be nice if we could stop specifying CLI args (by default) in Spawners, and do everything through the env, but it's trickier to remove than I realized. I would like to get to a point where JupyterHub internally does not turn any options into CLI args, other than explicit CLI args from user config, and only communicates options through the environment. Then this problem would go away, I think...

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/kubespawner-and-ldapauthentication-run-under-users-ldap-uid/8988/7

minrk commented 2 years ago

I think this issue is mostly resolved by the fact that jupyterhub 2.0 does not specify any CLI arguments, nor place any requirements on the CLI signature of the launch command, and never will again. That means that Spawner.args no longer needs to conform to jupyterhub-singleuser, it only needs to conform to whatever the user chooses for cmd. A key point is that there's no difference between specifying singleuser.cmd and singleuser.args vs just concatenating the two in singleuser.cmd. In fact, given that the only case that doesn't work is specifying singleuser.args when using the command from the image (i.e. singleuser.cmd unspecified), arguably with KubeSpawner one should always specify singleuser.cmd as the full list with any args they want to pass, and can safely ignore singleuser.args.

This simplifies the configuration explanation for z2jh in 2.0:

If you want to specify the command to launch, there are two possibilities:

  1. (default) specify nothing to use the image's default command, which is assumed to eventually launch jupyterhub-singleuser, or
  2. customize the launch command and any arguments, via Spawner.cmd, for example:
    c.Spawner.cmd = ["jupyterhub-singleuser", "--some-arg=5", "..."]

Customizing only the arguments without specifying the command to launch (i.e. setting Spawner.args without Spawner.cmd) is not supported.

consideRatio commented 2 years ago

Thank you for clarifying things @minrk! I'm not yet comfortable with the situation about cmd in Spawner, KubeSpawner, and z2jh and consider if we should make some change before z2jh 2.0.0 about that.

Renewed overview

Brainstorming

Related

minrk commented 2 years ago

Do we want KubeSpawner to override cmd?

The default is less important to me; what's important to me is that the "use the image's command" case is clear and simple. If we can accomplish that, however we accomplish that, I'm okay with switching back to jupyterhub-singleuser to match other Spawners.

Whatever we do with the default, I think it's probably still better for the z2jh approach to specify either the full singleuser.cmd, or whatever special value (None, empty list, '$imageCommand'), but not expose (via documentation / schema) Spawner.args due to the fact that it only works when singleuser.cmd is also specified.

Why do we set c.Spawner.cmd and c.Spawner.default_url instead of c.KubeSpawner.cmd and c.KubeSpawner.default_url in z2jh's jupyterhub_config.py?

It doesn't make a difference in our case, but it's typical to set config either on the class that defines the trait, or on the subclass you know you are using. It makes no difference when there is only one class you are configuring, but if another class came up that was a sibling of KubeSpawner (inherits from Spawner but not KubeSpawner), there is a distinction:

That's mostly hypothetical in the current situation where there's really only one class to configure (or possible custom subclasses of KubeSpaner), so if things are clearer to set them on KubeSpawner only, that's okay, too. I would typically apply config to the class that defines the trait, though.

Traitlets config loading looks through the class inheritance, so it'll find it on Spawner, KubeSpawner, or anything in between that has the trait (if there were something in between).

Is there an action point to take pre z2jh 2.0.0 release?

I guess we need to decide if we want to revert the "use image by default" change. If not, then document the two choices (singleuser.cmd or 'default').

If we go back to jupyterhub-singleuser, then we need to revert the change, and make sure to implement and document the 'use the image' option, since that not working is what started the whole process.