jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes

https://zero-to-jupyterhub.readthedocs.io

Other

1.51k stars 789 forks source link

Discussing use of a VM #1432

Closed consideRatio closed 3 years ago

consideRatio commented 4 years ago

UPDATE: Resolution

The contributing docs work without a dedicated VM at this point, and we opted to avoid it as it adds quite a bit of machinery which can be hard to maintain for example.

In #1422 I've planned for use of a VM, and after https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/1422#discussion_r331005211 I figured it would make sense to create a dedicated issue to discuss if that makes sense to focus on a VM based development setup.

Why CONTRIBUTING.md should focus on a VM based setup

A all encompassing virtual environment

In Python there are plenty of tools to setup a virtual environment to avoid mixing up Python versions or dependency requirements etc. in various projects. In this repo, we are developing something that make us of a lot of tools, of which many could benefit from a virtual environment: kubectl, helm, kind, python, kubeval. Consider for example the folder ~/.helm/ and ~/.kube and the configuration that is updated with use of kubectl and helm.

A common experience for Windows/MacOS/Linux

I don't think it is sustainable to have instructions for multiple OS, but like this, not only does it become possible but it becomes easier as we would focus entirely on Ubuntu 18.04 in the VM. The only OS split of instructions would reside in setting up the VM.

This makes Windows based contributors first class contributors.

Locally reproducible CI tests

We have CI tests and they are running on Ubuntu. If we want these to be easy to re-execute locally as part of a development process, we must ensure they don't assume anything about the OS etc. With a VM running Ubuntu 18.04 locally, and a TravisCI VM running 18.04, we are in a very good spot to make them re-executable locally without much effort.

Opinionated so others don't have to be

Providing many options on how to do something is not only more complicated, but can also add cognitive load on the person facing the options. I'd like for us to provide clearly recommended option, and not suggest other options unless we can properly maintain their functionality.

A opinionated development environment

In this comment @minrk considers this:

I think we should avoid telling users to run any commands directly in the VM. All commands should ideally be run exclusively in the host environment. Having an ssh session open means folks lose all of the shell configuration they are used to, otherwise they lose history, completion, customization, etc.

But on the other hand, I find it to be a bit of a feature. If we have a VM where things are separated from how it is locally, we can ensure that kubectl autocompletion is setup by default for example. Having these excellent features enabled by default, you get their value without the initial pain you need to experience in order for you to realize you want to invest time to resolve it.

I acknowledge the downside, but I think the upside is worth more.

manics commented 4 years ago

One downside: z2jh is deployed in a multitude of environments, having everyone use identical dev and test environments means bugs may remain hidden until they hit production.

minrk commented 4 years ago

I acknowledge the benefit of having reliable, portable environments, especially for new contributors (vagrant may not be this, since it doesn't work for me to set up a basic VM today). However, I think that the downside of recommending a VM as the way to work outweighs the upside. What makes me uncomfortable recommending a VM for development is that I would never use it, and I have a hard time recommending tools that I would never consider using. Now, it's super useful for Windows folks who have a really hard time setting up an environment, where everything is so different from those of us in posix-land. But it also severely disadvantages anyone who has a nontrivially configured environment (non-bash shell for instance—zsh will be default on macOS next month—or well configured bash, kubectl, etc.). I personally can't stand using kubectl without all my customizations at this point. Completion's not enough.

Note that I'm separating running the test cluster in the VM from actually doing development there. I don't have a problem with running the cluster in a VM (we already recommend this with minikube), only adding the friction of using the VM as a development environment. I don't see any need to set up a VM to run tests, kubectl, and lint, etc. as long the host environment has access to the cluster. minikube basically exists to put a cluster in a VM and expose it to the host. If we decide that vagrant+kind is better, we can do the same, but I think the goal should also be the same: expose the cluster in the VM to the host. Having the commands exist in the VM is fine, as long as they can also be run from the host without manual vagrant ssh.

consideRatio commented 4 years ago

One downside: z2jh is deployed in a multitude of environments, having everyone use identical dev and test environments means bugs may remain hidden until they hit production.

Hmmm, perhaps, but I'm not sure. The actual product we produce is a helm chart. I think the key issue here is not having a production similar flora of Kubernetes clusters, we simply rely on kind which use kubeadm currently.

consideRatio commented 4 years ago

@minrk regarding:

However, I think that the downside of recommending a VM as the way to work outweighs the upside.

I'm very happy to rephrase and suggest there are options to the VM way of development, especially for someone experienced, but currently not happy about trying to maintain instructions for alternative ways of setting local development and testing.

When you wrote the sentence above, did you refer to maintaining instructions only for one way, or suggesting there was only one good way to do things, or both?

minrk commented 4 years ago

When you wrote the sentence above, did you refer to maintaining instructions only for one way, or suggesting there was only one good way to do things, or both?

I think both. If we use kind, I'd like our tooling to work for kind "as advertised" i.e. working on mac/linux/windows. I've no objection to having a vagrant configuration as an available and "recommended if you don't want to think about installation" shortcut to getting up and running, but I think we should support folks working with these tools installed on their system. i.e. separate the vagrant stuff as one version of the "get your environment set up" stage and then the while-you-are-working tasks assume you have a satisfactory environment, but do not assume that it's the VM. Some will assume that you have kind, some only kubernetes+helm, etc.

There are a couple of levels of assumptions about the environment, with trade-offs about how much freedom users have (freedom to make choices, but also freedom to make things not work!)

you have kubernetes, Python, and this repo (this is best and most flexible, giving users the most control, but hard because there are lots of variables for "what is kubernetes, really?" and "how do I build images for my cluster?" varies depending on minikube vs kind, etc.)
you are using kind (eliminates some choices, but helps us narrow down instruction variability, mainly for building images and/or bootstrapping new clusters)
you are using our Vagrant VM (our repo takes a lot of control away from the developer, for better and worse. It is now harder for the contributor to make choices, but simpler to get started)

As much as possible, we should have instructions that are at level one and few that are at level 3.

If the current instructions are separated into "setting up the environment" and "development workflow" where the development workflow does not assume the VM, but works if you've set it up, and you don't want to write docs for getting setup with kind natively, that's okay. I'd be happy to take a stab at the "native" version of getting set up with kind.

To be clear, I'm 100% okay if folks feel that assuming the cluster in the VM is the right approach, I just want to make sure that if we do that it's more like minikube where commands, etc. are issued from the host and not via vagrant ssh. That was my only objection to the VM setup.

manics commented 4 years ago

I'm happy using vagrant at the moment I had hoped kind would make it possible to avoid it, so 👍 if the instructions can be split into:

Setup your dev environment, either using Vagrant (step by step walkthrough) or manually (list prerequisites, no instructions)
Build and test z2jh

consideRatio commented 4 years ago

kind works fine without being in a VM on my Ubuntu at least. Any automation to setup kind/kubectl/helm/kubeval/python3 + dependencies and a safe guard from for using kubectl/helm on the wrong cluster goes away.

I just want to make sure that if we do that it's more like minikube where commands, etc. are issued from the host and not via vagrant ssh.

I think that this means to never leave the native terminal, but instead wrapping vagrant ssh with something whenever it is to be used. I think at this point, then it makes great sense to not use a VM at all. I think a wrapper around in-VM interactions would introduce a lot of magic and complexity that is unsustainable and provides little value.

I'm trying to evaluate what it would mean to support developers from Linux, Mac, and Windows without a VM, where one could run CI tests and debug the cluster after test failures, while minimizing risk of the user making a mistake working on a unrelated cluster, also while not making something too complicated to maintain. I'd like to avoid having much logic specifically for the CI and other specifically for local development.

Letting the user install binaries, Python + libraries

kubectl, helm, kind, kubeval would need to be installed and put on PATH available or in the repo's bin folder.
Install Python (official, anaconda) 3.6+, pip, and a virtual environment tool to install dev-requirements.txt.

My thoughts are still in flux.

consideRatio commented 4 years ago

WIP: Erik's Minimally Viable Instructions

I'm trying to come up with alternative ways that works on multiple platforms, in a customizable way, where it is hard to misuse both kubectl and helm, and easy to get started with, while having something that is sustainable to maintain with the experience that we failed to maintain the latest versions development instructions properly.

Clone repo
(optional) Enter a VM
Install kubectl / helm / kind / kubeval
Install Python, enter a virtual environment, and install dev-requirements.txt
Start a k8s cluster (kind, minikube, custom, ...)
Install in-cluster dependencies
- calico
- tiller
Configure kubectl for use with the k8s cluster
- export KUBECONFIG=<config path> (Mac / Linux)
- set KUBECONFIG=<config path> (Windows?) (ref: offical docs)
- --kubeconfig for both kubectl and helm
- --context for kubectl and --kube-context for helm
- HELM_HOME or --home for helm
- Idea: If we have any script, it could rely on a local .kube/config file that we reference. The KUBECONFIG paths will be used relatively to the working directory when executing kubectl.
Install / Upgrade the local helm chart
- chartpress and --commit-range master..HEAD
- (kind only) load locally built docker images
- helm upgrade
Interact with the deployment as a user
- Port forward to svc/proxy-public
Run some CI tests
- (easy) use the hub api
- (easy) add / remove unique users
- (easy) spawn and stop stop pods
- (medium) use kubectl exec to run code on pods
- (hard) use helm upgrade and go from one version to another
- Note: The easy/medium tests could be run from a test pod within the cluster using helm test in a pod/container. It is a practice followed by the helm/charts repository's charts to have such tests. It would also allow for a verification of functionality after installation for chart users. But, it would also add some complexity... It would be best if we could do both, but it would also be hard to support it all.
- Note: The hard upgrade test should not be run from a helm test pod I think, as they would need to modify its own helm release and things could end up being quite weird. It don't think it is important that this test can be run locally during development though.

Bonus: dependency interaction development

It has been a challenge to develop kubespawner in interaction with Z2JH for example, this could perhaps be figured out properly in the contribution section as well.

Describe a recommended way to locally test a bumped kubespawner, oauthenticator, jupyterhub version.
- Perhaps an init_container could be attached to the hub pod and git install a forked version, then no rebuilds would be needed etc.

Thoughts in flux

Require KUBECONFIG to be explicitly set in any script running kubectl or helm.
Support use of a .gitignored .env file where KUBECONFIG and HELM_HOME can be set, for example to .kube/kind-config-<kind cluster name>, then use python-dotenv from python scripts wanting to use kubectl, optionally like flask does it.
The --commit-range master..HEAD is good if developing on another branch, but not if working directly on master, one could add origin/master though but that would suddenly assume we have a remote named origin that is relevant to compare with, it should preferably be the upstream.
https://github.com/instrumenta/helm-kubeval and https://github.com/instrumenta/helm-conftest could be used with HELM_HOME to avoid modifying things locally.
the VM setup should provide autocompletion for kubectl and helm out of the box

Helper scripts, for... ?

There are some things that becomes a bit much and would benefit from automation, such as using when using kind and rebuilding the images.

installing calico, currently using a bash script with curl/sed/kubectl and assuming a kind k8s cluster
installing tiller

betatim commented 4 years ago

(read below the horizontal line for the two points I want to make, up here for some backstory)

A comment (somewhat from the sidelines because I haven't digested all of the above): in another project I have to start a minishift and deploy to it before I can start development. This is because there are several services that depend on each other and the simplest way to get all of them up and behaving roughly like in prod is minishift + helm deploy. To work on one of the services you have to then use a tool like telepresence to perform "magic" that teleports a local (outside minishift) process into the cluster. Why a local process? Because I want to use my favourite editor to edit code, people want to attach a debugger to the process, writing code+create container+redeploy is too tedious.

Two things I dislike is: minishift takes ~5 minutes to start and having to use telepresence massively increases developer complexity. This means there are strong disincentives in place to do quick fixes. You start minishift and get a coffee then work. telepresence mostly just works but sometimes it does weird stuff. I have to pay the 5min startup penalty every time I start work because I can't leave stuff setup, ready to go (would need a laptop per project).

The first point for this discussion: we should try and work on keeping the time-to-active-developing as low as possible for repeat developers. It isn't just about complexity/number of steps/automation. To make up a hypothetical scenario:

spend 20min as one time setup cost and 30s setups for the repeat case
spend 5min as one time setup cost and 3min setups for the repeat case

As someone who works on the project frequently I prefer to invest 20min one time so that the more frequent action of starting work takes only 30s.

-> one time costs are fine if they reduce repeat developer cost.

The second point: I have spent a lot of time recovering from "let us auto setup stuff for you" scripts that some projects use and advertise as the way to get developing. These scripts are great in achieving the goal of forcing the environment to be compliant to the needs of that particular project. They are also great at breaking the environment for all the other projects I work on. This takes the form of overwriting configuration files, installing tools, modifying environment variables, etc. In theory these are great scripts to have, in practice they find themselves in a carefully tuned environment that isn't quite like what they expected. The result is that they are like a bull in a china shop, not a back country hiker (leave no trace).

-> explicit instructions are better than implicit instructions.

betatim commented 4 years ago

using wrong cluster, configure kubectl to use the right cluster

I use (a minimally modified version of) https://github.com/ahmetb/kubectx to switch between contexts. I think it doesn't create any additional config files and relies mostly on ~/.kube/config. When I start minikube the right thing happens (switch to minikube context). oc login also works and switches context (this is part of OpenShift tooling).

It is great because it manages to co-exist with several (otherwise pretty opinionated) tools. It also means I don't have to specify --context or the like on the command line. To list all pods on the GKE mybinder cluster: kctx binder-prod && kubectl get pods, kctx binder-ovh && kubectl get pods to work on the OVH deployment, kctx minikube && kubectl get pods back on minikube.

One thing that is annoying is that it switches context globally for all your terminals.

A while back Min linked me to his snippets but I've not found time to translate them to ZSH/my weird shell config yet. (which also tells you that the global switch is annoying but not that annoying).

-> maybe adding call outs to the docs at points where we know pain exists or where we have found good helpers to reduce pain is a way forward. Instructions like "Now check the list of pods with kubectl get pods --namespace foo to see that the hub pod is running and well. If not use kubectl logs <hub-pod-name> --namespace foo to look at the logs." could have a call out "To avoid having to type out the namespace each time checkout [this little helper]() for linux and other helper for OSX."

consideRatio commented 4 years ago

@betatim thanks for writing these thoughts down!

I really want to get these local development instructions to be very useful, I want to feel I can recommend friends that this is a project that they can contribute to and learn a lot while doing it.

My summary of what I heard you say in my words:

prioritize the secondary development setup time over the initial development setup time
don't make things magic
don't modify global state
be open about pain points

manics commented 4 years ago

Perhaps another way of approaching this is the types of developer who will contribute:

Devs with little or no k8s and no z2jh experience
Devs with k8s experience but new to z2jh
Experienced z2jh devs

A fully contained VM is very helpful for the first for the first group, but since you're only a K8s beginner once it penalises the greater pool of devs who've worked on other k8s related projects since they need another custom development environment, and penalises the third group for the reasons @betatim mentioned.

I think focussing on 1 and 3 for now is reasonable:

For group one have VM setup instructions that get you to the point where you can connect to k8s from inside the VM (i.e. a fully contained dev environment)
For the third group assume k8s is setup (or provide minimal instructions), and state that a working kubectl and helm are required e.g. by setting KUBECONFIG
The remaining scripts should hopefully be common to both

Then iterate and refine

consideRatio commented 4 years ago

Below is an outline of my current idea of the local-dev instructions summarized. Important for this idea is that:

Anyone should be able to get started quite quickly, even those new to k8s and python.
The instructions should be easy to maintain and keep up to date.
The instructions should allow for advanced and opinionated users to do it the hard way.
The CI tests should work locally as well.

For this to happen, I've come up with an approach, where the process is to follow a single path with A/B options along the way that doesn't influence the future choices. I've also moved away from the use of bash scripts for anything than quick automated installation of binaries for the CI system and the VM, as well as the publish script which is only to be run by either an advanced user or by the CI system anyhow.

Dependencies a) Instructions on how to enter a VM and get all binaries, Python, and libraries setup through a script also in use by our CI system. b) Get all required binaries, Python, and libraries setup on your own.
Kubernetes cluster setup a) ./dev.py kind start [--recreate] - setup and initialize calico etc on any platform (unknowingly if your in a VM) assuming you have required binaries available. b) Manually get a Kubernetes cluster setup and ready with required dependencies.
Installing / Upgrading the local Helm chart and local image dependencies a) ./dev.py upgrade (chartpress, automatic detection if it is a kind cluster and if so also load locally built images, helm upgrade, and kubectl port-forward to the proxy-public service) b) Manually doing the steps of a)
Run tests a) ./dev.py test (run pytest with suitable parameters and output relevant logs if failing) b) Do it manually

Note, if you use manual run, how would ./dev.py upgrade know about your cluster? Well it wouldn't and it would instantly complain that KUBECONFIG hasn't been explicitly set within your .env file which is bootstrapped. This allows for the same cluster to be repeatedly used by the ./dev.py script, and the risk of messing with a production cluster or similar goes down a lot because you must have explicitly asked for a certain KUBECONFIG to be used... Hmmm now that I think of it, you can have different context within the KUBECONFIG as well, so I should make sure that one also specifies the context.