JupyterHub Deployments Using GitOps Tools (FluxCD/ArgoCD)

jash2105 commented 7 months ago

Hello JupyterHub team,

I've been exploring the current documentation and setup processes for JupyterHub on Kubernetes, primarily managed through Helm. This setup works well for basic deployments, but I've noticed a potential gap for large-scale, enterprise-grade deployments.

Many enterprise data science and engineering teams might prefer integrating JupyterHub with existing GitOps workflows, typically managed via FluxCD or ArgoCD, rather than directly using Helm for every change. This approach leverages their existing CI/CD pipelines and enhances maintainability and scalability.

Given this, I propose expanding the documentation to include detailed guidance on integrating JupyterHub with FluxCD and ArgoCD. This enhancement will:

- Provide step-by-step instructions on setting up JupyterHub using FluxCD/ArgoCD for resource and configuration reconciliation.
- Include practical configurations for a multi-user, highly available JupyterHub environment suitable for enterprise-level deployment, especially those requiring substantial GPU resources.
- Offer comprehensive debugging documentation to assist teams in quickly resolving issues.

I believe these additions will significantly streamline the setup process for large teams and institutions, reducing the overhead associated with integrating JupyterHub into large-scale infrastructure.

I am eager to contribute by drafting the documentation and configuration examples. Before proceeding, I'd like to gather feedback on this idea and any specific requirements or suggestions the community or maintainers might have.

Looking forward to your thoughts and hoping to contribute effectively to this amazing project!

welcome[bot] commented 7 months ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

jash2105 commented 7 months ago

Hey @consideRatio, can I work on this and submit a PR? I think this would greatly benefit the community. Let me know what you think!

consideRatio commented 7 months ago

Hey @jash2105, thank you for investing your time in this project and the JupyterHub ecosystem of open-source software!!

> Provide step-by-step instructions on setting up JupyterHub using FluxCD/ArgoCD for resource and configuration reconciliation.

:tada: I think it would be great to provide docs to complement existing docs with details that enable readers to deploy the helm chart with FluxCD or ArgoCD in dedicated sections.

I suspect it makes sense to have separate pages for FluxCD and ArgoCD, but if they require very similar steps and share more content than they differ in, they could live on the same page.

Note that we have some past discussions of relevance about ArgoCD, for example:

This helm chart makes use of the `lookup` function in the chart's templates, which requires template rendering to interact with the k8s api-server, but tools like ArgoCD may render the templates in isolation beforehand. This was clarified in https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/2887#issuecomment-1254894945, where adjustments like https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/2887#issuecomment-1824055403 could be needed.
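
One possible adjustment, sketched here as an assumption and not necessarily what the linked comments describe, is to set the values that the chart would otherwise generate and then re-read via `lookup`, so that an isolated render produces the same Secret on every sync. `proxy.secretToken`, `hub.cookieSecret`, and `hub.config.CryptKeeper.keys` are believed to be the relevant chart values; the placeholder strings are hypothetical and would each be replaced with the output of `openssl rand -hex 32`, ideally injected from a secret manager rather than committed in plain text.

```yaml
# values.yaml fragment (a sketch, not an official recommendation).
# Assumption: pinning these values keeps the rendered Secret stable
# when the templates are rendered without access to the k8s api-server.
proxy:
  # placeholder, replace with output of: openssl rand -hex 32
  secretToken: "<64-hex-character-string>"
hub:
  # placeholder, replace with output of: openssl rand -hex 32
  cookieSecret: "<64-hex-character-string>"
  config:
    CryptKeeper:
      keys:
        # placeholder, replace with output of: openssl rand -hex 32
        - "<64-hex-character-string>"
```

Other generated values may still differ between renders, so the diff shown by the GitOps tool is worth checking after applying something like this.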

I'm not sure where to put the docs, but maybe as an "Installing JupyterHub with ArgoCD" page under "Setup JupyterHub". Alternatively, perhaps a section in the administration section about adjusting the deployment to be deployed with ArgoCD instead of helm?

> Include practical configurations for a multi-user, highly available JupyterHub environment suitable for enterprise-level deployment, especially those requiring substantial GPU resources.

I'd appreciate it if you focused on, for example, ArgoCD and/or FluxCD initially. The GPU topic is complicated, so if the documentation is to improve with regard to GPUs, I'd like such a contribution to be isolated and focused, without coupling to other pieces. This makes review easier, which in general helps PRs get merged.

If there are GPU related notes specific to ArgoCD, I suggest considering those separately as well, since a less complicated contribution that helps deploy with ArgoCD without GPUs is valuable by itself.

> Offer comprehensive debugging documentation to assist teams in quickly resolving issues.

There are some general debugging docs. If there are specific ArgoCD debugging details, they can be part of an ArgoCD section - but otherwise I think we should try to build on the general debugging docs.

Btw if you write for example about ArgoCD, try to be aware of what ArgoCD already documents. The more we can link out to their docs to explain something, the easier the docs are to maintain long term as ArgoCD makes changes etc.

jash2105 commented 7 months ago

Absolutely, your overview is very thorough. Here’s my proposed timeline for the documentation process:

1. Initial Documentation: I plan to start with FluxCD, focusing initially on a straightforward installation guide that covers the basic setup without any custom configurations. This will include detailed steps on how to bootstrap a cluster using GitHub or GitLab with Flux, followed by a basic Helm chart installation. The goal is to establish a minimal viable setup with the necessary pods and services, along with some preliminary debugging steps.
2. Review and Iteration: Once the initial documentation is complete, I’ll submit it for review. Based on the feedback, I can make any necessary revisions.
3. Subsequent Documentation: Continuing from there, I'll create additional pull requests to gradually expand our documentation. This will include guides on customizing resources, integrating GPU support, and replicating the setup with Argo.

Does this sequence of steps fit well with our overall strategy? Please let me know if there are any adjustments you’d like me to consider or if there are specific areas you think we should prioritize.

manics commented 7 months ago

Argocd focuses on git-ops style deployments. What do you think about having the instructions, scripts and manifests in your own repository, and linking to them from the Z2JH docs? One challenge with having all docs in a single repo is it's not possible to automatically test them, it can be a pain for people to copy and paste code, and things can therefore easily get out of sync.

What might be particularly nice in a standalone repo is to have live manifests, and perhaps you could even deploy your own Argocd cluster in GitHub CI, and deploy the Z2JH config?

consideRatio commented 7 months ago

Thank you @jash2105 for planning this so clearly!

I didn't expect the "bootstrap" part of "bootstrap a cluster using GitHub or GitLab with Flux", but I may have misunderstood you. I expected something like "how to deploy the jupyterhub chart with Flux", under the assumption that Flux is already used to deploy things into an existing cluster. Maybe a GitHub repository needs to be set up for this, but not a cluster using Flux?

I'm trying to ensure the scope of what is to be documented is sufficiently related to deploying the jupyterhub chart, because anything introduced in this project, even if it's documentation, will require long term attention in its maintenance. If we document too much beyond what's relevant to deploying the jupyterhub chart, the project takes on too much long term maintenance burden.

I realize I can't guide this so clearly because I don't know Flux or Argo, but there should be a line drawn somewhere to focus on how to deploy this chart with Flux/Argo, as compared to how to work with Flux/Argo in general.

jash2105 commented 7 months ago

@consideRatio, I agree with your assessment. Starting with bootstrapping a cluster might indeed be excessive and could shift the focus too heavily onto Flux or similar CD tools. Instead, I propose initiating our efforts by deploying JupyterHub using Flux. This will be covered in my first PR. Subsequent updates can introduce enhancements such as custom deployment configurations, GPU resources, and eventually ArgoCD integration. Since I haven't set up ArgoCD on my cluster yet, we can prioritize Flux in the initial phase and then explore ArgoCD later on. Does this approach sound good to you? If so, you can expect a PR from me within the next few days or the coming week!

And to answer your question: yes, we are not setting up a cluster; we will just be setting up a git repository where we store all our manifests. If we make any changes there, Flux running in the cluster will automatically detect them and apply them to the existing deployment.
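
A minimal sketch of what such manifests could look like, assuming Flux is already installed in the cluster and already syncing the Git repository that contains them. The resource names, the `flux-system` and `jupyterhub` namespaces, the intervals, the pinned chart version, and the single value under `values` are all illustrative assumptions, not taken from the PR; older Flux releases use beta API versions instead of the `v1`/`v2` shown here.

```yaml
---
# Tells Flux where the JupyterHub Helm chart is published.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: jupyterhub          # hypothetical name
  namespace: flux-system    # assumption: default Flux namespace
spec:
  interval: 1h              # how often to re-fetch the chart index
  url: https://jupyterhub.github.io/helm-chart/
---
# Declares the Helm release that Flux reconciles from the chart above.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: jupyterhub          # hypothetical name
  namespace: flux-system
spec:
  interval: 10m             # how often to reconcile the release
  releaseName: jupyterhub
  targetNamespace: jupyterhub   # namespace the chart is installed into
  install:
    createNamespace: true
  chart:
    spec:
      chart: jupyterhub
      version: "4.0.0"      # example pin; choose a published chart version
      sourceRef:
        kind: HelmRepository
        name: jupyterhub
        namespace: flux-system
  # Inline equivalent of a values.yaml file; illustrative only.
  values:
    proxy:
      service:
        type: ClusterIP
```

With this in place, editing `values` or `version` in Git and pushing is what triggers a change to the deployment, rather than running `helm upgrade` by hand.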

jash2105 commented 7 months ago

@manics, are you suggesting that the documentation could potentially cause issues? I wouldn't expect that to be the case. Also, I agree with you about storing the plain manifests in a repository, whether it's mine or another. These manifests could serve as useful references. Moreover, having custom documentation alongside referring directly to the complete manifest could streamline the process, similar to how we handle the documentation and values.yaml file when deploying with Helm.

manics commented 7 months ago

I don't think it'll cause issues, it's more that I think from a maintainability perspective it may be easier to have a separate repo with docs, manifests, and potentially CI workflows combined.

I think it could also be easier for readers too, it's a lot easier to tightly integrate manifests and docs in their own repo since it won't be constrained by the existing docs layout. If someone wants to reproduce your steps they could just clone the repo, this isn't so practical if you have to clone the whole Z2JH repo and search through subdirectories.

jash2105 commented 7 months ago

@manics, I see your point about the issue requiring a fundamental restructuring of the repository. Given this, I propose continuing with the current PR. As we develop the GitOps documentation, if we formulate a plan by then, we could consider a comprehensive overhaul of the existing repositories. Does this sound like a viable approach to you?

jash2105 commented 7 months ago

https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/3407 @consideRatio @manics, I worked on a basic install config. Expect more PRs covering other GitOps tools and more configs in the time to come. Thanks!

DeepCowProductions commented 2 months ago

In case someone needs inspiration for an argocd App definition (should work out of the box):

```yaml
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: jupyterhub # name of the argocd object
  namespace: argocd # namespace where this manifest lives, not the app itself!
spec:
  project: jupyter # argocd project
  sources:
    # official helm chart source, values are self hosted
    - repoURL: https://jupyterhub.github.io/helm-chart/
      targetRevision: 4.0.0-0.dev.git.6717.h61ab1167  # helm chart version
      chart: jupyterhub
      helm:
        valueFiles: # supply values from some self hosted repo
        - $values/jupyterhub/helm/values.yaml # path inside self hosted repo
    # self referencing repo to inject values.yaml
    - repoURL: 'https://github.com/org/repo.git'
      targetRevision: main # git branch
      ref: values
    # extra yamls for additional resources such as an ingress definition
    - repoURL: 'https://github.com/org/repo.git'
      path: jupyterhub/k8s # path inside repo for other resources
      targetRevision: main # git branch
      directory: # all yaml files inside "jupyterhub/k8s"
        recurse: true
        include: "{*.yaml,*.yml}"
  destination:
    server: 'https://kubernetes.default.svc' # kubernetes cluster
    namespace: jupyterhub # deployment namespace for jupyterhub
  syncPolicy:
    syncOptions:
      - CreateNamespace=true # create destination kubernetes namespace
      - ServerSideApply=true # fix for metadata annotation being too long
    automated:
      selfHeal: true # auto sync and repair
      prune: true    # delete resources that are no longer defined in the sources
```
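
The Application above references `project: jupyter`, which has to exist in Argo CD before the Application can sync. A minimal sketch of a matching AppProject follows, assuming the same placeholder repository URLs and namespaces as in the manifest above. The description and the wide-open cluster resource whitelist are illustrative assumptions; the whitelist is there because the chart is believed to create some cluster-scoped resources, and `CreateNamespace=true` needs to create a Namespace. It could be narrowed for a real deployment.

```yaml
---
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: jupyter       # must match spec.project in the Application
  namespace: argocd   # assumption: namespace where Argo CD is installed
spec:
  description: JupyterHub deployment  # illustrative
  # Repositories that Applications in this project may pull from.
  sourceRepos:
    - https://jupyterhub.github.io/helm-chart/
    - https://github.com/org/repo.git
  # Clusters and namespaces that Applications in this project may deploy to.
  destinations:
    - server: https://kubernetes.default.svc
      namespace: jupyterhub
  # Permissive on purpose for this sketch; restrict to the kinds actually needed.
  clusterResourceWhitelist:
    - group: "*"
      kind: "*"
```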

JupyterHub Deployments Using GitOps Tools (FluxCD/ArgoCD) #3396