nebari-dev / governance

✨ Governance-related work for Nebari-dev
BSD 3-Clause "New" or "Revised" License

RFD - User Friendly Method for Jupyter users to run an Argo Workflow [Draft] #33

Closed Adam-D-Lewis closed 1 year ago

Adam-D-Lewis commented 1 year ago
| Status | Draft 🚧 |
| --- | --- |
| Author(s) | Adam-D-Lewis |
| Date Created | 02-03-2023 |
| Date Last updated | 02-03-2023 |
| Decision deadline | ? |

This is very much a draft, but I already welcome feedback if you'd like to give it.

User Friendly Method for Jupyter users to run an Argo Workflow (Draft)

Summary

The current method of running Argo workflows from within JupyterLab is not particularly user friendly. We'd like to have a beginner-friendly way of running simple Argo Workflows, even if this method has limitations that make it inappropriate for more complex or larger workflows.

User benefit

Many users have asked for ways to run/schedule workflows. This would fill many of those needs.

Design Proposal

  1. Users would need to create a conda environment (or we add a new default base environment, argo_workflows) that has the python, python-kubernetes, argo-workflows, and hera-workflows packages.
  2. We pass some needed pod spec fields (image, container, initContainers, volumes, securityContext) into the pod as environment variables. We do this via a KubeSpawner traitlet (a sketch follows the example under item 3).
  3. Enable --auth-mode=client on Argo Workflows in addition to --auth-mode=sso. Then, when users log in, KubeSpawner should map them to a service account consistent with their Argo permissions and also set auto_mount_service_token to True in KubeSpawner. An example according to ChatGPT is below, though I don't know if it's hallucinating. The details around authentication via Jupyter vs. Keycloak are still a bit hazy to me.
    
```python
from kubespawner import KubeSpawner
import json


class MySpawner(KubeSpawner):
    def pre_spawn_start(self, user, spawner_options):
        # Get the JWT token from the authentication server
        token = self.user_options.get('token', {}).get('id_token', '')

        # Decode the JWT token to obtain the OIDC claims
        # (NB: `self.api.jwt.decode` is not a real KubeSpawner attribute, and
        # Spawner has no pre_spawn_start method (that hook lives on the
        # Authenticator), so this snippet is illustrative only)
        decoded_token = json.loads(self.api.jwt.decode(token)['payload'])

        # Extract the OIDC groups from the claims
        groups = decoded_token.get('groups', [])

        # Modify the notebook server configuration based on the OIDC groups
        if 'group1' in groups:
            self.user_options['profile'] = 'group1_profile'

        # Call the parent pre_spawn_start method to perform any additional modifications
        super().pre_spawn_start(user, spawner_options)
```
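
For item 2, here is a minimal sketch of one way the needed pod-spec fields could be forwarded into the pod as environment variables via KubeSpawner's `modify_pod_hook`. The `NEBARI_POD_SPEC` variable name and the JSON layout are assumptions for illustration, not an existing Nebari convention:

```python
# jupyterhub_config.py (sketch): forward parts of the rendered user pod spec
# into the pod itself as an environment variable, so a workflow library running
# in the pod can re-use the image, volumes, etc. when building an Argo workflow.
import json

from kubernetes.client import V1EnvVar


def forward_pod_spec(spawner, pod):
    """KubeSpawner modify_pod_hook: receives the rendered V1Pod and must return it."""
    container = pod.spec.containers[0]
    spec_subset = {
        "image": container.image,
        "initContainers": [c.to_dict() for c in (pod.spec.init_containers or [])],
        "volumes": [v.to_dict() for v in (pod.spec.volumes or [])],
        "securityContext": pod.spec.security_context.to_dict()
        if pod.spec.security_context
        else None,
    }
    container.env = (container.env or []) + [
        V1EnvVar(name="NEBARI_POD_SPEC", value=json.dumps(spec_subset, default=str))
    ]
    return pod


c.KubeSpawner.modify_pod_hook = forward_pod_spec
```

A workflow library inside the pod could then read `NEBARI_POD_SPEC` to reproduce the user's image, volumes, and security context in the workflow template.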

4. Users with permissions can then submit Argo workflows, since /var/run/secrets/kubernetes.io/serviceaccount/token holds the token needed to submit workflows (see the sketch after item 5's example).
5. Write a new library (nebari_workflows) with usage like:
```python
import nebari_workflows as wf
from nebari_workflows.hera import Task, Workflow, set_global_host, set_global_token, set_global_verify_ssl, GlobalConfig, get_global_verify_ssl


# example task function; hera runs it inside the workflow pods
def p(m: str):
    print(m)


# maybe make a widget like the dask cluster one
wf.settings(
  conda_environment='',  # uses same as user submitting it by default
  instance_type='',  # uses same as user submitting it by default
)

with Workflow("two-tasks") as w:  # this uses a service with the global token and host
    Task("a", p, [{"m": "hello"}], node_selectors={"beta.kubernetes.io/instance-type": "n1-standard-4"})
    Task("b", p, [{"m": "hello"}], node_selectors={"beta.kubernetes.io/instance-type": "n1-standard-8"})

wf.submit(w)
```
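
A rough sketch of what `wf.settings`/`wf.submit` might do under the hood, assuming the service-account token from item 4 is mounted and using hera's existing `set_global_host`/`set_global_token` helpers; the Argo server URL below is a placeholder, not the actual in-cluster address:

```python
# Sketch of what nebari_workflows could do internally before a Workflow is
# created: point hera at the Argo server and authenticate with the
# service-account token mounted into the user pod (item 4).
from pathlib import Path

from hera import set_global_host, set_global_token

SA_TOKEN = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def configure_argo(host: str = "https://argo-server.dev.svc.cluster.local:2746") -> None:
    set_global_host(host)  # placeholder host; the real in-cluster address would differ
    # Depending on the hera version, the "Bearer " prefix may need to be added here.
    set_global_token(SA_TOKEN.read_text().strip())
```

Until the `wf.submit` wrapper exists, the workflow above could likely be submitted with hera's own call (something like `w.create()`) once these globals are set.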

Alternatives or approaches considered (if any)

Here

Best practices

User impact

Unresolved questions

Here's what I've done so far

  1. Created a conda environment that has python, python-kubernetes, argo-workflows, and hera-workflows packages.
  2. Added a role (with get-pod permissions) and a role binding to the default service account in the dev namespace.
  3. Changed the instance type profile to automount credentials for all users so they get the get-pod permissions.
  4. Copied the image, container, initContainers, volumes, securityContext (in two places), resources, and the HOME env var from the pod spec and put them into an Argo workflow (think Jinja to insert them in the right places).
  5. Copied the ARGO_TOKEN and other env vars from the Argo Server UI and sourced them in a JupyterLab terminal.
  6. Ran a short script using the argo_workflows Python API to submit the workflow (a sketch of such a script is below). It has access to the user conda environments (via `conda run -n myEnv`) and all of the user and shared directories.
    1. The process started in `/` instead of in HOME; not sure why yet.
    2. I ran `["conda", "run", "-n", "nebari-git-dask", "python", "/home/ad/dask_version.py"]`.
    3. I read and wrote a file to the user's home directory successfully.

Anything that deviates from that setup is still untested.
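
For reference, a submission script along the lines of step 6 might look roughly like the following, assuming the ARGO_SERVER and ARGO_TOKEN values sourced in step 5 and the generated argo-workflows Python client; the namespace and image below are placeholders:

```python
# Rough sketch of a step-6-style submission script. ARGO_SERVER and ARGO_TOKEN
# come from the env vars sourced from the Argo Server UI in step 5.
import os

import argo_workflows
from argo_workflows.api import workflow_service_api
from argo_workflows.model.io_argoproj_workflow_v1alpha1_workflow_create_request import (
    IoArgoprojWorkflowV1alpha1WorkflowCreateRequest,
)

# ARGO_SERVER from the UI may lack a scheme; prepend https:// if needed.
configuration = argo_workflows.Configuration(host=os.environ["ARGO_SERVER"])
api_client = argo_workflows.ApiClient(configuration)
# The token copied from the UI already includes the "Bearer " prefix.
api_client.set_default_header("Authorization", os.environ["ARGO_TOKEN"])

manifest = {
    "metadata": {"generateName": "dask-version-"},
    "spec": {
        "entrypoint": "main",
        "templates": [
            {
                "name": "main",
                # volumes/initContainers copied from the pod spec (step 4) omitted for brevity
                "container": {
                    "image": "quay.io/nebari/nebari-jupyterlab:main",  # placeholder image
                    "command": ["conda", "run", "-n", "nebari-git-dask",
                                "python", "/home/ad/dask_version.py"],
                },
            }
        ],
    },
}

api = workflow_service_api.WorkflowServiceApi(api_client)
api.create_workflow(
    namespace="dev",  # placeholder namespace
    body=IoArgoprojWorkflowV1alpha1WorkflowCreateRequest(workflow=manifest, _check_type=False),
    _check_return_type=False,
)
```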

Adam-D-Lewis commented 1 year ago

I think we should forgo the Argo SSO RBAC and give all users the ability to submit an Argo workflow in the short term. We can try to re-add the restriction so that only developers and admins can submit Argo workflows in the future.

Adam-D-Lewis commented 1 year ago

I'll close this since there is no activity and I've implemented a solution with nebari-workflow-controller.