BUG - Can't submit Argo Workflow via Hera on Nebari 2023.7.1

Adam-D-Lewis commented 1 year ago

The following script throws an error.

import os

from hera.shared import global_config

os.environ['ARGO_HTTP1'] = "true"

os.environ['ARGO_SECURE'] = "true"

os.environ['KUBECONFIG'] = '/dev/null'

os.environ['HERA_TOKEN'] = os.environ['ARGO_TOKEN'] 
os.environ['GLOBAL_CONFIG_HOST'] = f"https://{os.environ['ARGO_SERVER'].rsplit(':')[0]}{os.environ['ARGO_BASE_HREF']}/"  # trailing slash required for proper urljoin inside of hera
os.environ['GLOBAL_CONFIG_NAMESPACE'] = os.environ['ARGO_NAMESPACE']

global_config.host = os.environ['GLOBAL_CONFIG_HOST']
global_config.token = os.environ['HERA_TOKEN'] 

from hera.workflows import Steps, Workflow, script

@script()
def echo(message: str):
    print(message)

with Workflow(
    generate_name="hello-world-hera-",
    entrypoint="steps",
    namespace=os.environ['GLOBAL_CONFIG_NAMESPACE'],
) as w:
    with Steps(name="steps"):
        echo(arguments={"message": "Hello world!"})

w.create()

The error thrown is

BadRequest: Server returned status code 400 with message: `admission webhook "wf-validating-admission-controller.dev.svc" denied the request: An internal error occurred in nebari-workflow-controller while mutating the workflow.  Please open an issue at https://github.com/nebari-dev/nebari-workflow-controller/issues.  The error was: Traceback (most recent call last):
  File "/opt/conda/envs/default/lib/python3.10/site-packages/nebari_workflow_controller/utils.py", line 136, in get_keycloak_uid_username
    keycloak_username = kcadm.get_user(keycloak_uid)["username"]
  File "/opt/conda/envs/default/lib/python3.10/site-packages/keycloak/keycloak_admin.py", line 868, in get_user
    return raise_error_from_response(data_raw, KeycloakGetError)
  File "/opt/conda/envs/default/lib/python3.10/site-packages/keycloak/exceptions.py", line 192, in raise_error_from_response
    raise error(
keycloak.exceptions.KeycloakGetError: 404: b'{"error":"User not found"}'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/default/lib/python3.10/site-packages/nebari_workflow_controller/app.py", line 47, in validate
    keycloak_user = get_keycloak_user(request)
  File "/opt/conda/envs/default/lib/python3.10/site-packages/nebari_workflow_controller/utils.py", line 104, in get_keycloak_user
    keycloak_uid, keycloak_username = get_keycloak_uid_username(
  File "/opt/conda/envs/default/lib/python3.10/site-packages/nebari_workflow_controller/utils.py", line 142, in get_keycloak_uid_username
    preferred_username = workflow["metadata"]["labels"][
KeyError: 'workflows.argoproj.io/creator-preferred-username'

Serializing the workflow that Hera builds, via

import json
import yaml
json_str = w.build().json(exclude_none=True, by_alias=True, exclude_unset=True, exclude_defaults=True)
print(yaml.dump(json.loads(json_str), indent=4))

shows that the following manifest is submitted (note that metadata contains no labels):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
    generateName: hello-world-hera-
    namespace: dev
spec:
    entrypoint: steps
    templates:
    -   name: steps
        steps:
        -   -   arguments:
                    parameters:
                    -   name: message
                        value: Hello world!
                name: echo
                template: echo
    -   inputs:
            parameters:
            -   name: message
        name: echo
        script:
            command:
            - python
            image: python:3.8
            source: 'import os

                import sys

                sys.path.append(os.getcwd())

                import json

                try: message = json.loads(r''''''{{inputs.parameters.message}}'''''')

                except: message = r''''''{{inputs.parameters.message}}''''''

                print(message)'

Adam-D-Lewis commented 1 year ago

The error indicates that the user wasn't found in Keycloak, which is expected. However, Nebari Workflow Controller (NWC) then assumes that the "workflows.argoproj.io/creator-preferred-username" label is set on the workflow, which does not appear to be the case here. At a minimum, NWC should check whether that label is present and return a clearer error message if it is not. Beyond that, NWC should support workflows submitted via Hera, so that needs to be fixed. I can't access the NWC logs to see what input NWC actually receives, though. Argo Workflows adds some labels on submission, so I need to see the workflow after that happens before I can debug this.
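A minimal sketch of the "check for the label and return a better error" idea, written against the workflow manifest as a plain dict (the same shape as the YAML shown above). `get_preferred_username` and `MissingCreatorLabelError` are hypothetical names used for illustration; this is not nebari-workflow-controller's actual code.

```python
# Hypothetical sketch, not nebari-workflow-controller's implementation.
CREATOR_LABEL = "workflows.argoproj.io/creator-preferred-username"


class MissingCreatorLabelError(ValueError):
    """Raised when a submitted workflow carries no creator label."""


def get_preferred_username(workflow: dict) -> str:
    # Fall back to empty dicts so a manifest without metadata/labels
    # produces the descriptive error below instead of a bare KeyError.
    labels = workflow.get("metadata", {}).get("labels") or {}
    try:
        return labels[CREATOR_LABEL]
    except KeyError:
        raise MissingCreatorLabelError(
            f"Workflow is missing the '{CREATOR_LABEL}' label, "
            "so the submitting user could not be identified."
        ) from None


# A manifest shaped like the Hera-built one above (no labels at all).
hera_manifest = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "hello-world-hera-", "namespace": "dev"},
}

try:
    get_preferred_username(hera_manifest)
except MissingCreatorLabelError as exc:
    print(f"rejected: {exc}")
```

The `.get()` fallbacks also cover a manifest with no `labels` key at all; in the traceback above, `labels` existed but lacked the creator key.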

Adam-D-Lewis commented 1 year ago

As a workaround for the time being, disable Nebari Workflow Controller in the Nebari config YAML:

argo_workflows:
  enabled: true
  nebari_workflow_controller:
    enabled: false

kcpevey commented 1 year ago

I've confirmed that the error goes away and Hera/Argo work after disabling NWC.

iameskild commented 1 year ago

As I see it, there are now at least three ways of submitting Argo Workflows:

via the Argo UI
via Argo-Jupyter-Scheduler (argo details are hidden from the end user)
via Hera-Workflows

Submitting workflows via Hera-Workflows has always required users to copy and paste their ARGO_TOKEN from the Argo UI. The difference now is that each user's JupyterLab server already has an ARGO_TOKEN set, based on the Keycloak group they belong to (analyst, developer, admin). Replace this "default" ARGO_TOKEN with your personal ARGO_TOKEN and things should work:

import os
from urllib.parse import urljoin

from hera.workflows import Workflow, script
from hera.shared import global_config

def authenticate():
    # Fall back to "dev" if ARGO_NAMESPACE is unset or empty
    namespace = os.environ.get("ARGO_NAMESPACE") or "dev"

    token = "Bearer v2:ey....."  # <-- copied from Argo UI

    if token.startswith("Bearer"):
        token = token.split(" ")[-1]

    base_href = os.environ["ARGO_BASE_HREF"]
    if not base_href.endswith("/"):
        base_href += "/"

    server = f"https://{os.environ['ARGO_SERVER']}"
    host = urljoin(server, base_href)

    global_config.host = host
    global_config.token = token
    global_config.namespace = namespace

    return global_config

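# The hello template referenced by entrypoint="hello" below
@script()
def hello(s: str):
    print(f"Hello, {s}!")

authenticate()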

with Workflow(
    generate_name="hello-world-",
    entrypoint="hello",
    arguments={"s": "world"},
) as w:
    hello()

w.create()
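The `authenticate()` helper above does two small string manipulations: it strips the `Bearer ` prefix that the Argo UI shows in front of the token, and it forces a trailing slash on ARGO_BASE_HREF before calling `urljoin`. The standalone sketch below isolates both so they can be checked without a cluster. The function names and the host `nebari.example.com` are hypothetical, and the relative path `api/v1/workflows/dev` only illustrates how `urljoin` treats the last path segment; it does not reproduce Hera's internal URL handling.

```python
from urllib.parse import urljoin


def normalize_token(raw: str) -> str:
    """Strip a leading "Bearer " scheme, as the scripts above do with split(" ")."""
    raw = raw.strip()
    if raw.startswith("Bearer "):
        return raw.split(" ", 1)[1]
    return raw


def build_host(server: str, base_href: str) -> str:
    """Join an ARGO_SERVER-style "host:port" with a base path, keeping a trailing slash."""
    if not base_href.endswith("/"):
        base_href += "/"
    return urljoin(f"https://{server}", base_href)


host = build_host("nebari.example.com:443", "/argo")
print(host)  # https://nebari.example.com:443/argo/

# With the trailing slash, a relative path is appended under /argo/ ...
print(urljoin(host, "api/v1/workflows/dev"))
# https://nebari.example.com:443/argo/api/v1/workflows/dev

# ... without it, urljoin replaces the last segment and "argo" is lost.
print(urljoin("https://nebari.example.com:443/argo", "api/v1/workflows/dev"))
# https://nebari.example.com:443/api/v1/workflows/dev
```

This is the behavior the original reproduction script alludes to when it notes that the trailing slash is required.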

One workaround that doesn't require you to copy over your ARGO_TOKEN is to explicitly set the creator-preferred-username label on the workflow, as follows:

import os
import re
from urllib.parse import urljoin

from hera.workflows import Workflow, script
from hera.shared import global_config

def sanitize_label(s: str) -> str:
    s = s.lower()
    pattern = r"[^A-Za-z0-9]"
    return re.sub(pattern, lambda x: "-" + hex(ord(x.group()))[2:], s)

def authenticate():
    # Fall back to "dev" if ARGO_NAMESPACE is unset or empty
    namespace = os.environ.get("ARGO_NAMESPACE") or "dev"

    token = os.environ["ARGO_TOKEN"]
    if token.startswith("Bearer"):
        token = token.split(" ")[-1]

    base_href = os.environ["ARGO_BASE_HREF"]
    if not base_href.endswith("/"):
        base_href += "/"

    server = f"https://{os.environ['ARGO_SERVER']}"
    host = urljoin(server, base_href)

    global_config.host = host
    global_config.token = token
    global_config.namespace = namespace

    return global_config

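# The hello template referenced by entrypoint="hello" below
@script()
def hello(s: str):
    print(f"Hello, {s}!")

authenticate()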

labels = {
    # Replace with your own username; this value is only an example
    "workflows.argoproj.io/creator-preferred-username": sanitize_label("eeriksen@quansight.com")
}

with Workflow(
    generate_name="hello-world-",
    entrypoint="hello",
    arguments={"s": "world"},
    labels=labels,
) as w:
    hello()

w.create()
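Because the sanitized username ends up as a Kubernetes label value, it has to satisfy Kubernetes' label-value syntax: at most 63 characters and, if non-empty, starting and ending with an alphanumeric character, with only `-`, `_`, `.` or alphanumerics in between. A minimal sketch for checking what `sanitize_label()` produces, assuming those standard rules; `is_valid_label_value` is a hypothetical helper, not part of Hera or Nebari.

```python
import re


def sanitize_label(s: str) -> str:
    # Same transformation as sanitize_label() in the script above:
    # every non-alphanumeric character becomes "-" plus its hex code point.
    s = s.lower()
    return re.sub(r"[^A-Za-z0-9]", lambda m: "-" + hex(ord(m.group()))[2:], s)


# Kubernetes label-value syntax (empty is allowed).
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def is_valid_label_value(value: str) -> bool:
    return len(value) <= 63 and LABEL_VALUE_RE.fullmatch(value) is not None


for name in ["eeriksen@quansight.com", "_svc"]:
    value = sanitize_label(name)
    print(name, "->", value, "valid" if is_valid_label_value(value) else "INVALID")
# eeriksen@quansight.com -> eeriksen-40quansight-2ecom valid
# _svc -> -5fsvc INVALID
```

Each replaced character grows into three, so long usernames with many symbols can exceed the 63-character limit, and a username starting with a symbol yields a value starting with `-`; Kubernetes would be expected to reject either.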

The long-term solution is to generate a personalized Argo token for each user and add it as an environment variable on that user's JupyterLab pod. This is tracked in a separate issue.

Adam-D-Lewis commented 1 year ago

> One workaround that doesn't require you to copy over your ARGO_TOKEN would be explicitly set your creator-preferred-username label on the workflow as follows:

Allowing users to set their own creator-preferred-username label is a security vulnerability, since any user could then claim to be any other user and have that user's files mounted. I'll open an issue to fix that.

nebari-dev / nebari-workflow-controller, issue #19