nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev
BSD 3-Clause "New" or "Revised" License

[BUG] - Argo scheduled jobs fail with 403 permission error #2388

Open krassowski opened 4 months ago

krassowski commented 4 months ago

Describe the bug

Server returned status code 403 with message: `workflows.argoproj.io is forbidden: User "system:serviceaccount:dev:argo-viewer" cannot create resource "workflows" in API group "argoproj.io" in the namespace "dev"`

Expected behavior

The scheduled job runs without errors.

OS and architecture in which you are running Nebari

v2024.3.3

How to Reproduce the problem?

Run a notebook via the scheduler.

Command output

No response

Versions and dependencies used.

No response

Compute environment

None

Integrations

No response

Anything else?

No response

viniciusdc commented 4 months ago

Hi @krassowski, is this happening with all of your scheduled jobs?

kalpanachinnappan commented 2 months ago

Our team is running into the same error as well (Nebari v2024.5.1).

krassowski commented 2 months ago

@viniciusdc I tested two deployments with v2024.6.1rc3: on one this is happening, on the other it is not. The good news is that it is happening on our internal dev deployment, so it should be easy to troubleshoot.

krassowski commented 2 months ago

It looks like users in the analyst group only have viewer privileges for workflows:

https://github.com/nebari-dev/nebari/blob/e997de84735b9b6eff8ea7323e979a76c6e56527/src/_nebari/stages/kubernetes_services/template/modules/kubernetes/services/argo-workflows/main.tf#L89-L93

Indeed, on the deployment where everything works I was testing with an admin account, while on the deployment where it failed I only had a user account.
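
To see why a viewer-level role produces exactly this 403, here is a minimal sketch of how Kubernetes RBAC evaluates a request: a request is allowed if any rule in the bound roles matches its verb, API group and resource, and there are no deny rules. The rule sets `VIEWER_RULES` and `DEVELOPER_RULES` below are hypothetical illustrations, not copied from Nebari's `main.tf`, and the model is simplified (it ignores subresources, `resourceNames`, non-resource URLs and namespace scoping via RoleBindings).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """Simplified Kubernetes RBAC PolicyRule (apiGroups, resources, verbs)."""
    api_groups: tuple[str, ...]
    resources: tuple[str, ...]
    verbs: tuple[str, ...]

    def allows(self, verb: str, api_group: str, resource: str) -> bool:
        # "*" acts as a wildcard in each field, as in Kubernetes RBAC.
        return (
            (verb in self.verbs or "*" in self.verbs)
            and (api_group in self.api_groups or "*" in self.api_groups)
            and (resource in self.resources or "*" in self.resources)
        )


def is_allowed(rules: list[PolicyRule], verb: str, api_group: str, resource: str) -> bool:
    # RBAC is purely additive: the request succeeds if ANY rule permits it.
    return any(rule.allows(verb, api_group, resource) for rule in rules)


# Hypothetical read-only role, similar in spirit to what "argo-viewer" grants.
VIEWER_RULES = [
    PolicyRule(("argoproj.io",), ("workflows", "cronworkflows"), ("get", "list", "watch")),
]

# Hypothetical role that can also submit workflows.
DEVELOPER_RULES = VIEWER_RULES + [
    PolicyRule(("argoproj.io",), ("workflows", "cronworkflows"),
               ("create", "update", "patch", "delete")),
]

print(is_allowed(VIEWER_RULES, "create", "argoproj.io", "workflows"))     # False
print(is_allowed(DEVELOPER_RULES, "create", "argoproj.io", "workflows"))  # True
```

Scheduling a notebook ends up creating a `workflows` resource, so under a read-only rule set like this the `create` verb is simply never granted, which matches the error in the report.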

krassowski commented 1 month ago

Summarising some offline discussions, since I am not sure everyone is aware of all the threads where this came up:

  1. @viniciusdc mentioned that this is the expected behaviour (that analysts cannot schedule notebooks)
  2. we agreed that in that case we need to improve the error message in argo_jupyter_scheduler
  3. separately, @dharhas suggested opening an issue to think through a better permission role

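For point 2, one possible approach is to recognise the Kubernetes "forbidden" message and turn it into an actionable hint. This is only a sketch: `explain_forbidden` and its wording are hypothetical and do not reflect how argo_jupyter_scheduler actually handles errors; the regex assumes the message format quoted at the top of this issue.

```python
import re
from typing import Optional

# Matches the RBAC denial format quoted in this issue, e.g.
# User "system:serviceaccount:dev:argo-viewer" cannot create resource
# "workflows" in API group "argoproj.io"
FORBIDDEN_RE = re.compile(
    r'User "system:serviceaccount:(?P<namespace>[^:"]+):(?P<account>[^"]+)" '
    r'cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def explain_forbidden(message: str) -> Optional[str]:
    """Hypothetical helper: convert a raw 403 message into a user-facing hint."""
    match = FORBIDDEN_RE.search(message)
    if match is None:
        return None  # Not an RBAC denial we recognise; show the raw error instead.
    return (
        f"Your account is mapped to the '{match['account']}' service account, "
        f"which is not allowed to {match['verb']} {match['resource']} "
        f"({match['group']}) in namespace '{match['namespace']}'. "
        "Scheduling notebooks requires a role that can create Argo workflows; "
        "ask your Nebari administrator for access."
    )
```

Returning `None` for unrecognised messages lets the caller fall back to the original error text rather than hiding unexpected failures.
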
dharhas commented 1 month ago

The analyst and developer roles, and what each of them can access, come from an early client's usage of qhub and don't work well in practice.

This is what I meant when I said we need to think through the permissions.