nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev
BSD 3-Clause "New" or "Revised" License

[ENH] - Add Argo Workflows integration to Qhub #1230

Closed Adam-D-Lewis closed 2 years ago

Adam-D-Lewis commented 2 years ago

This issue will document the progress/choices made in integrating Argo Workflows with QHub.

Adam-D-Lewis commented 2 years ago

I've read these resources:

Some of my takeaways are:

There are further additions we may want eventually, such as an artifact repository or a database to offload large workflows (https://argoproj.github.io/argo-workflows/offloading-large-workflows/), but we can get to those later.

costrouc commented 2 years ago

@Adam-D-Lewis great points.

I think we should do a namespace or managed namespace install

Totally agree. In fact, I would have liked to do this with other QHub resources as well, for example the JupyterLab and Dask worker pods launched for users. There isn't much reason for them to be in the same namespace as other pods. I think Argo would be a good test of this.

The docs also say that using emissary executor + containerset is more secure than using a Doc...

I don't know enough about this, but I'd guess we could investigate it once we have a stable Argo deployment. We should hold off on implementing it if it adds any complexity to our deployment.

TLS is disabled by default on Argo Workflows Helm chart. I imagine we should enable this at some point, but I think it'd be better to get it running without it first.

I'd like all our components to communicate over TLS, but I don't know exactly how to do this. I agree it would be nice to have, but it's not something we can implement in the short term.

Argo Server supports SSO. Can and should we use Keycloak to enable users to log into the web UI? The helm options for configuration are below:

Yes, we should absolutely do this in our deployment and configure the roles/groups appropriately. I think this is essential to the complete deployment.

Eventually, we'll want to enable Prometheus to collect data from Argo Server. There are some helm chart options around this as well.

Yup, we should enable this, and I think it is part of the complete deployment.

Adam-D-Lewis commented 2 years ago

There are many different points at which we could call this "done". I think the minimum version of "done" does the following:

Minimum addition of Argo Workflows

I think we should add Argo in the following order. The minimum "done" is Step 1.

Step 1

Step 2

Step 3

iameskild commented 2 years ago

Thanks a lot for putting this together @Adam-D-Lewis! This gives us a clear target to aim for.

I do have a question about how we would go about "Can deploy Rich's sample workflows in the UI". Does this not depend on having some user-facing solution in place (Jupyterflow, Yason, etc.)?

Adam-D-Lewis commented 2 years ago

@iameskild

I do have a question about how we would go about "Can deploy Rich's sample workflows in the UI". Does this not depend on having some user-facing solution in place (Jupyterflow, Yason, etc.)?

Good point. Maybe we just need to use dummy workflows (long-running and scheduled) for now. We'd run his workflows later, once we get to adding a front end.
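For reference, here is a rough sketch of what such dummy workflows could look like. It uses only the Python standard library to build a one-step `Workflow` that sleeps (the long-running case) and a `CronWorkflow` that wraps the same step (the scheduled case), then prints them as JSON. The resource names, the `busybox` image, the sleep durations, and the cron schedule are all placeholders I made up for illustration, and the `schedule` field reflects the CronWorkflow spec as I understand it, so check it against the Argo version we actually deploy.

```python
import json

# Assumption: any small image with `sh` and `sleep` would do; busybox is a placeholder.
IMAGE = "busybox"


def sleep_template(name, seconds):
    """Return a container template that sleeps for `seconds` seconds."""
    return {
        "name": name,
        "container": {
            "image": IMAGE,
            "command": ["sh", "-c"],
            "args": [f"echo started; sleep {seconds}; echo done"],
        },
    }


def long_running_workflow(seconds=3600):
    """Build a one-step Workflow manifest that simply sleeps."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        # Hypothetical name prefix; a unique suffix is generated on creation.
        "metadata": {"generateName": "dummy-long-running-"},
        "spec": {
            "entrypoint": "sleep",
            "templates": [sleep_template("sleep", seconds)],
        },
    }


def scheduled_workflow(schedule="*/5 * * * *", seconds=10):
    """Build a CronWorkflow manifest that runs a short sleep on a cron schedule."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "CronWorkflow",
        # Hypothetical resource name.
        "metadata": {"name": "dummy-scheduled"},
        "spec": {
            "schedule": schedule,
            "workflowSpec": {
                "entrypoint": "sleep",
                "templates": [sleep_template("sleep", seconds)],
            },
        },
    }


if __name__ == "__main__":
    # Print each manifest as JSON; each could be saved to its own file.
    print(json.dumps(long_running_workflow(), indent=2))
    print(json.dumps(scheduled_workflow(), indent=2))
```

Each manifest could be saved to its own file and created with `kubectl create -f <file>` in whichever namespace Argo ends up watching, which should be enough to check that long-running and scheduled workflows show up in the UI.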

Adam-D-Lewis commented 2 years ago

@iameskild and I are being reprioritized onto another issue, but we hope to get this to a point where it's added into QHub. However, because there is a blocker on the Prometheus integration above, I suggest we finish integrating the Argo Server deployment with Keycloak's permissions, add some rudimentary docs, and try to merge, leaving Prometheus and Steps 2 and 3 above for future work.

trallard commented 2 years ago

AFAIK this is now enabled (at least the "backend", i.e. the orchestrator). Does this issue need closing?

iameskild commented 2 years ago

I'm fine with closing this. @viniciusdc and I did some preliminary research for the "front-end" interface. We can open another issue to track that discussion 👍

iameskild commented 2 years ago

Closing as completed. Opening #1495 to track front-end proposals.