Yes, it is because the Docker image used for JupyterLab and the Dask workers' image are not the same. But the warning did not prevent the job from running when I tried last time (and it is a good example we can use in the tutorial to show and understand distributed computing).
The problems I ran into with this configuration last week, when I tried it for the tutorial, were:
@j34ni or @annefou might have an update on this, but perhaps your experience with kubectl/JupyterHub could help?
I agree that having Dask Gateway authenticate with a password instead of through JupyterHub is an issue in the longer term. However, for the workshop I find it an advantage, because we will be able to shut down clusters left running by participants (or multiple clusters opened by mistake) and so release resources.
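As a rough illustration of that clean-up, here is a minimal sketch, assuming the gateway uses the "simple" authenticator with a shared password and that the clusters to clean up are visible to the account you authenticate as (for instance because participants all connect with the same user name). `connect()` and `stop_all_clusters()` are hypothetical helper names; the gateway address and password you pass in are placeholders for your own deployment.

```python
from typing import List, Optional


def connect(address: str, password: str, username: Optional[str] = None):
    """Return a dask_gateway.Gateway client using simple (HTTP basic) auth.

    dask-gateway is imported here rather than at module level so that
    stop_all_clusters() can be exercised without it installed.
    username=None lets BasicAuth fall back to the local user name.
    """
    from dask_gateway import Gateway
    from dask_gateway.auth import BasicAuth

    return Gateway(address, auth=BasicAuth(username=username, password=password))


def stop_all_clusters(gateway) -> List[str]:
    """Stop every cluster the gateway lists for the authenticated user.

    Returns the names of the clusters that were asked to stop.
    """
    names = [report.name for report in gateway.list_clusters()]
    for name in names:
        gateway.stop_cluster(name)
    return names
```

Usage would look like `stop_all_clusters(connect("https://<hub-host>/services/dask-gateway", "<password>"))`, where both arguments are placeholders. Keeping the stop logic separate from the connection makes it easy to dry-run against a stub before pointing it at the real gateway.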
@guillaumeeb: The dashboard link now works with the latest versions of the setup (pangeo-foss4g, for instance) when we also install Grafana.
As I said in https://github.com/pangeo-data/foss4g-2022/issues/20#issuecomment-1212222842, I really think that at least the images should be the same. Even if, in this case, the versions are close enough for the Client/Cluster to work, this is bad practice and can often cause unexpected errors.
Regarding dask-gateway authentication, I agree with @j34ni: this is not really an issue for the workshop, but it should be addressed in the longer term.
And if the Dashboard link now works, that's great! @j34ni, should I go back to the pangeo-foss4g instance to test things?
@j34ni I finally logged into the front VM of the pangeo-foss4g deployment. Looking at the values.yaml produced by the following command:
```shell
sudo helm get values daskhub -n daskhub
```
It looks like dask-gateway is not enabled on this instance; is that correct?
```yaml
dask-gateway:
  enabled: false
  gateway:
    auth:
      simple:
        password: pangeo_dask
      type: simple
dask-kubernetes:
  enabled: true
jupyterhub:
  hub:
    baseUrl: /jupyterhub/
    config:
      GenericOAuthenticator:
        allowed_groups:
        - urn:mace:egi.eu:group:vo.pangeo.eu:role=member#aai.egi.eu
        authorize_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/auth
        claim_groups_key: eduperson_entitlement
        client_id: id
        client_secret: secret
        login_service: EGI Check-In
        oauth_callback_url: https://pangeo-foss4g.vm.fedcloud.eu/jupyterhub/hub/oauth_callback
        scope:
        - openid
        - email
        - profile
        - eduperson_entitlement
        token_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/token
        userdata_params:
          state: state
        userdata_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/userinfo
        username_key: preferred_username
      JupyterHub:
        authenticator_class: generic-oauth
  ingress:
    annotations:
      kubernetes.io/ingress.class: nginx
    enabled: true
  proxy:
    secretToken: hash
  singleuser:
    cpu:
      guarantee: 2
      limit: 4
    image:
      name: pangeo/ml-notebook
      tag: latest
    memory:
      guarantee: 4G
      limit: 16G
```
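To check the points discussed here (is dask-gateway enabled, which image do the notebooks use, does it match the worker image), a small helper could summarise the release values. This is a sketch under assumptions: it expects the values as JSON, for example from `helm get values daskhub -n daskhub -o json`, which, like the command above, returns only user-supplied values, so a missing key means the chart default applies. The `dask-gateway.gateway.backend.image` path for the worker image is assumed from the dask-gateway Helm chart layout and should be checked against the chart version in use; the function names are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def _get(values: Dict[str, Any], *path: str) -> Any:
    """Walk nested dicts along `path`; return None if any key is missing."""
    node: Any = values
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def image_ref(image: Any) -> Optional[str]:
    """Format a {"name": ..., "tag": ...} mapping as "name:tag" (or just name)."""
    if not isinstance(image, dict) or "name" not in image:
        return None
    tag = image.get("tag")
    return f"{image['name']}:{tag}" if tag else image["name"]


def summarize(values: Dict[str, Any]) -> Dict[str, Any]:
    """Report the settings discussed in this thread from user-supplied values."""
    notebook = image_ref(_get(values, "jupyterhub", "singleuser", "image"))
    # Assumed location of the image used for Dask Gateway schedulers/workers.
    workers = image_ref(_get(values, "dask-gateway", "gateway", "backend", "image"))
    return {
        # None means "not overridden here", i.e. the chart default applies.
        "gateway_enabled": _get(values, "dask-gateway", "enabled"),
        "notebook_image": notebook,
        "worker_image": workers,
        # None when the worker image is not set in these values at all.
        "images_match": None if workers is None else notebook == workers,
        # A floating tag such as "latest" can drift away from the worker image.
        "floating_tag": notebook is not None and notebook.endswith(":latest"),
    }


def summarize_json(text: str) -> Dict[str, Any]:
    """Same as summarize(), for the output of `helm get values ... -o json`."""
    return summarize(json.loads(text))
```

Applied to the values above, this would report the gateway as disabled and flag `pangeo/ml-notebook:latest` as a floating tag, while `images_match` stays `None` because no worker image is set in these values.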
Would it be possible to run some tests on one instance or the other, or would you prefer to keep things as they are? Currently, I don't have access to the pangeo-xxlarge platform.
@guillaumeeb: I did not manage to get a working infrastructure with EGI Check-in, a dask-gateway, and increased CPU and memory limits all at the same time. The values.yaml you retrieved is the one from my email of Tue 2022-08-09 10:30. Feel free to modify it and run as many tests as you want on pangeo-foss4g.
Closing this one as solved.
I think this has already been said, but as I'm currently reviewing notebooks on the infrastructure, I thought I'd open issues to note the problems.
So first, the Dashboard link is not working.
Clicking on the generated Dashboard link, for instance
Dashboard: [/services/dask-gateway/clusters/daskhub.e9bff8eab5134c32a5db353c5655c1f1/status](https://pangeo-xxlarge.vm.fedcloud.eu/services/dask-gateway/clusters/daskhub.e9bff8eab5134c32a5db353c5655c1f1/status)
leads to a 404 error.

Connecting a client to the cluster generates a version mismatch warning.
It's probably because the Docker image JupyterHub uses for the single-user notebook server is not the same as the one dask-gateway uses for the workers.
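To see exactly which packages differ, one option is to inspect the dictionary returned by `distributed.Client.get_versions()`, which reports installed package versions for the client, the scheduler and each worker. The sketch below assumes that dictionary has `"client"`, `"scheduler"` and `"workers"` entries, each carrying a `"packages"` mapping; `version_mismatches()` is a hypothetical helper name.

```python
from typing import Any, Dict, Set


def version_mismatches(versions: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Return {package: {versions seen}} for packages that differ anywhere.

    A package that is missing on one side is typically reported with a
    version of None, which shows up here as the string "None".
    """
    reports = [versions["client"], versions["scheduler"], *versions["workers"].values()]
    seen: Dict[str, Set[str]] = {}
    for report in reports:
        for package, version in report.get("packages", {}).items():
            seen.setdefault(package, set()).add(str(version))
    # Keep only packages reported with more than one distinct version.
    return {pkg: vers for pkg, vers in seen.items() if len(vers) > 1}
```

In a notebook this might be called as `version_mismatches(cluster.get_client().get_versions())`. `Client.get_versions(check=True)` can also raise on a mismatch directly; the helper above only lists the differences so they can be compared with the two images.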