pangeo-data / pangeo-cloud-federation

Deployment automation for Pangeo JupyterHubs on AWS, Google, and Azure
https://pangeo.io/cloud.html

ocean.pangeo.io maintenance hack session #622

Closed by rabernat 4 years ago

rabernat commented 4 years ago

As discussed in #616 and https://discourse.pangeo.io/t/migration-of-ocean-pangeo-io-user-accounts/644/15, we will be doing maintenance on ocean.pangeo.io and other GCP clusters next week. @jhamman and I have blocked off Monday, June 22, 2-5pm EDT for a sprint on this. I invite everyone, and in particular @TomAugspurger, @scottyhq, @salvis2, @consideRatio, and @yuvipanda to help us out with this.

Some of the things we need to do are:

What am I missing from this list?

rabernat commented 4 years ago

Tomorrow morning I plan to send an email to the users of the new cluster to let them know it's on.

TomAugspurger commented 4 years ago

@jhamman do you know what's left to do for getting things hooked up to hubploy?

jhamman commented 4 years ago

I think we just need to:

rabernat commented 4 years ago

I'm about to push a big update to pangeo.io with documentation about the new setup.

jhamman commented 4 years ago

@TomAugspurger - any idea what is up with these Pending pods:

$ kubectl get pod -n prod | grep Pending
us-central1b-prod-prometheus-node-exporter-2n97g                  0/1     Pending   0          42h
us-central1b-prod-prometheus-node-exporter-dw689                  0/1     Pending   0          42h
us-central1b-prod-prometheus-node-exporter-j42ms                  0/1     Pending   0          42h
us-central1b-prod-prometheus-node-exporter-wnsjv                  0/1     Pending   0          42h
TomAugspurger commented 4 years ago

Not sure. Probably safe to just delete?

jhamman commented 4 years ago

> Not sure. Probably safe to just delete?

Tried that. They just come back in the same state.
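
One possible explanation, stated as an assumption rather than a diagnosis: node-exporter is usually deployed as a DaemonSet, and a DaemonSet controller recreates any pod that is deleted, which would match what is seen here. The scheduling reason normally shows up in the Events section of `kubectl describe pod`. Below is a minimal Python sketch that pulls the Pending pod names out of output like the listing above and prints the matching `describe` commands. The helper name `pending_pods` is made up for illustration.

```python
from typing import List


def pending_pods(kubectl_output: str) -> List[str]:
    """Return the names of pods whose STATUS column reads Pending.

    Assumes the default `kubectl get pod` columns:
    NAME READY STATUS RESTARTS AGE
    """
    names = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[2] == "Pending":
            names.append(fields[0])
    return names


# Sample text copied from the listing in this thread.
sample = """\
us-central1b-prod-prometheus-node-exporter-2n97g  0/1  Pending  0  42h
us-central1b-prod-prometheus-node-exporter-dw689  0/1  Pending  0  42h
us-central1b-prod-prometheus-node-exporter-j42ms  0/1  Pending  0  42h
us-central1b-prod-prometheus-node-exporter-wnsjv  0/1  Pending  0  42h
"""

for name in pending_pods(sample):
    # The Events section of the describe output usually states why
    # the scheduler could not place the pod.
    print(f"kubectl describe pod {name} -n prod")
```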

rabernat commented 4 years ago

See https://github.com/pangeo-data/pangeo/pull/780 for documentation update. I'd appreciate a review there.

rabernat commented 4 years ago

Another question: the dask widget is still set up to launch kubeclusters. I think we should not allow kubecluster on the new cluster. So what do we do about the widget? Can we make it launch dask_gateway clusters?

TomAugspurger commented 4 years ago

I believe that's coming from the dask_config.yml that's baked into the docker images at https://github.com/pangeo-data/pangeo-docker-images/blob/6ba7997b5246440c0f1b92512cb133b98c6b976d/base-image/dask_config.yml#L58-L63. Just switching that to dask-gateway won't work out of the box, since the labextension is only set up to create a cluster like class(*args, **kwargs). But dask-gateway needs to create the intermediate Gateway object.
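
To make the mismatch concrete, here is a rough sketch of the two construction styles. `build_from_factory` is a guess at what a `class(*args, **kwargs)` factory amounts to, not the labextension's actual code, and `new_gateway_cluster` is a hypothetical shim. Only `Gateway()` and `Gateway.new_cluster()` are real dask-gateway calls. The demo uses a stdlib class as a stand-in so it runs without a gateway server.

```python
import importlib


def build_from_factory(factory):
    """Instantiate factory["class"] from factory["module"] as class(*args, **kwargs).

    Illustrative only: mirrors the shape of the factory settings in the
    config file, not the labextension's real implementation.
    """
    module = importlib.import_module(factory["module"])
    cls = getattr(module, factory["class"])
    return cls(*factory.get("args", []), **factory.get("kwargs", {}))


def new_gateway_cluster(**kwargs):
    """Hypothetical shim: build the intermediate Gateway, then ask it for a cluster."""
    # Imported lazily so this sketch loads even where dask-gateway is absent.
    from dask_gateway import Gateway

    gateway = Gateway()  # address and auth come from the dask config
    return gateway.new_cluster(**kwargs)


# Stand-in demo: any importable class can be built this way in one step.
demo = build_from_factory(
    {"module": "fractions", "class": "Fraction", "args": [3, 4], "kwargs": {}}
)
print(demo)
```

The first function covers anything that can be built in a single call. The second needs two steps, which is the gap described above. Whether the labextension would accept a plain function in place of a class is an open assumption.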

rabernat commented 4 years ago

> But dask-gateway needs to create the intermediate Gateway object.

So we need to open an issue in dask-labextension?

TomAugspurger commented 4 years ago

Opened https://github.com/dask/dask-labextension/issues/135

rabernat commented 4 years ago

Thanks for your work everyone! The new cluster is launched.

Whenever you get time @TomAugspurger, I would love it if you could explain to me how to use grafana / prometheus to gather the information I need about usage.

alimanfoo commented 3 years ago

Hi pangeo folks, apologies for stalking but found this issue while googling for whether there was some way to configure storage quotas when using NFS on GCP. If anyone found a solution to that I'd be very grateful for a pointer.