pangeo-data / helm-chart

Pangeo helm charts
https://pangeo-data.github.io/helm-chart/

pangeo.pydata.org did not scale down #58

Closed · rabernat closed this 5 years ago

rabernat commented 6 years ago

A lot of people logged on to pangeo.pydata.org after my talk at JupyterCon. The cluster size went up to ~450. But it never scaled down:

[image: cluster size over time]

I have brought it back down manually now. But I just wanted to document this so we can figure out how to avoid it in the future.

This cost a lot of credits!

mrocklin commented 6 years ago

We should consider setting a lower maximum size until we have things settled.

rabernat commented 6 years ago

For those who are interested, here is our daily cloud bill, split into compute and storage:

[image: daily cloud bill, compute vs. storage]

mrocklin commented 6 years ago

Thank you for putting this together @rabernat

rabernat commented 6 years ago

I didn't do much other than click a few buttons on the google cloud console. For some reason, I think only I am authorized to see the full details of our billing.

jhamman commented 6 years ago

@yuvipanda was recently mentioning some ongoing work within JupyterHub to rework the kubernetes scaling protocols to make scale down more efficient. Perhaps he can point us to that work so we can follow along?

yuvipanda commented 6 years ago

@consideRatio is the person doing most of that work, @jhamman. I think it got merged very recently, he should be able to help.

consideRatio commented 6 years ago

@jhamman I just wrote up some notes about the deployment @minrk did today on mybinder.org. He enabled the freshly merged, foundational building block in a series of improvements to scheduling and autoscaling in the zero-to-jupyterhub-k8s chart.

See https://github.com/pangeo-data/pangeo/issues/322#issuecomment-419286887

minrk commented 6 years ago

I can't overstate how cool the new scheduler is. In our first 24 hours with it, mybinder.org successfully scaled down from 5 to 2 nodes. We shipped 0.7 last week, and I'm already more excited about getting our 0.8 release ready!

The manual solution that has been the only feasible way to scale down so far is to cordon nodes you don't need anymore and wait for them to drain and be reclaimed by the cluster autoscaler. They may still need manual draining to evict non-user pods that prevent scale-down (e.g. a stray kube-dns pod can show up).
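
For reference, that manual workaround looks roughly like the following sketch (the node name is a placeholder, and the drain flags assume a kubectl of this era; adjust for your cluster):

    # mark the node unschedulable so no new pods land on it
    kubectl cordon gke-pangeo-worker-pool-abc123

    # evict remaining pods (daemonset pods are skipped); once the node is empty,
    # the cluster autoscaler should reclaim it
    kubectl drain gke-pangeo-worker-pool-abc123 --ignore-daemonsets --delete-local-data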

We have work to do on autoscaling, especially documenting all the strategies and caveats and helpers.

jhamman commented 6 years ago

Thanks @yuvipanda / @consideRatio / @minrk. This all seems quite promising. If we wanted to try this out in the near term, what would be the best way for us to do that? @minrk - can you point us to the config you are using for mybinder.org?

minrk commented 5 years ago

mybinder.org's deployment config is at https://github.com/jupyterhub/mybinder.org-deploy; the relevant bit is here:

scheduling:
  userScheduler:
    enabled: true
    replicas: 2

This feature is only in the 0.8 dev versions of the chart at the moment. If you are upgrading from 0.6, make sure to test out a deploy/upgrade. As it stands right now, 0.6->0.7 chart upgrades seem to require relaunching users, so performing the upgrade would be a significant disruption.
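
For anyone who wants to try it before the 0.8 release, a minimal sketch of how a dependent chart like pangeo's might pin a z2jh dev build in a Helm 2 requirements.yaml; the version string below is a placeholder, not an actual release:

dependencies:
  - name: jupyterhub
    # placeholder version; substitute an actual 0.8 dev tag from the chart repo
    version: "v0.8-dev"
    repository: https://jupyterhub.github.io/helm-chart/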

consideRatio commented 5 years ago

@jhamman we are soon releasing version 0.8 of the z2jh helm chart, and I'm currently writing documentation about it.

guillaumeeb commented 5 years ago

I propose closing this issue, since the scale-down problem seems to have been identified within the z2jh and GKE communities, and instead opening an "Update to z2jh 0.8 helm chart" issue. As @consideRatio informed us, the release seems imminent (https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1054).

jhamman commented 5 years ago

We now depend on version 0.8 of the z2jh chart. Do we want to add the scheduling optimization bit to the pangeo chart so it's on by default?
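
Presumably that would be a small addition to the pangeo chart's default values.yaml, along the lines of this sketch (assuming the z2jh chart is pulled in as a dependency under the jupyterhub key, mirroring the mybinder.org config above):

jupyterhub:
  scheduling:
    userScheduler:
      # pack user pods onto as few nodes as possible so idle nodes can be reclaimed
      enabled: true
      replicas: 2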

guillaumeeb commented 5 years ago

@jhamman I guess so. @minrk, why is it not enabled in the jupyterhub chart by default?

minrk commented 5 years ago

We made it opt-in as a new, somewhat experimental feature that increases the resource requirements a bit. That said, we've been using it on mybinder.org for several months to great effect. I think we will probably switch it to on-by-default in the next major chart release.

consideRatio commented 5 years ago

@jhamman I think it is quite complex to decide on good default values. It depends on whether you are running an autoscaling cluster, and on whether you are on GKE or another cluster with particular cluster-autoscaler settings.

Some of the complexities:

About not having the user scheduler on by default: it packs user pods tightly onto nodes, which makes sense if you use autoscaling but not if you have a fixed number of nodes. So what the default should be is not critical, but it is also not obvious.

About podPriority / userPlaceholder pods: these also only make sense for autoscaling clusters, and whether autoscaling is configured is out of scope for the helm chart. The cluster autoscaler's "pod priority cutoff" setting, which sets the minimum priority a pending pod must have to trigger a scale-up, is not fixed, so some clusters may need to adjust their podPriority settings accordingly.
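
For concreteness, a rough sketch of the z2jh 0.8 options under discussion; the option names follow the chart's scheduling section, but the replica count is only illustrative and needs tuning per cluster (especially against the autoscaler's pod priority cutoff mentioned above):

scheduling:
  podPriority:
    # give real user pods a higher priority than placeholder pods
    enabled: true
  userPlaceholder:
    # low-priority "headroom" pods that are evicted when real users arrive,
    # triggering node scale-up ahead of demand
    enabled: true
    replicas: 4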

jhamman commented 5 years ago

I've opened #87 with what I see as sensible defaults for typical pangeo applications. I think we can assume basically all pangeo applications will require autoscaling clusters.