pangeo-data / helm-chart

Pangeo helm charts
https://pangeo-data.github.io/helm-chart/

more powerful notebook VM #7

Closed rabernat closed 6 years ago

rabernat commented 6 years ago

I think it would be good to have more RAM and CPU in the main notebook pod. Since that's where the scheduler runs, RAM can become a bottleneck for large dask graphs. Also, the notebook pod has to do more work than the worker pods because it does all the visualization etc.

Would this change accomplish that?

mrocklin commented 6 years ago

It looks like you might also want to set request? http://callmeradical.com/post/k8s-resource-limits-requests-qos/ I'm not sure.

Two other points

  1. I think that our VMs' CPU:memory ratio is something like 1:3G. I recommend sticking to this. Leaving CPU on the table at the pod level doesn't reduce our bill at all.
  2. Currently none of our node pools have VMs large enough to run 16GB pods. I suggest that we increase the size of the nodes in the pre-emptible pool. You should be able to do this in the cloud console.
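
To make the arithmetic in the two points above concrete, here is a rough sketch. It assumes the "something like 1:3G" ratio quoted above; the node shapes are hypothetical and are not taken from the actual cluster, and the helper names are made up for illustration.

```python
# Assumption: about 3 GB of RAM per vCPU, per the "something like 1:3G" estimate.
GB_PER_CPU = 3.0


def cpus_for_memory(memory_gb, gb_per_cpu=GB_PER_CPU):
    """CPU count that keeps a pod at the node's CPU:memory ratio."""
    return memory_gb / gb_per_cpu


def fits_on_node(pod_memory_gb, pod_cpus, node_memory_gb, node_cpus):
    """True if one pod fits within a node's raw capacity.

    This ignores the share of each node reserved for system daemons,
    so the real headroom is somewhat smaller.
    """
    return pod_memory_gb <= node_memory_gb and pod_cpus <= node_cpus


pod_memory_gb = 16
pod_cpus = cpus_for_memory(pod_memory_gb)
print(f"A {pod_memory_gb} GB pod at 1:3G wants about {pod_cpus:.2f} CPUs")

# Hypothetical node shapes, for illustration only.
print(fits_on_node(pod_memory_gb, pod_cpus, node_memory_gb=12, node_cpus=4))
print(fits_on_node(pod_memory_gb, pod_cpus, node_memory_gb=24, node_cpus=8))
```

Under these assumptions a 16 GB pod pairs with a bit over 5 CPUs, which is why the nodes themselves have to grow before such a pod can be scheduled.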

mrocklin commented 6 years ago

@jacobtomlinson any thoughts on what's going on with the travis test failure on this PR?

rabernat commented 6 years ago

The test failure might have to do with the fact that I just made a quick PR by editing within GitHub (rather than forking / branching / pushing) and therefore created another branch, rabernat-patch-1, within this repo.

jacobtomlinson commented 6 years ago

The error comes from chartpress. I wouldn't worry for now. I'm actively looking at updating the Travis stuff on this repo.

While I agree that this makes sense, I'm wondering whether we should leave the defaults in this repo alone and instead update the GCE deployment config.

I guess it depends if you think this change should be made for everyone, or just for pangeo.pydata.org. Happy to defer to you on this!

rabernat commented 6 years ago

I agree it makes sense to update the pangeo.pydata.org specific config, rather than the master template. The problem is that I don’t understand the relationship between these pieces very well.

jacobtomlinson commented 6 years ago

That's fair.

This repo is the master template. Perhaps I should update the README to make that clearer.

rabernat commented 6 years ago

What is the relationship between values.yaml in this repo and pangeo/gce/jupyter-config.yaml in the other repo? Does that one somehow "extend" this one?

mrocklin commented 6 years ago

Yes, the config files stack.
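
As a rough illustration of what "stack" means here: Helm treats the chart's values.yaml as defaults and layers any user-supplied values file on top, merging nested maps key by key and replacing scalars and lists outright. The sketch below mimics that in plain Python. It is a simplification, not Helm's actual implementation (it skips Helm's handling of null values, for example), and the keys and numbers are placeholders rather than the real contents of either file.

```python
import copy


def stack_values(base, override):
    """Layer `override` on top of `base`, merging nested dicts recursively.

    Scalars and lists in `override` replace the corresponding entry in
    `base`; keys that appear only in `base` are kept unchanged.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = stack_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults standing in for values.yaml in the template chart.
template_defaults = {
    "singleuser": {
        "cpu": {"limit": 2},
        "memory": {"limit": "4G", "guarantee": "2G"},
    }
}

# Placeholder overrides standing in for pangeo/gce/jupyter-config.yaml.
deployment_overrides = {
    "singleuser": {"memory": {"limit": "16G"}},
}

merged = stack_values(template_defaults, deployment_overrides)
print(merged["singleuser"]["memory"])
print(merged["singleuser"]["cpu"])
```

So the deployment file only needs to list the settings that differ from the template; everything else falls through from the defaults.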
