pangeo-data / pangeo-eosc

Pangeo for the European Open Science cloud
https://pangeo-data.github.io/pangeo-eosc/
MIT License

Document elastic cluster scaling down #41

Open guillaumeeb opened 1 year ago

guillaumeeb commented 1 year ago

Pending some questions to @micafer.

guillaumeeb commented 1 year ago

Thanks for the review. I need to dig a bit deeper following @micafer's answer:

Some pods are created using a K8s type called DaemonSet; in this case one pod is deployed on each available node. CLUES ignores these pods when deciding whether to mark a node as "used", so on nodes 2 and 3 there must be some other pods that CLUES cannot ignore. You can try to "pack" the pods onto one node, using the commands "kubectl drain" and "kubectl cordon" to free the nodes.
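For reference, the "pack the pods" suggestion could be scripted roughly as follows. This is only a sketch: the node name `vnode-2` is hypothetical, and the helper functions are made up for illustration. The `kubectl cordon` and `kubectl drain` subcommands and the `--ignore-daemonsets` and `--delete-emptydir-data` flags are standard kubectl, but whether CLUES then powers off the emptied node is something to verify on the actual deployment.

```python
import shlex
import subprocess


def pack_node_commands(node, ignore_daemonsets=True):
    """Return the kubectl commands that empty one node (hypothetical helper)."""
    # cordon: mark the node unschedulable so no new pods land on it
    cordon = ["kubectl", "cordon", node]
    # drain: evict the pods currently running on the node
    drain = ["kubectl", "drain", node, "--delete-emptydir-data"]
    if ignore_daemonsets:
        # DaemonSet pods cannot be evicted; skip them instead of failing
        drain.append("--ignore-daemonsets")
    return [cordon, drain]


def free_node(node, dry_run=True):
    """Print the commands (dry run) or execute them against the cluster."""
    commands = pack_node_commands(node)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


# Dry run only: shows what would be executed for a hypothetical node name.
free_node("vnode-2")
```

Note that `kubectl drain` already cordons the node itself, so the explicit `cordon` step is mostly useful when you want to stop scheduling first and drain later. Call `free_node("vnode-2", dry_run=False)` only once the printed commands look right.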

sebastian-luna-valero commented 1 year ago

Do you have code/notebook with the workload? So I can rerun on my end and check whether I can help as well. Thanks!

guillaumeeb commented 1 year ago

Sure, I'm just using the notebook from this repo: run the package import part, then jump straight to the "Setup Dask gateway cluster" section.

Just use some bigger number for Dask worker memory, and scale a bit more:

cluster = gateway.new_cluster(worker_memory=8, worker_cores=2)
cluster.scale(18)
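To get a feel for how many nodes such a request forces the elastic cluster to start, here is a rough back-of-the-envelope sketch. It assumes `worker_memory` is expressed in GiB, and the node size used below (32 GiB, 8 cores) is a hypothetical value, not the actual flavor of this deployment. It also ignores the memory and CPU that Kubernetes system pods reserve on each node, so the real number can be higher.

```python
import math


def nodes_needed(n_workers, worker_memory_gib, worker_cores,
                 node_memory_gib, node_cores):
    """Estimate how many nodes are required to host all Dask workers."""
    # Workers that fit on one node, limited by memory and by CPU
    fit_by_memory = int(node_memory_gib // worker_memory_gib)
    fit_by_cores = int(node_cores // worker_cores)
    workers_per_node = min(fit_by_memory, fit_by_cores)
    if workers_per_node < 1:
        raise ValueError("a single worker does not fit on one node")
    return math.ceil(n_workers / workers_per_node)


# 18 workers of 8 GiB / 2 cores on hypothetical 32 GiB / 8-core nodes
print(nodes_needed(18, 8, 2, node_memory_gib=32, node_cores=8))  # → 5
```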