guillaumeeb opened 1 year ago
Thanks for the review. It looks like I need to dig a bit deeper, given @micafer's answer:
There are some pods that are created using a K8s type called DaemonSet; in this case, one pod is deployed on each available node. CLUES ignores these pods when deciding whether a node is "used", so there must be some other pods on nodes 2 and 3 that CLUES cannot ignore. You can try to "pack" the pods onto one node, using the commands "kubectl drain" and "kubectl cordon" to free the other nodes.
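To find which pods keep nodes 2 and 3 "used", one option is to list the pods per node and drop the DaemonSet-owned ones. The sketch below works on the JSON printed by `kubectl get pods --all-namespaces -o json`. It assumes, following the explanation above, that CLUES ignores only DaemonSet pods (its real rule may have other criteria), and all helper names are hypothetical. It also builds the cordon/drain commands for a node you want to free; the commands are returned, not executed.

```python
import json
from typing import Dict, List


def load_pod_list(path: str) -> dict:
    """Load the output of `kubectl get pods --all-namespaces -o json`."""
    with open(path) as f:
        return json.load(f)


def is_daemonset_pod(pod: dict) -> bool:
    """True if any owner reference of the pod is a DaemonSet."""
    refs = pod.get("metadata", {}).get("ownerReferences") or []
    return any(ref.get("kind") == "DaemonSet" for ref in refs)


def blocking_pods_by_node(pod_list: dict) -> Dict[str, List[str]]:
    """Map node name -> ["namespace/name", ...] of pods not owned by a DaemonSet.

    Assumption: these are the pods that keep a node "used"; CLUES' actual
    rule may differ.
    """
    result: Dict[str, List[str]] = {}
    for pod in pod_list.get("items", []):
        node = pod.get("spec", {}).get("nodeName")
        if not node or is_daemonset_pod(pod):
            continue  # unscheduled pod, or one-per-node DaemonSet pod
        meta = pod.get("metadata", {})
        result.setdefault(node, []).append(f"{meta.get('namespace')}/{meta.get('name')}")
    return result


def free_node_commands(node: str) -> List[List[str]]:
    """kubectl commands to stop scheduling on `node` and evict its pods.

    `kubectl drain` also cordons the node; the explicit cordon just mirrors
    the advice above. --ignore-daemonsets is required when DaemonSet pods
    are present. Pods using emptyDir volumes may additionally need
    --delete-emptydir-data, which deletes that data.
    """
    return [
        ["kubectl", "cordon", node],
        ["kubectl", "drain", node, "--ignore-daemonsets"],
    ]
```

For example, save the pod list with `kubectl get pods --all-namespaces -o json > pods.json`, inspect `blocking_pods_by_node(load_pod_list("pods.json"))`, and only after checking them pass the commands for the nodes you want to free to `subprocess.run`.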
Do you have the code or notebook for this workload, so I can rerun it on my end and see whether I can help as well? Thanks!
Sure, I'm just using the notebook from this repo: run the package-import part, then jump straight to the "Setup Dask gateway cluster" section.
Just use a larger value for the Dask worker memory and scale up a bit more:
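```python
from dask_gateway import Gateway
gateway = Gateway()  # as created in the notebook's setup section (address/auth may differ)
cluster = gateway.new_cluster(worker_memory=8, worker_cores=2)  # per-worker options exposed by this gateway (memory likely in GiB)
cluster.scale(18)  # request 18 workers
```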
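As a back-of-the-envelope check on how many nodes this request needs (and so roughly how many CLUES has to keep powered on): 18 workers × 8 GiB = 144 GiB and 18 × 2 = 36 cores in total, assuming `worker_memory` is in GiB. The helper below is hypothetical; it packs workers by memory and CPU only and ignores the scheduler pod, DaemonSet pods and system reservations, so pass each node's allocatable resources rather than its raw capacity. The node size used is also hypothetical.

```python
import math


def nodes_needed(n_workers: int, worker_mem_gib: float, worker_cores: int,
                 node_mem_gib: float, node_cores: int) -> int:
    """Lower bound on nodes for n_workers, packing by memory and CPU only."""
    per_node = min(int(node_mem_gib // worker_mem_gib), node_cores // worker_cores)
    if per_node < 1:
        raise ValueError("a single worker does not fit on one node")
    return math.ceil(n_workers / per_node)


# Hypothetical node size: 32 GiB allocatable memory, 8 cores.
print(nodes_needed(18, 8, 2, node_mem_gib=32, node_cores=8))  # → 5
```

With 30 GiB allocatable per node instead, only 3 workers fit per node and the estimate rises to 6 nodes.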
Still pending some questions for @micafer.