robusta-dev / robusta

Kubernetes observability and automation, with an awesome Prometheus integration
https://home.robusta.dev/
MIT License
2.5k stars, 247 forks

Add options to configure nodeSelector and tolerations for KRR pod on k8s #1442

Closed: AlexLov closed this issue 1 month ago

AlexLov commented 1 month ago

Is your feature request related to a problem? My KRR pods are often OOM-killed in some big clusters (3000+ pods). I can adjust the memory request/limit of that pod, but it is also scheduled on fairly packed nodes dedicated to the main workload, and those memory adjustments might interfere with it. For side workloads such as monitoring and related stuff (like robusta), I have dedicated nodes with enough resources, so they won't interfere with the main workload even if they consume all of a node's resources. I use nodeSelectors and tolerations to run all my services on these dedicated nodes and to prevent the main cluster workload from being scheduled there.

Describe the solution you'd like Please add options to configure nodeSelector and tolerations for the KRR job, or at least let them be taken from the robusta-runner pod itself.
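For illustration only, the request boils down to attaching the standard Kubernetes `nodeSelector` and `tolerations` pod-spec fields to the KRR job's pod. The sketch below shows the shape of those fields as plain Python data. The label `workload: monitoring`, the taint `dedicated=monitoring:NoSchedule`, the image name, and the helper `with_scheduling` are all hypothetical and are not taken from Robusta's chart or code.

```python
import copy

# Hypothetical label and taint; substitute whatever the dedicated nodes actually use.
SCHEDULING = {
    "nodeSelector": {"workload": "monitoring"},
    "tolerations": [
        {
            "key": "dedicated",
            "operator": "Equal",
            "value": "monitoring",
            "effect": "NoSchedule",
        }
    ],
}


def with_scheduling(pod_spec, scheduling=SCHEDULING):
    """Return a copy of pod_spec with nodeSelector and tolerations added."""
    spec = copy.deepcopy(pod_spec)
    # Merge selectors; entries from `scheduling` win on key conflicts.
    spec["nodeSelector"] = {
        **spec.get("nodeSelector", {}),
        **scheduling["nodeSelector"],
    }
    # Append tolerations to any the pod spec already carries.
    spec["tolerations"] = spec.get("tolerations", []) + copy.deepcopy(
        scheduling["tolerations"]
    )
    return spec


# Hypothetical stand-in for the pod spec of a scan job.
krr_pod_spec = {
    "containers": [{"name": "krr", "image": "example/krr:latest"}],
    "restartPolicy": "Never",
}
patched = with_scheduling(krr_pod_spec)
```

The `nodeSelector` restricts the pod to nodes carrying the label, while the toleration only permits scheduling onto nodes that carry the matching taint; keeping other workloads off those nodes is the job of the taint on the nodes themselves.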

Describe alternatives you've considered There are none. I also couldn't find a way to disable the KRR pod from running at all.

github-actions[bot] commented 1 month ago

Hi 👋, thanks for opening an issue! Please note, it may take some time for us to respond, but we'll get back to you as soon as we can!

aantn commented 1 month ago

Hi @AlexLov, do the instructions here work for you? https://docs.robusta.dev/master/playbook-reference/actions/scans.html#taints-tolerations-and-nodeselectors

AlexLov commented 1 month ago

Oh, I somehow overlooked this page :( Sure, it should do the trick for me. Sorry for the inconvenience.

aantn commented 1 month ago

All good! Any idea where you looked in the docs/github? I'll make sure we add a link so it is more discoverable.

AlexLov commented 1 month ago

I looked first in the chart's values.yaml and then directly in the code. I checked the docs a while ago and didn't see this page (or just didn't go that deep at the time). Maybe placing the page above the * Troubleshooting pages in the list would make it more visible. For me, these troubleshooting pages and anything beyond them are kind of advanced stuff that is needed only occasionally, so there's no need to dig deep until it's really needed.