hashicorp / nomad-autoscaler

Nomad Autoscaler brings autoscaling to your Nomad workloads.
Mozilla Public License 2.0

Support multiple namespaces #65

Open lgfa29 opened 4 years ago

lgfa29 commented 4 years ago

Currently, each agent can only monitor jobs and policies from a single namespace. To facilitate operating the Autoscaler in environments with multiple namespaces, the agent configuration should accept a list of namespaces, potentially with support for wildcards.

pop commented 2 years ago

I would really like this feature. I'm in a multi-namespace environment and this would simplify things quite a bit.

javier-diaz-herrera commented 1 year ago

Currently, in our project, we need to specify several namespaces on the autoscaler. Could you tell us if you are going to develop this functionality? It would be very useful to us.

javier-diaz-herrera commented 1 year ago

Could you tell us if you will develop this feature?

jorgemarey commented 1 year ago

Hi,

We are using nomad-autoscaler v0.3.7, and with this configuration the autoscaler gets the scaling policies for all namespaces. I don't know if this works for any of you.

nomad {
  ....
  namespace = "*"
}

javier-diaz-herrera commented 1 year ago

Thanks for the help @jorgemarey !! :)

We are using nomad-autoscaler v0.3.7 as well and use the apm-nomad.

The autoscaler shows the following logs:

The autoscaler is able to retrieve the autoscaling policies for each namespace:

2023-07-04T11:14:18.871Z [DEBUG] internal_plugin.nomad-apm: expanded query: from=taskgroup_avg_memory-allocated/builds/builds to="&plugin.taskGroupQuery{metric:"memory-allocated", job:"builds", group:"builds", operation:"avg"}"

but it is not able to retrieve the job metrics:

2023-07-04T11:14:18.876Z [WARN] policy_eval.worker: failed to run check: id=72abd79a-ed54-a6cb-a597-befb37f7ae7e policy_id=72e939b6-e602-b500-2176-99b09d636536 queue=horizontal target=nomad-target check=memory-usage on_error="" on_check_error="" error="failed to query source: failed to get total alloacted memory for taskgroup: failed to get info for job: Unexpected response code: 404 (job not found)"

Any idea?

qk4l commented 9 months ago

Hi, could anyone provide the current status of support?

Based on the documentation, the autoscaler does not support several namespaces.

The Nomad Autoscaler currently has limited support for Nomad Namespaces. The nomad configuration below supports specifying a namespace; if configured with a namespace, the Autoscaler will retrieve scaling policies and perform autoscaling only for jobs in that namespace. A future version will include support for multiple namespaces.

lgfa29 commented 8 months ago

Hi @javier-diaz-herrera 👋

Apologies for the delay. This happens because the Nomad APM plugin was not setting the job namespace when querying for metrics, so I've opened #808 to fix this.

Hi @qk4l 👋

The documentation is still current and each Nomad Autoscaler agent can only target a single namespace. @jorgemarey's suggestion works for targeting all namespaces, and I opened https://github.com/hashicorp/nomad/pull/19547 to document it.

Thank you @jorgemarey for the suggestion 🙂

schmichael commented 2 months ago

Was chatting with @jrasell about this issue internally, and he pointed out we do implicitly support multiple namespaces today via wildcards and ACL policies:

We list scaling policies from Nomad using the wildcard namespace operator, then we grab the namespace from the policy target block which is returned. This is then used internally when doing anything related to that policy when interacting with Nomad, such as calling the scale endpoint. I've just tested this to confirm; so to clarify:

  • running with -nomad-namespace="*" will query all namespaces for scaling policies, handle them correctly, and scale task groups across all namespaces
  • you could run the autoscaler with an ACL policy which limits what it can see when using the wildcard namespace flag
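
A minimal sketch of the two points above, assuming an agent config file and a separate Nomad ACL policy file. The namespace names `team-a` and `team-b` and the address are hypothetical, and the exact capabilities the autoscaler needs may differ by version and by which plugins are enabled (for example, the Nomad APM plugin reads job information), so treat this as a starting point rather than a verified setup:

```hcl
# --- Autoscaler agent config (sketch) ---
# The wildcard asks the agent to list scaling policies in every
# namespace its ACL token is allowed to see.
nomad {
  address   = "http://127.0.0.1:4646" # hypothetical address
  namespace = "*"
}

# --- Nomad ACL policy for the autoscaler's token (separate file, sketch) ---
# Only the namespaces listed here are visible to the agent, which
# narrows what the wildcard actually matches.
namespace "team-a" { # hypothetical namespace
  policy       = "scale"
  capabilities = ["read-job"] # assumption: needed when the Nomad APM plugin queries job info
}

namespace "team-b" { # hypothetical namespace
  policy       = "scale"
  capabilities = ["read-job"]
}
```

With a setup along these lines, the set of namespaces handled by each autoscaler instance is controlled by the ACL policy attached to its token rather than by the agent configuration.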

That being said there are performance and reliability benefits to minimizing the number of namespaces each autoscaler instance handles.

I think adding explicit multi-namespace support is a good idea to reduce toil while also allowing operators to easily adjust autoscaler instances to balance performance and reliability concerns.

In the meantime, please reply or leave a :+1: reaction if wildcard support helps! Prioritizing feature development is tricky and feedback helps.