Closed meebok closed 2 years ago
This change needs to be reverted, or at the very least the default needs to be unlimited (as was the default behavior of the plugin before this PR).
Limiting the number of nodes will not save you money in the vast majority of situations. If you're provisioning a node, your EC2 instance already exists and is actively costing you money. Limiting the number of nodes will more likely cost you money as you will have EC2 instances sitting around doing nothing as your Jenkins throttles on an arbitrary limit. If you're using ECS as it was intended, you are scaling the underlying EC2 cluster in or out depending on whatever CPU/RAM/Jenkins Queue metric you have configured. That is where you need to make tuning adjustments in order to save money.
In the case of Fargate, I still don't think you will save money... eventually whatever is in your Jenkins queue will need to be provisioned, so there is no cost savings having all of your jobs provisioned at once or sequentially. All you're doing is slowing things down.
The only scenario I could think of where this would save you money is if you were using AWS Outposts, where your "node limit" is set to match your Outpost capacity so that you're not expanding into AWS's servers.
Regardless, a limit of 5 doesn't make sense. Most users of this plugin expect infinite scalability.
Hey @wuillaum, I'm happy to update this to make the default unlimited and also to replace `Node` with `Agent`. I would add, however, that Jenkins does use both Agent and Node to describe external build agents, at least in newer versions of Jenkins.
This does help with cost savings, however, especially with regard to Fargate instances. There are at least two scenarios I can think of where it helps manage costs.
1. When running Jenkins in an ephemeral setup where everything is managed as code, including the jobs, it's common for a seed job to recreate all jobs from configuration. The newly created jobs/pipelines tend to kick off automatically upon discovery, which can spin up a significant number of agents to handle the load. Being able to limit the number of agents being created helps manage this.
2. During a normal workday there can be bursts of build activity. During these bursts you can see a dramatic increase in the number of running agents, again increasing cost during this period. Limiting the number of agents allows for queuing operations rather than spinning up a large number of agents to execute every job immediately.
Jenkins, by default, has the ability to manage the number of executors on the Primary instance. However, until now, there was no way to limit the number of executors/agents when it came to clouds/nodes.
> During a normal workday there can be bursts of build activity. During these bursts you can see a dramatic increase in the number of running agents, again increasing cost during this period
That's one of the core reasons I use this plugin! It does a terrific job of scaling to thousands of jobs running on 100+ EC2 instances in under a minute :) Devs can stay happy with CI/CD builds and tests running super fast.
> Limiting the number of agents allows for queuing operations rather than spinning up a large number of agents to execute every job immediately.
I fail to see how you would save money here. Whether they execute all at once or queue up sequentially, the cost should be the same at the end of the day. You're paying by cpu+memory * time.
Anyway... I'm not opposed to the feature as a whole; I'm sure there are other use cases where you would want to throttle the instance count of a template (maybe you have a container that shouldn't ever be spun up more than 1 at a time). I would just like to see the default set back to unlimited.
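The "cpu+memory * time" point above can be made concrete with a back-of-the-envelope sketch. All the numbers here are made up for illustration, and agent startup overhead is deliberately ignored (which, as noted later in the thread, is where the economics can actually differ):

```java
// Hypothetical numbers illustrating "you're paying by cpu+memory * time":
// total agent-hours (and thus cost) come out the same whether 10 half-hour
// jobs run in parallel on 10 agents or back-to-back on 1 agent.
public class CostSketch {
    // Total compute consumed: agents x hours each agent runs.
    static double totalAgentHours(int agents, double hoursPerAgent) {
        return agents * hoursPerAgent;
    }

    public static void main(String[] args) {
        double parallel = totalAgentHours(10, 0.5);      // 10 agents, 0.5 h each
        double sequential = totalAgentHours(1, 10 * 0.5); // 1 agent, 5 h total
        System.out.println(parallel == sequential); // prints true
    }
}
```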
Well, they don't queue sequentially. If I have 10 builds in the queue and limit to one agent, those 10 builds will wait in line for their turn with the build agent. So it's not eventually provisioning 10 agents; it limits it to 1 build agent, and all the builds will use that agent as it becomes available. The agents don't immediately die or shut down once they've completed a requested build.
@wuillaum New PR #250 for your review 👍
> those 10 builds will wait in line for their turn with the build agent
How is that not a sequential queue 😅
I guess if your container startup time is costly this would make more sense.
Thank you for the speedy responses and for the PR!
> Thank you for the speedy responses and for the PR!
Of course, thanks for the feedback 👍
> How is that not a sequential queue 😅
Well, the build queue for sure. Sorry, I just meant to convey that we aren't provisioning additional build agents/nodes in a sequential manner. We just re-use that one agent in this example, as opposed to launching 10 Fargate agents to run the 10 builds. To your point, for EC2-based agents/nodes this likely has more limited value unless you have a direct integration with your ASG that's increasing/decreasing the number of instances. But for Fargate, I think there's more value in putting a limit on the number of Fargate instances spun up.
Implement #249. This feature allows the user to limit the maximum number of nodes that can be provisioned by ECS. This has given us the freedom to adjust the number of nodes based on our needs and ultimately save money. The user can set this number under Configure Clouds -> Advanced -> Maximum Nodes. If the user doesn't provide a number, it defaults to `DescriptorImpl.DEFAULT_MAXIMUM_NODES`, which is 5. If the user doesn't want to use this feature, they can enter 0, meaning there is no limit to the number of nodes provisioned.
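The capping behavior described above (a configured maximum, default 5, with 0 meaning "unlimited") can be sketched in Java, the plugin's language. This is a hypothetical illustration, not the plugin's actual implementation; only the `DescriptorImpl.DEFAULT_MAXIMUM_NODES` name comes from this PR, and the class/method names here are invented:

```java
// Hedged sketch of the node-cap logic from this PR: decide how many new
// agents may be provisioned given the configured maximum, the agents
// already running, and the excess workload. A cap of 0 disables the
// limit, preserving the plugin's pre-PR "unlimited" behavior.
public class NodeCapSketch {
    // Mirrors the default constant mentioned in the PR description.
    static final int DEFAULT_MAXIMUM_NODES = 5;

    static int allowedToProvision(int maximumNodes, int currentNodes, int requested) {
        if (maximumNodes == 0) {
            return requested; // 0 = unlimited
        }
        int headroom = Math.max(0, maximumNodes - currentNodes);
        return Math.min(requested, headroom);
    }

    public static void main(String[] args) {
        // 10 builds queued, 2 agents running, default cap of 5 -> 3 more agents
        System.out.println(allowedToProvision(DEFAULT_MAXIMUM_NODES, 2, 10)); // prints 3
        // Cap of 0 -> unlimited: all 10 requests are honored
        System.out.println(allowedToProvision(0, 2, 10)); // prints 10
    }
}
```

Under this sketch, the queued builds that exceed the cap simply wait in the Jenkins build queue and re-use agents as they free up, which is the behavior debated earlier in the thread.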