[enhancement]: Add idle timeout flag when running with "once" enabled

Nohac commented 6 months ago

Describe your feature request here

I'm managing a Kubernetes cluster that uses KEDA to dispatch agents on demand based on the pool queue on DevOps. The agents run with --once so that they shut down after each job, which allows the cluster to scale down its nodes when no jobs are running. This works fine most of the time.

The issue arises if, for whatever reason, the new agent does not receive a job (this could happen if someone cancels a job, or something else unexpected happens). This is usually fine in a busy pool, since the agent will receive a job within a short amount of time. However, when it happens at the end of the day or the end of the work week, unnecessary infrastructure can keep running over the weekend, which will dramatically increase the cost, especially if the infrastructure includes GPUs or other expensive hardware.

I think this could be easily fixed by adding an "idle timeout" flag to the agent. This flag would specify how long an agent is allowed to run while idle.

./run-agent.sh --timeout 5m --once

The above command would ensure that the agent times out after 5 minutes unless it receives a job within that time frame.

I could work around this issue by using the DevOps API to fetch idle agents and tell Kubernetes to stop the pod, but this seems like a lot of work that could be easily avoided with this proposal.
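Until such a flag exists, the intended behavior could also be approximated from outside the agent with a small wrapper process. The following is only a minimal sketch, not part of the agent: `run_with_idle_timeout` is a hypothetical helper, and it assumes the agent writes a recognizable line to its console output when it picks up a job. The marker text "Running job" used in the usage comment is an assumption and should be checked against the actual agent output.

```python
import subprocess
import sys
import threading


def run_with_idle_timeout(cmd, idle_timeout, job_marker):
    """Run cmd and terminate it if it stays idle for too long.

    "Idle" here means: no output line containing job_marker was seen
    within idle_timeout seconds. Returns (exit_code, timed_out).
    Hypothetical helper; job_marker is an assumption about the agent's output.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    # Set when a job was picked up or when the process closed its output.
    finished = threading.Event()

    def watch_output():
        for line in proc.stdout:
            sys.stdout.write(line)  # forward the agent's output unchanged
            if job_marker in line:
                finished.set()
        finished.set()  # output ended, so the process is exiting on its own

    reader = threading.Thread(target=watch_output, daemon=True)
    reader.start()

    # Block until a job starts, the process exits, or the deadline passes.
    timed_out = not finished.wait(timeout=idle_timeout)
    if timed_out:
        proc.terminate()  # no job arrived in time: stop the idle process

    exit_code = proc.wait()
    reader.join(timeout=5)
    return exit_code, timed_out


# Example usage (assumed marker text, 5 minute idle limit):
# run_with_idle_timeout(["./run-agent.sh", "--once"], 300, "Running job")
```

Once a job has started, the wrapper no longer enforces any limit and simply waits for the agent to exit, which matches the "idle only" semantics proposed above. If the agent is launched through a script that spawns child processes, terminating only the script may not be enough, so a real deployment would likely need to signal the whole process group.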

DmitriiBobreshev commented 6 months ago

Hi @Nohac, thank you for the idea. We're working on higher-prioritized issues at the moment, but we'll try to implement it as soon as we can.

Nohac commented 6 months ago

I'm willing to try implementing this feature if someone can point me in the right direction.

github-actions[bot] commented 1 week ago

This issue has had no activity in 180 days. Please comment if it is not actually stale.

Nohac commented 10 hours ago

It would still be nice to have this feature. Please re-open the issue.

microsoft / azure-pipelines-agent

[enhancement]: Add idle timeout flag when running with "once" enabled #4806