Allow to remove an agent as soon as it connected once, but has no more tasks left

anbraten commented 8 months ago

xoxys commented 8 months ago

What exactly is the bug in the current behavior? This will avoid automatically removing faulty agents that can't connect for whatever reason, and I don't really like this new behavior.

anbraten commented 8 months ago

> What exactly is the bug in the current behavior? This will avoid automatically removing faulty agents that can't connect for whatever reason, and I don't really like this new behavior.

The idea is to remove an agent if it has no new tasks and at least one of the following holds:

- it was connected at least once, so it had a chance to get a task
- it was never connected, but took over x minutes to connect
- it lost connection somehow for x minutes
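
As a rough illustration of the conditions listed above, here is a minimal sketch in Python. It is not the autoscaler's actual code: the `Agent` fields, the `should_remove` function, and the single `timeout` parameter (standing in for "x minutes") are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Agent:
    # Hypothetical fields, for illustration only.
    created_at: datetime
    last_contact: Optional[datetime]  # None if the agent never connected
    pending_tasks: int


def should_remove(agent: Agent, now: datetime, timeout: timedelta) -> bool:
    """Return True if the agent may be removed under the rules described above."""
    # An agent that still has tasks is never a removal candidate.
    if agent.pending_tasks > 0:
        return False

    # Condition 1: connected at least once, so it had a chance to get a task.
    connected_once = agent.last_contact is not None

    # Condition 2: never connected and took longer than the timeout to do so.
    never_connected_too_long = (
        agent.last_contact is None and now - agent.created_at > timeout
    )

    # Condition 3: lost its connection for longer than the timeout.
    # As written, this case is already covered by condition 1.
    lost_connection = (
        agent.last_contact is not None and now - agent.last_contact > timeout
    )

    return connected_once or never_connected_too_long or lost_connection
```

Under these assumptions, a freshly created agent that has not connected yet is left alone until the timeout elapses, while an agent that has connected and has nothing left to do becomes removable right away.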

xoxys commented 7 months ago

@anbraten Something missing for this PR?

xoxys commented 7 months ago

AgentAllowedStartupTime was intended to be used as Min-Time-To-Live. I would like to be able to set it to e.g. 2h and no matter what happens, the agent cannot be auto-removed (even if it idles/is stale) until it has been in place for more than 2 hours. Is this still the case?

anbraten commented 7 months ago

I am not sure about this min time to live setting. My plan was to have one setting for the timeout within which the agent is allowed to be provisioned and connect (this one), and a second for the time the agent should stay alive after its last task was executed before being removed again.

Why would you like to keep the agent regardless of whether it's idle or not?

xoxys commented 7 months ago

At least I understand now why you renamed the option, even if this was never the intention of my initial implementation :smile:

> Why would you like to keep the agent regardless of whether it's idle or not?

I'll try to explain it a little better this time. When I work on my projects, I usually work in blocks. This means that when I start, I work for 2-3 hours before I have to stop and get back to real-life tasks. During this time, I like to have as little waiting time as possible. But since provisioning new agents takes 2-3 minutes, it quickly becomes annoying to have to wait for them. I therefore set WOODPECKER_AGENT_ALLOWED_STARTUP_TIME=2h. As a result, I can work during my typical work blocks without having to wait.

> and a second for the time the agent should be alive after the last task was executed before being removed again.

However, I think I can use the new option WOODPECKER_AGENT_INACTIVITY_TIMEOUT to get the same result, more or less.

Is something missing before this PR can be merged? The code looks good to me; I just added a small code suggestion.

anbraten commented 7 months ago

> I'll try to explain it a little better this time. When I work on my projects, I usually work in blocks. This means that when I start, I work for 2-3 hours before I have to stop and get back to real-life tasks. During this time, I like to have as little waiting time as possible. But since provisioning new agents takes 2-3 minutes, it quickly becomes annoying to have to wait for them. I therefore set WOODPECKER_AGENT_ALLOWED_STARTUP_TIME=2h. As a result, I can work during my typical work blocks without having to wait.

Nice, that's the same thing I am looking for, both for my work and for the Woodpecker project itself. The issue with your suggestion was, in my case, that if I was still using the agent after 2 hours and it had a short idle time of 2 minutes, it would shut down the agent immediately. On the other hand, if there was just a quick pipeline to be executed, the agent would stay for 2 more hours doing nothing.
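
The difference between the two policies can be sketched with a small worked example. This is only an illustration under stated assumptions: the function names, the `busy` flag, and the 15-minute inactivity timeout are hypothetical and not taken from the autoscaler.

```python
from datetime import datetime, timedelta


def removable_min_ttl(created_at: datetime, now: datetime,
                      min_ttl: timedelta, busy: bool) -> bool:
    """Min-time-to-live policy: an idle agent is removable once its age exceeds min_ttl."""
    return not busy and now - created_at > min_ttl


def removable_inactivity(last_work_done: datetime, now: datetime,
                         inactivity_timeout: timedelta, busy: bool) -> bool:
    """Inactivity policy: an idle agent is removable once it has done no work for the timeout."""
    return not busy and now - last_work_done > inactivity_timeout


created = datetime(2024, 1, 1, 9, 0)
min_ttl = timedelta(hours=2)
inactivity_timeout = timedelta(minutes=15)  # hypothetical value

# Case A: still in use after 2 hours, with a 2-minute idle gap at 11:05.
now_a = datetime(2024, 1, 1, 11, 5)
last_work_a = datetime(2024, 1, 1, 11, 3)
case_a = (
    removable_min_ttl(created, now_a, min_ttl, busy=False),
    removable_inactivity(last_work_a, now_a, inactivity_timeout, busy=False),
)

# Case B: one quick pipeline that finished at 09:05, checked at 10:30.
now_b = datetime(2024, 1, 1, 10, 30)
last_work_b = datetime(2024, 1, 1, 9, 5)
case_b = (
    removable_min_ttl(created, now_b, min_ttl, busy=False),
    removable_inactivity(last_work_b, now_b, inactivity_timeout, busy=False),
)
```

In case A the min-time-to-live policy removes an agent that is still in active use, while the inactivity policy keeps it. In case B the min-time-to-live policy keeps an agent that has been doing nothing for well over an hour, while the inactivity policy releases it.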

anbraten commented 7 months ago

But we might need to add something like a last_workdone timestamp per agent to the server, updated as soon as the agent sends pipeline status updates, to fully support the WOODPECKER_AGENT_INACTIVITY_TIMEOUT setting.
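
One way such a per-agent timestamp could work is sketched below. This is a hedged illustration, not the server's API: `WorkTracker`, `on_status_update`, and `is_inactive` are hypothetical names, and the in-memory dictionary stands in for whatever storage the server would actually use.

```python
from datetime import datetime, timedelta
from typing import Dict


class WorkTracker:
    """Hypothetical in-memory record of when each agent last reported work."""

    def __init__(self) -> None:
        self._last_work_done: Dict[str, datetime] = {}

    def on_status_update(self, agent_id: str, at: datetime) -> None:
        # Called whenever an agent sends a pipeline status update.
        self._last_work_done[agent_id] = at

    def is_inactive(self, agent_id: str, now: datetime, timeout: timedelta) -> bool:
        last = self._last_work_done.get(agent_id)
        if last is None:
            # No work recorded yet; this sketch leaves that case to the startup timeout.
            return False
        return now - last > timeout
```

With this shape, every status update pushes the agent's removal deadline forward, so an agent that keeps receiving pipelines is never treated as inactive.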

xoxys commented 6 months ago

@anbraten Can we merge it for now? Or do you want to add https://github.com/woodpecker-ci/autoscaler/pull/92#issuecomment-2003075595 first?

woodpecker-bot commented 1 month ago

🎉 This PR is included in version 0.3.0 🎉

The release is now available here

Thank you for your contribution. ❤️📦🚀
