Closed akozhuharov closed 3 months ago
Hi @akozhuharov,
I was not able to reproduce this behavior. With no registered agents, runs transition to the plan_queued state.
Are there any other conditions under which a run gets stuck in the queuing state?
Thanks.
We had autoscaling set to 2-14 runners:
agentTokens:
- name: agent-pool-infra-token
autoscaling:
  cooldownPeriodSeconds: 30
  maxReplicas: 14
  minReplicas: 2
name: agent-pool-infra
We had 50 plans queued, but the agent pool wasn't scaling beyond 2 replicas.
Edit: We just upgraded to 1.5.0 and will see how the scaling works with the syncPeriod on the agent pool.
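To make the expectation in the comment above concrete, here is a minimal sketch of a replica calculation that clamps the queued-run count to the configured bounds. The helper desiredReplicas is hypothetical and is not taken from the operator's code; it only assumes that one agent is wanted per queued run, limited by minReplicas and maxReplicas.

```go
package main

import "fmt"

// desiredReplicas is a hypothetical helper (not the operator's code).
// It assumes one agent per queued run, clamped to [minReplicas, maxReplicas].
func desiredReplicas(queuedRuns, minReplicas, maxReplicas int) int {
	if queuedRuns < minReplicas {
		return minReplicas
	}
	if queuedRuns > maxReplicas {
		return maxReplicas
	}
	return queuedRuns
}

func main() {
	// Values from the comment: 50 queued plans, autoscaling set to 2-14.
	fmt.Println(desiredReplicas(50, 2, 14)) // prints 14
}
```

Under these assumptions, 50 queued plans would call for the maximum of 14 replicas, whereas the pool described here stayed at 2.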
Hi @akozhuharov,
It looks like, in your case, you have a large number of workspaces attached to the agent pool. Due to this, effective reconciliation occurred every 15-20 minutes instead of the default 30 seconds. We made some changes in 2.5.0 that should address this issue.
We are looking forward to hearing your feedback on whether version 2.5.0 addressed the issue you faced.
Thanks!
Hi @akozhuharov,
I will go ahead and close this PR. Please feel free to open an issue if you encounter this problem after upgrading to 2.5.0.
Thanks!
We haven't seen the same problem since then, thanks @arybolovlev
Description
We encountered a bug where the autoscaling agent pool controller doesn't take into account runs that are queued in TFC in general, so the number of replicas stays at the minimum set in the agent pool. Steps to reproduce:
Usage Example
I have attached a script that uses parts of the controller code to highlight the difference (line 32 can be added or removed): main.go.zip
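The attached script is not reproduced here. As a rough, self-contained illustration of the kind of difference it highlights, the sketch below counts runs against two status filters. Everything in it is an assumption made for illustration: the run type and the countRuns helper are hypothetical, and "pending" is assumed to be the status of runs that are queued in general, alongside the plan_queued status mentioned earlier in the thread.

```go
package main

import "fmt"

// run is a hypothetical stand-in for a TFC run; only the status matters here.
type run struct {
	ID     string
	Status string
}

// countRuns returns how many runs have one of the given statuses.
// Hypothetical helper, not taken from the controller.
func countRuns(runs []run, statuses ...string) int {
	wanted := make(map[string]struct{}, len(statuses))
	for _, s := range statuses {
		wanted[s] = struct{}{}
	}
	n := 0
	for _, r := range runs {
		if _, ok := wanted[r.Status]; ok {
			n++
		}
	}
	return n
}

func main() {
	// Assumed sample data: three generally queued runs, one plan-queued run.
	runs := []run{
		{"run-1", "pending"},
		{"run-2", "pending"},
		{"run-3", "pending"},
		{"run-4", "plan_queued"},
	}

	narrow := countRuns(runs, "plan_queued")          // filter without the extra status
	wide := countRuns(runs, "plan_queued", "pending") // filter with the extra status

	fmt.Println(narrow, wide) // prints 1 4
}
```

With the narrower filter, only one run is visible to the scaling decision, which would be consistent with a pool that never grows past its minimum.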