Closed (grundic closed this issue 7 years ago)
Hello! Is there any progress on this issue? :anguished:
Good day, @Funbit!
Sorry for low activity, have some things happening in my life :)
I would like to know your opinion on the implementation details. In the SO post you mentioned a "random" or "round-robin" priority type. In my eyes these two are somewhat different:
Random is quite easy to implement, I will just use the built-in random functionality. For RR it's going to be like this:
Please let me know if this sounds reasonable or you have some other ideas.
Hi! Glad that you're okay :) Actually, I mentioned the random priority type just as the simplest possible solution that wouldn't take much time for you to implement :) (though maybe someone will find it useful too). But round-robin would be the best to have! As for the implementation, your suggestion sounds good. The distribution should be even enough. Looking forward to the update!
Yep, random priority is super easy to implement indeed... RR is tougher but still possible :)
Can you please share your use-case for such agent priority?
The use case is simple: we're using AWS t2.* instance types for TC agents. Such instances have a given CPU quota (i.e. if an instance consumes 100% CPU for a long time, it will be slowed down by AWS). Since we have 3 agents and some of our project tests consume 100% CPU for 10-20 minutes, it's always good to distribute the load among all instances, so that some of them can rest. Also, we have pretty complex agent environments, and rotating agents would help to reveal environment problems (e.g. differences in agent settings). And by the way, even though all agents have the same hardware, TeamCity's built-in priority calculator almost always selects the same instance over and over again, forcing us to rotate the agents manually :(
Well, in this case round-robin would not work as you expect. Let's say you have 5 builds: Build-A, Build-B, Build-C, Build-D, Build-F, and 3 build agents: agent-X, agent-Y and agent-Z.
Triggering all builds sequentially with the proposed algorithm would simply use the same agent every time:
Build-A -> agent-X
Build-B -> agent-X
Build-C -> agent-X
Build-D -> agent-X
Build-F -> agent-X
That is because RR is going to explore only the current build's history, ignoring the others. So running Build-B after Build-A could use agent-X again, as the history for Build-B is empty.
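The effect described above can be reproduced with a small sketch. The exact RR rule is not spelled out in this thread, so the code assumes "pick the agent this build has used least recently", with ties going to the first agent in the list, and assumes every agent is free when each build is triggered. The function name and data layout are hypothetical and are not the plugin's code.

```python
from collections import defaultdict

AGENTS = ["agent-X", "agent-Y", "agent-Z"]


def pick_agent_per_build_rr(build, history, agents=AGENTS):
    """Pick the agent that this particular build has used least recently.

    history maps a build name to the list of agents it ran on, oldest first.
    Assumption: only the current build's own history is consulted.
    """
    used = history[build]

    def last_use(agent):
        # Index of the most recent run on this agent, or -1 if never used.
        return max((i for i, a in enumerate(used) if a == agent), default=-1)

    # min() returns the first minimal element, so ties go to the first agent.
    return min(agents, key=last_use)


history = defaultdict(list)
assignments = {}
for build in ["Build-A", "Build-B", "Build-C", "Build-D", "Build-F"]:
    agent = pick_agent_per_build_rr(build, history)
    history[build].append(agent)
    assignments[build] = agent

# Re-running a single build does rotate through the agents.
build_a_reruns = []
for _ in range(2):
    agent = pick_agent_per_build_rr("Build-A", history)
    history["Build-A"].append(agent)
    build_a_reruns.append(agent)
```

Under these assumptions every build starts with an empty history, so each of the five builds lands on agent-X, while repeated runs of Build-A alone move on to agent-Y and agent-Z.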
The originally described RR could also be useful, for example to keep the working copy up to date on all agents.
Your case is slightly different, as we have to distribute load evenly across agents, not across builds. To do that, we have to get all available agents, get the history for each one and calculate the priority from that information.
But here is a corner case with two agents: agent-X was busy for the last 30 minutes, then agent-Y was triggered with some light build, which took 1 minute and produced almost no load. With the naive solution, we would have to choose agent-X again for the next build, as it was used before agent-Y. Do you think this would be the correct behaviour, or should it be done differently?
Well, maybe I was too rushed indeed :) However, the proposed RR algorithm should satisfy our case (i.e. do RR based on projects), since in our environment we have a single large project that takes 99% of time/cpu and several small ones, used only for deployment-like actions which don't consume time/cpu much. Thus, having evenly distributed agent load for each project should work fine. If you think that this topic should be thought over more deeply, we can always start with the random priority calculator :)
@Funbit, I've released plugin version v.1.0.2 with random agent distribution. Please have a look and tell me how it works. I'm still thinking about how to distribute load between agents more wisely.
So, if we have a long-running Fat-Build and an instant Light-Build, then the naive approach of rotating build agents won't be enough:
Fat-Build |-----[Agent-1]-----|
Light-Build |-[Agent-2]-|
So we would have a run history like this: Agent-1, Agent-2. That means Agent-1 should be picked next. But strictly speaking, it was loaded more heavily, so it makes sense to put the load on Agent-2.
This example shows that we have to take into account not only the order in which agents were picked for running builds, but also how long they were busy.
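A minimal sketch of the difference between the two criteria, using the Fat-Build / Light-Build example. The 30-minute and 1-minute durations are borrowed from the earlier corner case purely for illustration, and the `Run` record and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    agent: str
    start_min: int      # minute at which the agent was picked
    duration_min: int   # how long the agent stayed busy


# Illustrative numbers only: Fat-Build on Agent-1, then Light-Build on Agent-2.
runs = [Run("Agent-1", 0, 30), Run("Agent-2", 30, 1)]
agents = ["Agent-1", "Agent-2"]


def next_by_order(runs, agents):
    """Naive rotation: pick the agent that was picked longest ago."""
    last_start = {a: -1 for a in agents}
    for r in runs:
        last_start[r.agent] = max(last_start[r.agent], r.start_min)
    return min(agents, key=lambda a: last_start[a])


def next_by_busy_time(runs, agents):
    """Load-aware choice: pick the agent with the least total busy time."""
    busy = {a: 0 for a in agents}
    for r in runs:
        busy[r.agent] += r.duration_min
    return min(agents, key=lambda a: busy[a])


order_choice = next_by_order(runs, agents)
busy_choice = next_by_busy_time(runs, agents)
```

With these numbers the order-based rule returns Agent-1, while the busy-time rule returns Agent-2, which is the agent that actually rested.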
@Funbit, is this example close to your situation? Or maybe you have some other criteria to take into account?
Thank you for the update! I've installed version 1.0.2 and will see how it works (tried 2 times, got 2 different agents :) So I suppose it works fine!). The example you described is very close to our environment. If it's possible to get "Duration" from the build history then, I think, it would be the best metric to base the priority on. The only question is how many builds to take from the history to calculate the statistics. In our environment, for example, we usually don't have more than 1000 entries (old ones are deleted automatically due to large artifact size). So I think it would be fair enough to take the last ~1000 entries (the more the better, if available of course), sum the Duration field grouped by agent and then sort the totals to get the next agent. Or maybe have an option to set the maximum number of entries to take from the history... What do you think?
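The proposal above (sum Duration per agent over the most recent history entries, then sort) could look roughly like this. It is only a sketch: the history is modelled as a plain list of `(agent, duration_seconds)` pairs, and the `max_entries` parameter and function name are hypothetical, not an existing plugin option.

```python
def rank_agents_by_load(history, agents, max_entries=1000):
    """Order agents from least to most loaded.

    history: list of (agent, duration_seconds) tuples, oldest first.
    Only the last max_entries records are considered.
    """
    # Guard against history[-0:], which would return the whole list.
    recent = history[-max_entries:] if max_entries > 0 else []
    totals = {a: 0 for a in agents}
    for agent, duration in recent:
        if agent in totals:  # ignore agents that no longer exist
            totals[agent] += duration
    # sorted() is stable, so equally loaded agents keep their listed order.
    return sorted(agents, key=lambda a: totals[a])


# Made-up sample data for illustration.
sample_history = [
    ("a1", 1200), ("a2", 60), ("a3", 600), ("a1", 900), ("a2", 30),
]
sample_agents = ["a1", "a2", "a3"]

full_ranking = rank_agents_by_load(sample_history, sample_agents)
short_ranking = rank_agents_by_load(sample_history, sample_agents, max_entries=2)
```

The first element of the returned list would be the next agent to use. Note how the cut-off changes the outcome: with only the last two entries in view, `a3` looks completely unloaded.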
@grundic, loading history can dramatically decrease performance. I think it would be better to use the agent's "idle" time. That way, the least loaded agent would always be the one picked.
@Jimilian, that's a brilliant idea! Indeed, relying on build history is a big performance hit, though the ByBuildStatus priority is already using it (it does let the user limit how much history is explored).
There is a [getIdleTime](http://javadoc.jetbrains.net/teamcity/openapi/9.1/jetbrains/buildServer/serverSide/SBuildAgent.html#getIdleTime()) method on SBuildAgent, which returns exactly what you described. I've made some tests, and here is how it works: this method returns the idle time of an agent since its last finished build. So it doesn't take into account any intermediate idle periods, and it resets whenever the agent finishes a build. So, for the situation described here, TeamCity will trigger a build on Agent-1, even though its overall idle time is less than that of Agent-2.
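To make the distinction concrete, here is a small sketch comparing "idle since the last finished build" with "overall idle time" on the two-agent example. It does not call the TeamCity API. The timeline (Agent-1 busy for minutes 0 to 30, Agent-2 busy for minutes 30 to 31, next build triggered at minute 35) is made up, and it assumes the priority simply prefers the agent with the largest idle-since-last-finish value.

```python
def idle_since_last_finish(now_min, busy_intervals):
    """Idle time counted only from the end of the agent's most recent build."""
    last_finish = max(end for _, end in busy_intervals)
    return now_min - last_finish


def overall_idle(now_min, busy_intervals, window_start_min=0):
    """Total idle time over the whole observation window."""
    busy = sum(end - start for start, end in busy_intervals)
    return (now_min - window_start_min) - busy


# Hypothetical timeline in minutes: (start, end) of each build per agent.
timeline = {
    "Agent-1": [(0, 30)],   # Fat-Build
    "Agent-2": [(30, 31)],  # Light-Build
}
now = 35

since_finish = {a: idle_since_last_finish(now, iv) for a, iv in timeline.items()}
overall = {a: overall_idle(now, iv) for a, iv in timeline.items()}

# Assumed rule: the agent with the largest idle-since-last-finish wins.
picked = max(since_finish, key=since_finish.get)
```

Here Agent-1 has been idle for 5 minutes since its last build and Agent-2 for 4, so Agent-1 is picked, even though over the whole window Agent-2 was idle for 34 minutes and Agent-1 for only 5.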
@Funbit, can you please tell me whether this solution would work for you? I have a very representative example of the agent distribution for a single build:
As you can see, the agents are always kept in the order Default Agent | aaa111 | zzz999
Hmm, what if I register a new agent after using existing ones for some time? Wouldn't it affect the priority distribution? (new agent would have to be prioritized until its idle time becomes greater than existing agents? Or am I wrong?)
From Java documentation:
Returns number of milliseconds build agent was idle. If there was no build started time since registration is returned. If agent is running a build, 0 is returned.
-Edit- A new build agent would have its idle time counted from the moment it was registered.
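The three cases in the quoted documentation can be modelled in a few lines to see what happens when a new agent is registered. This is a plain model of the described behaviour, not a call into TeamCity: the function and its parameters are hypothetical, and the "since the last finished build" branch follows the test results reported earlier in this thread rather than the quoted text.

```python
def idle_time_ms(now_ms, registered_ms, last_finished_ms=None, running=False):
    """Model of the idle time described in the quoted documentation."""
    if running:
        return 0                       # agent is running a build
    if last_finished_ms is None:
        return now_ms - registered_ms  # no build yet: time since registration
    return now_ms - last_finished_ms   # observed: time since last finished build


MINUTE = 60_000
now = 120 * MINUTE

idle = {
    # Registered long ago, finished its last build 10 minutes ago.
    "old-agent": idle_time_ms(now, registered_ms=0, last_finished_ms=now - 10 * MINUTE),
    # Registered 1 minute ago, has not run anything yet.
    "new-agent": idle_time_ms(now, registered_ms=now - 1 * MINUTE),
    # Currently busy.
    "busy-agent": idle_time_ms(now, registered_ms=0, running=True),
}

# Assumed rule: the agent with the largest idle time is preferred.
preferred = max(idle, key=idle.get)
```

Under this model a freshly registered agent starts with a small idle time, so it is not preferred over agents that have been waiting longer; it becomes the first choice only once its time since registration exceeds their idle times.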
@Funbit, I've released a new plugin version with idle time priority, you can give it a try.
Thank you! Will try the new idle based priority within this week and get back with the results.
After a couple of days of using the idle based priority I can say that it works very well!
The order is correct even if some builds were running simultaneously.
Thank you again for your time and efforts!
Hey @Funbit,
That's great news! Glad it worked well for you. Thank you for the interesting proposal for a new priority, and additional thanks to @Jimilian for the idea of using idle time.
From the comment on the SO: