Add round-robin priority type

grundic commented 7 years ago

From the comment on the SO:

It would be great to have a "random" or "round-robin" priority type there... Because, AFAIU, it's impossible to have such behavior in the current version :(

Funbit commented 7 years ago

Hello! Is there any progress on this issue? :anguished:

grundic commented 7 years ago

Good day, @Funbit!

Sorry for the low activity, I have some things happening in my life :)

I would like to know your opinion on the implementation details. In the SO post you mentioned a "random" or "round-robin" priority type. In my eyes these two are somewhat different:

Random: this should provide agents in completely random order, like rolling dice. It can happen that the same agent is used twice in a row, or that the order is shuffled in some other way.
Round-robin: in this case all agents would be used one after another, keeping original order if possible.

Random is quite easy to implement, I will just use the built-in random functionality. For RR it's going to be like this:

Get available agents for a given build
Load history of size equal to the total available agents
Sort agents taking into account previous order of execution
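
A minimal sketch of the three steps above, not the plugin's actual code. It assumes agents are identified by name and that the build's history is a plain list of agent names, oldest first; `round_robin_order` and its parameters are hypothetical names used only for illustration.

```python
from typing import List


def round_robin_order(available: List[str], history: List[str]) -> List[str]:
    """Order agents so that the least recently used one comes first.

    `available` holds the agents able to run the build, in their original order.
    `history` holds the agents used by previous runs of this build, oldest first.
    """
    # Load only as many history entries as there are available agents.
    recent = history[-len(available):] if available else []
    # Position of each agent's most recent use; a higher index means more recent.
    last_used = {agent: i for i, agent in enumerate(recent)}
    # Agents missing from the recent history sort first (-1). sorted() is stable,
    # so agents with equal keys keep their original order.
    return sorted(available, key=lambda agent: last_used.get(agent, -1))


agents = ["agent-X", "agent-Y", "agent-Z"]
print(round_robin_order(agents, []))
print(round_robin_order(agents, ["agent-X"]))
print(round_robin_order(agents, ["agent-X", "agent-Y"]))
```

With an empty history the original order is kept; after each run the agent that was just used moves to the back of the queue.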

Please let me know if this sounds reasonable or you have some other ideas.

Funbit commented 7 years ago

Hi! Glad that you're okay :) Actually, I mentioned random priority type just as a simplest possible solution that won't take much time for you to implement :) (though, maybe someone will find it useful too). But round-robin would be the best to have! As for the implementation, your suggestion sounds good. The distribution should be even enough. Looking forward to the update!

grundic commented 7 years ago

Yep, random priority is super easy to implement indeed... RR is tougher but still possible :)

Can you please share your use-case for such agent priority?

Funbit commented 7 years ago

The use case is simple: we're using AWS t2.* instance types for TC agents. Such instances have a given CPU quota (i.e. if an instance consumes 100% CPU for a long time it will be slowed down by AWS). Since we have 3 agents and some of our project tests consume 100% CPU for 10-20 minutes, it's always good to distribute the load among all instances, so some of them can rest. Also, we have pretty complex agent environments and rotating agents would help to reveal environment problems (i.e. differences in agent settings, etc). And by the way, in spite of the fact that all agents have the same hardware, TeamCity's built-in priority calculator almost always selects the same instance over and over again, forcing us to rotate the agents manually :(

grundic commented 7 years ago

Well, in this case round-robin would not work as expected for your case. Let's say you have 5 builds: Build-A, Build-B, Build-C, Build-D, Build-F, and 3 build agents: agent-X, agent-Y and agent-Z.

Triggering sequentially all builds using proposed algorithm would simply use the same agent:

Build-A -> agent-X
Build-B -> agent-X
Build-C -> agent-X
Build-D -> agent-X
Build-F -> agent-X

That is because RR is going to explore only the current build's history, ignoring the others. So running Build-B after Build-A could use agent-X again, as the history for Build-B is empty.
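
The effect can be reproduced with a small simulation. This is only a sketch under stated assumptions: every build keeps its own history, all histories start empty, and each build finishes before the next one is triggered, so all agents are free every time. `pick_agent` is a hypothetical helper, not part of the plugin.

```python
from typing import Dict, List

AGENTS = ["agent-X", "agent-Y", "agent-Z"]


def pick_agent(build_history: List[str]) -> str:
    """Pick the agent this build used least recently (never-used agents first)."""
    last_used = {agent: i for i, agent in enumerate(build_history)}
    # min() returns the first agent among equals, so ties keep the original order.
    return min(AGENTS, key=lambda agent: last_used.get(agent, -1))


# Each build has its own history, and all of them start empty.
histories: Dict[str, List[str]] = {
    name: [] for name in ["Build-A", "Build-B", "Build-C", "Build-D", "Build-F"]
}

assignments: Dict[str, str] = {}
for build, history in histories.items():
    agent = pick_agent(history)
    history.append(agent)  # only this build's history is updated
    assignments[build] = agent
    print(f"{build} -> {agent}")
```

Because no build ever sees another build's history, every first run lands on the first agent in the list.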

The originally described RR could also be useful, for example to keep the working copy up to date on all agents.

Your case is slightly different, as we have to equally distribute load by agents, not by builds. To do that, we have to get all available agents, get history for each one and calculate priority depending on this information.

But here is a corner case with two agents: agent-X was busy for the last 30 minutes, then agent-Y was triggered with some light build, which took 1 minute and produced almost no load. Using the naive solution, we have to choose agent-X again for the next build, as it was used before agent-Y. Do you think this would be correct behaviour, or should it be done differently?

Funbit commented 7 years ago

Well, maybe I was too rushed indeed :) However, the proposed RR algorithm should satisfy our case (i.e. do RR based on projects), since in our environment we have a single large project that takes 99% of time/cpu and several small ones, used only for deployment-like actions which don't consume time/cpu much. Thus, having evenly distributed agent load for each project should work fine. If you think that this topic should be thought over more deeply, we can always start with the random priority calculator :)

grundic commented 7 years ago

@Funbit, I've released plugin version v.1.0.2 with random agent distribution. Please have a look and tell me how it works. I'm still thinking about how to distribute load between agents more wisely.

grundic commented 7 years ago

So, if we have a long-running Fat-Build and an instant Light-Build, then the naive approach of rotating build agents won't be enough:


Fat-Build     |-----[Agent-1]-----|
Light-Build                     |-[Agent-2]-|

So we would have a run history like this: Agent-1, Agent-2, which means that Agent-1 should be picked next. But strictly speaking, it was loaded more heavily, so it makes sense to put the load on Agent-2.

This example shows that we have to take into account not only the order in which agents were picked to run builds, but also how long they were busy.
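
To make the difference concrete, here is a small sketch comparing the two criteria. The 30-minute and 1-minute durations are taken from the two-agent corner case earlier in the thread; `next_by_order` and `next_by_busy_time` are hypothetical names for illustration only.

```python
from typing import Dict, List, Tuple

# (agent, busy minutes) for each finished run, oldest first.
runs: List[Tuple[str, int]] = [("Agent-1", 30), ("Agent-2", 1)]


def next_by_order(history: List[Tuple[str, int]]) -> str:
    """Naive rotation: pick the agent whose most recent run is the oldest."""
    last_index: Dict[str, int] = {}
    for i, (agent, _minutes) in enumerate(history):
        last_index[agent] = i
    return min(last_index, key=lambda agent: last_index[agent])


def next_by_busy_time(history: List[Tuple[str, int]]) -> str:
    """Load-aware choice: pick the agent with the smallest total busy time."""
    busy: Dict[str, int] = {}
    for agent, minutes in history:
        busy[agent] = busy.get(agent, 0) + minutes
    return min(busy, key=lambda agent: busy[agent])


print("by order:    ", next_by_order(runs))
print("by busy time:", next_by_busy_time(runs))
```

Rotation by order sends the next build back to Agent-1, while the busy-time criterion prefers Agent-2.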

@Funbit, is this example close to your situation? Or maybe you have some other criteria to take into account?

Funbit commented 7 years ago

Thank you for the update! I've installed version 1.0.2 and will try to see how it works (tried 2 times, got 2 different agents :) So I suppose it works fine!). The example you described is very close to our environment. If it's possible to get "Duration" from the build history then, I think, it would become the best metric to base the priority on. The only question is how many builds to get from the history to calculate the statistics. In our environment, for example, we usually don't have more than 1000 entries (old ones are deleted automatically due to large artifact size). So I think it would be fair enough to get the last ~1000 entries (the more the better, if available of course), sum the Duration field grouped by agent and then sort the totals to get the next agent. Or maybe have an option to set the maximum number of entries to take from the history... What do you think?
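
The proposal above could look roughly like this. It is a sketch only: the history is assumed to be a list of `(agent, duration_seconds)` pairs, oldest first, and `rank_agents_by_duration` and `max_entries` are hypothetical names, not plugin settings.

```python
from typing import Dict, List, Tuple


def rank_agents_by_duration(
    history: List[Tuple[str, float]],
    agents: List[str],
    max_entries: int = 1000,
) -> List[str]:
    """Sort agents by total build duration over the last `max_entries` builds.

    The agent with the smallest total comes first and would get the next build.
    """
    # Take only the most recent entries; a non-positive limit means no history.
    recent = history[-max_entries:] if max_entries > 0 else []
    totals: Dict[str, float] = {agent: 0.0 for agent in agents}
    for agent, duration in recent:
        if agent in totals:  # ignore agents that are no longer available
            totals[agent] += duration
    # sorted() is stable, so agents with equal totals keep their original order.
    return sorted(agents, key=lambda agent: totals[agent])


history = [("a1", 1200.0), ("a2", 60.0), ("a1", 900.0), ("a3", 300.0)]
print(rank_agents_by_duration(history, ["a1", "a2", "a3"]))
print(rank_agents_by_duration(history, ["a1", "a2", "a3"], max_entries=1))
```

A smaller `max_entries` makes the ranking react faster to recent builds but costs less history loading, which is the trade-off raised in the comment.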

Jimilian commented 7 years ago

@grundic, loading history can dramatically decrease performance. I think it would be better to use the agent's "idle" time, so the least loaded agent would always be picked.

grundic commented 7 years ago

@Jimilian, that's a brilliant idea! Indeed, relying on build history is a big performance hit, though the ByBuildStatus priority is already using it (though it lets the user limit the history exploration).

There is a [getIdleTime](http://javadoc.jetbrains.net/teamcity/openapi/9.1/jetbrains/buildServer/serverSide/SBuildAgent.html#getIdleTime()) method for the SBuildAgent, which returns exactly what you described. I've made some tests and here is how it works: this method returns the idle time of an agent since its last finished build. So it doesn't take into account any intermediate outages, and it resets after any build on the agent finishes. So, for the situation described here, TeamCity will trigger a build on Agent-1, even though its overall idle time is less than that of Agent-2.
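
The behaviour described above can be illustrated with a small model. This is not the TeamCity API: `Agent`, `idle_minutes` and `pick_most_idle` are hypothetical, and the timeline assumes Fat-Build ran on Agent-1 from minute 0 to minute 30 and Light-Build ran on Agent-2 from minute 30 to minute 31.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    name: str
    last_build_finished_at: float  # minutes since the start of the timeline


def idle_minutes(agent: Agent, now: float) -> float:
    """Idle time counted from the agent's last finished build."""
    return now - agent.last_build_finished_at


def pick_most_idle(agents: List[Agent], now: float) -> Agent:
    """Pick the agent that has been idle the longest."""
    return max(agents, key=lambda agent: idle_minutes(agent, now))


agents = [
    Agent("Agent-1", last_build_finished_at=30.0),  # Fat-Build ended here
    Agent("Agent-2", last_build_finished_at=31.0),  # Light-Build ended here
]
chosen = pick_most_idle(agents, now=32.0)
print(chosen.name)
```

At minute 32, Agent-1 has been idle for 2 minutes and Agent-2 for 1 minute, so Agent-1 is chosen even though it was busy for 30 of the last 32 minutes.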

@Funbit, can you please tell me whether this solution would work for you? I have a very representative example of the agent distribution for a single build:

[Screenshot, 2017-07-17: agent distribution for a single build]

As you can see, the agents are always kept in the order Default Agent | aaa111 | zzz999

Funbit commented 7 years ago

Hmm, what if I register a new agent after using existing ones for some time? Wouldn't it affect the priority distribution? (new agent would have to be prioritized until its idle time becomes greater than existing agents? Or am I wrong?)

grundic commented 7 years ago

From Java documentation:

Returns number of milliseconds build agent was idle. If there was no build started time since registration is returned. If agent is running a build, 0 is returned.

-Edit- A new build agent's idle time would be counted from the moment it is registered.
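
A rough model of the documented rules, to show what they mean for a newly registered agent. It only mirrors the Javadoc text quoted above plus the "since the last finished build" behaviour observed in the tests earlier in the thread; `AgentState` and `idle_time_ms` are hypothetical names and timestamps are plain millisecond counters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentState:
    registered_at_ms: int
    last_build_finished_at_ms: Optional[int] = None  # None: no build has run yet
    running_build: bool = False


def idle_time_ms(state: AgentState, now_ms: int) -> int:
    """Idle time following the three cases from the quoted documentation."""
    if state.running_build:
        return 0  # a busy agent is never idle
    if state.last_build_finished_at_ms is None:
        return now_ms - state.registered_at_ms  # counted from registration
    return now_ms - state.last_build_finished_at_ms


now = 10_000
old_agent = AgentState(registered_at_ms=0, last_build_finished_at_ms=4_000)
new_agent = AgentState(registered_at_ms=9_000)
busy_agent = AgentState(registered_at_ms=0, running_build=True)

for label, state in [("old", old_agent), ("new", new_agent), ("busy", busy_agent)]:
    print(label, idle_time_ms(state, now))
```

Under these assumptions a freshly registered agent starts with a small idle time, so it is not favoured over agents that have been idle longer; it simply joins the rotation as its idle time grows.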

grundic commented 7 years ago

@Funbit, I've released a new plugin version with idle time priority, you can give it a try.

Funbit commented 7 years ago

Thank you! I will try the new idle-based priority this week and get back with the results.

Funbit commented 7 years ago

After a couple of days of using the idle-based priority I can say that it works very well!

The order is correct even if some builds were running simultaneously.

Thank you again for your time and efforts!

grundic commented 7 years ago

Hey @Funbit,

That's great news! Glad it worked well for you. Thank you for the interesting proposal for a new priority, and additional thanks to @Jimilian for suggesting idle time.

grundic / teamcity-agent-priority

Add round-robin priority type #3