Closed ppanyukov closed 7 years ago
@ppanyukov this might be a .NET Core problem. There is an issue in the dotnet/corefx repo that looks similar: https://github.com/dotnet/corefx/issues/9751
@TingluoHuang is that the correct issue? It does not look related.
Looks pretty much spot on. The poll syscalls in my trace are for fd 127, which is a socket to our company.visualstudio.com.
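For anyone wanting to repeat that check: on Linux, a descriptor number seen in strace output can be mapped to what it points at by reading the /proc/<pid>/fd/<n> symlink. Below is a minimal sketch, assuming Linux procfs. The helper name describe_fd is made up for illustration, and the demo inspects its own process rather than Agent.Listener.

```python
import os
import socket


def describe_fd(pid: int, fd: int) -> str:
    """Return the target of /proc/<pid>/fd/<fd> (Linux only).

    For a socket the kernel reports something like 'socket:[123456]',
    where the number is the socket inode.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")


if __name__ == "__main__":
    # Demo on our own process: open a socket and look up its descriptor.
    # For a real agent you would pass the Agent.Listener pid and the fd
    # number taken from the strace output (127 in the trace above).
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print(describe_fd(os.getpid(), sock.fileno()))
    finally:
        sock.close()
```

The socket inode in the result can then be matched against the entries in /proc/<pid>/net/tcp to find the remote address.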
@ppanyukov I would like to link your issue for the dotnet core guys. I want to let them know that the issue is affecting the vsts-agent.
@TingluoHuang definitely. It would be nice to fix. And anything I can help with, repro, debugging etc -- let me know, will be glad to help.
BTW, if you look at the issue now you will see it's already linked (unless you meant a different kind of link).
@ppanyukov I will leave a message there to let them know. :)
Closing since https://github.com/dotnet/corefx/issues/9751 has been resolved.
@TingluoHuang Do you know when the fix from corefx will be available in the agent? Unless it's already done?
@ppanyukov it will take a while. We won't consume the latest corefx until they have a plan for shipping 1.1.0.
@TingluoHuang I guess you meant 1.2.0, not 1.1.0? If so, I see that it's due by April 30, 2017, which is 6 months away. Is that correct?
In any case, even though this is an external issue, the problem with the agent is actually not resolved yet. Can we perhaps keep this issue open until the agent ships with the fixed corefx?
Reopening for tracking.
The 2.114.0 agent is built on netcore 1.1, which contains the fix.
Agent version and platform
Version of your agent? 2.104.0
OS of the machine running the agent?
Linux. Two setups: Centos7 container on Ubuntu Xenial host, and Centos7 container on Centos7 host.
VSTS type and version
VisualStudio.com
What's not working?
When I run a build, I notice that the Agent.Listener process is at 100% CPU almost all the time. Very surprising, surely this should not happen?
Here are some stats.
The build ran for 35 minutes (in our case this is normal time).
Agent.Listener consumed 25 minutes of CPU during the build time, implying it was at 100% most of the time (if my interpretation of this is correct).
During the build, the top command showed pretty much the same picture.
I ran strace on the process, and it showed a lot of calls to clock_gettime.
When I ran strace with counters for a few seconds, we accumulated 1,219,155 calls to clock_gettime and 405,812 calls to poll in that time. I'm not sure about poll, but so many syscalls to clock_gettime does not feel right?
The above was on the "Centos7 container on Ubuntu Xenial host" setup. Running on the "Centos7 container on Centos7 host" setup showed a similar pattern, but looked less severe: Agent.Listener consumed 11 minutes of CPU during a 30-minute build.
Is there some busy loop somewhere that should not be? Is it a bug in .NET CLR? Is it due to containerisation?
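As a rough sanity check on the figures quoted above, the arithmetic can be written out. This is only a sketch: it assumes the reported CPU time is measured against a single core, and it uses nothing beyond the numbers stated in this report. The helper name cpu_share is made up for illustration.

```python
def cpu_share(cpu_minutes: float, wall_minutes: float) -> float:
    """Average fraction of one core used over the wall-clock period."""
    return cpu_minutes / wall_minutes


# CPU time vs. build duration, as reported for the two setups.
ubuntu_host = cpu_share(25, 35)   # Centos7 container on Ubuntu Xenial host
centos_host = cpu_share(11, 30)   # Centos7 container on Centos7 host

# Syscall counts from the short strace run with counters.
clock_gettime_calls = 1_219_155
poll_calls = 405_812
calls_per_poll = clock_gettime_calls / poll_calls

print(f"Ubuntu host: {ubuntu_host:.0%} of one core on average")   # 71%
print(f"Centos host: {centos_host:.0%} of one core on average")   # 37%
print(f"clock_gettime calls per poll: {calls_per_poll:.2f}")      # 3.00
```

The ratio comes out at almost exactly three clock_gettime calls per poll call, which would fit a single tight loop that checks the clock and polls the socket on every iteration, though that is a guess from the counts alone.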
Agent and Worker's diag log
There is nothing in the logs that looks relevant.