[JENKINS-54564] Jenkins master CPU pegs to >= 18% - Killing telemetry collection thread seems to fix it

timja commented 5 years ago

I noticed that my CPU usage randomly seems to peg at ~18%. This has happened several times without warning.

I used a Jenkins monitoring feature to look at all of the threads. I noticed one thread that seemed to be consuming a huge amount of resources, so I clicked the terminate button on the right.

After I terminated that thread, CPU usage returned to its normal level.

The only clue I can think of is that I run a script on the master host that uses the Jenkins CLI to idempotently ensure certain workers are attached, on a 10-minute interval; that's why you see a spike every 10 minutes. The CPU problem I'm reporting usually seems to kick off at the beginning of one of those 10-minute intervals.

Originally reported by piratejohnny, imported from: Jenkins master CPU pegs to >= 18% - Killing telemetry collection thread seems to fix it

status: Open
priority: Minor
resolution: Unresolved
imported: 2022/01/10

timja commented 5 years ago

piratejohnny:

In the thread dump I took prior to killing the thread, here's what I see with respect to telemetry:

"telemetry collection thread" #1313308 daemon prio=5 os_prio=0 tid=0x0000561f7e3f0800 nid=0x2e99 runnable [0x00007ff048996000]
   java.lang.Thread.State: RUNNABLE
        at java.util.TreeMap.successor(TreeMap.java:2154)
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1212)
        at java.util.TreeMap$KeyIterator.next(TreeMap.java:1265)
        at java.util.AbstractSet.hashCode(AbstractSet.java:124)
        at java.util.TreeMap$Entry.hashCode(TreeMap.java:2112)
        at java.util.AbstractMap.hashCode(AbstractMap.java:530)
        at java.util.HashMap.hash(HashMap.java:339)
        at java.util.HashMap.put(HashMap.java:612)
        at java.util.HashSet.add(HashSet.java:220)
        at net.sf.json.AbstractJSON.addInstance(AbstractJSON.java:67)
        at net.sf.json.JSONObject._fromMap(JSONObject.java:1059)
        at net.sf.json.JSONObject.fromObject(JSONObject.java:160)
        at net.sf.json.JSONObject.fromObject(JSONObject.java:132)
        at jenkins.telemetry.impl.StaplerDispatches.createContent(StaplerDispatches.java:83)
        at jenkins.telemetry.Telemetry$TelemetryReporter.lambda$execute$0(Telemetry.java:167)
        at jenkins.telemetry.Telemetry$TelemetryReporter$$Lambda$260/1575862926.accept(Unknown Source)
        at java.lang.Iterable.forEach(Iterable.java:75)
        at jenkins.telemetry.Telemetry$TelemetryReporter.execute(Telemetry.java:155)
        at hudson.model.AsyncPeriodicWork$1.run(AsyncPeriodicWork.java:101)
        at java.lang.Thread.run(Thread.java:748)
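The frames above show json-lib hashing an entire TreeMap, and each TreeSet stored in it, while AbstractJSON.addInstance records the object in a HashSet. The Java sketch below is not taken from Jenkins or json-lib; it uses a hypothetical Counted element class to illustrate why that step can be costly. AbstractMap.hashCode and AbstractSet.hashCode walk every entry and element and do not cache the result, so a single HashSet.add of a large nested map touches every element, and adding it again repeats the full walk.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class HashCodeCost {
    // Hypothetical element type that counts how often its hashCode() is invoked.
    static final class Counted implements Comparable<Counted> {
        static long hashCalls = 0;
        final int id;

        Counted(int id) { this.id = id; }

        @Override public int hashCode() { hashCalls++; return Integer.hashCode(id); }

        @Override public boolean equals(Object o) {
            return o instanceof Counted && ((Counted) o).id == id;
        }

        // TreeSet orders elements with compareTo, so building the sets below
        // does not call hashCode() at all.
        @Override public int compareTo(Counted other) { return Integer.compare(id, other.id); }
    }

    // Builds a TreeMap whose values are TreeSets, the shape suggested by the
    // TreeMap$Entry.hashCode -> AbstractSet.hashCode frames in the stack trace.
    static Map<String, Set<Counted>> build(int keys, int perKey) {
        Map<String, Set<Counted>> map = new TreeMap<>();
        for (int k = 0; k < keys; k++) {
            Set<Counted> set = new TreeSet<>();
            for (int i = 0; i < perKey; i++) {
                set.add(new Counted(k * perKey + i));
            }
            map.put("key" + k, set);
        }
        return map;
    }

    // Adds the map to a fresh HashSet, as an identity-tracking step would, and
    // returns how many element hashCode() calls that single add() triggered.
    static long hashCallsForOneAdd(Map<String, Set<Counted>> map) {
        Set<Object> seen = new HashSet<>();
        Counted.hashCalls = 0;
        seen.add(map); // HashMap.hash -> AbstractMap.hashCode -> AbstractSet.hashCode
        return Counted.hashCalls;
    }

    public static void main(String[] args) {
        Map<String, Set<Counted>> map = build(1_000, 100);
        System.out.println(hashCallsForOneAdd(map)); // prints 100000
        System.out.println(hashCallsForOneAdd(map)); // prints 100000 again: nothing is cached
    }
}
```

The work per add() scales with the total number of elements, not the number of keys. The thread dump alone does not show how large the StaplerDispatches data actually was in this report.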

timja commented 5 years ago

oleg_nenashev:

CC danielbeck

timja commented 5 years ago

danielbeck:

This is weird. Telemetry should only be collected once per day.

Are you programmatically kicking off all AsyncPeriodicWork extensions?

timja commented 5 years ago

piratejohnny:

danielbeck: No, I'm not doing anything with the Telemetry classes or AsyncPeriodicWork. I've written 99.9% of the Jenkins pipelines and set everything up myself, so I can confidently say that nobody else is either.

I did set up something that polls some of the Jenkins metrics JSON APIs so I can have nice dashboards like the ones in my screenshots; that's about it.

timja commented 5 years ago

danielbeck:

Please check jenkins.log for "Starting telemetry collection" and "Finished telemetry collection" messages (or similar). How often do they appear?
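One way to answer that question is to filter the log for telemetry start and finish lines and compare their timestamps. The Java sketch below assumes jenkins.log is a plain text file and that the relevant messages contain the words "telemetry collection" together with "start…" or "finish…"; the exact wording is not confirmed in this thread, and the default log path is a hypothetical example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TelemetryLogScan {
    // Keeps lines that mention telemetry collection starting or finishing.
    // The marker words are assumptions; adjust them to what the log really shows.
    static List<String> telemetryLines(Stream<String> lines) {
        return lines
            .filter(line -> {
                String l = line.toLowerCase(Locale.ROOT);
                return l.contains("telemetry collection")
                    && (l.contains("start") || l.contains("finish"));
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real jenkins.log location as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "/var/log/jenkins/jenkins.log");
        try (Stream<String> lines = Files.lines(log)) {
            List<String> hits = telemetryLines(lines);
            hits.forEach(System.out::println); // each hit keeps its original timestamp
            System.out.println(hits.size() + " matching lines");
        }
    }
}
```

If collection really runs once per day, the matching lines should be roughly a day apart; start lines every few minutes, or starts without matching finishes, would point to the behaviour reported here.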


[JENKINS-54564] Jenkins master CPU pegs to >= 18% - Killing telemetry collection thread seems to fix it #4105