Closed timja closed 8 years ago
tfennelly It looks like HistoryPageFilter converts an iterator over all builds into a list, thereby breaking the lazy loading carefully implemented in AbstractLazyLoadRunMap. Or am I misreading this?
This has affected the latest LTS (1.642.1) and rendered jobs with 2000+ builds unresponsive.
danielbeck I don't think HistoryPageFilter is the issue, as it's only a short-lived object (during rendering), and it only ever holds maxEntries builds, which is typically just 30.
tfennelly Thanks for the update. Would still appreciate if you could take a closer look some time this month.
Tom FENNELLY, the toList() call mentioned by Daniel Beck is not in HistoryPageFilter, but is in the BuildHistoryWidget. From his stack trace:
hudson.widgets.BuildHistoryWidget.getHistoryPageFilter(BuildHistoryWidget.java:81)
I have a very similar stack trace and had opened a duplicate of this ticket reporting the same issue: https://issues.jenkins-ci.org/browse/JENKINS-31888.
I have a change that updates the logic in BuildHistoryWidget and HistoryPageFilter to avoid materializing the entire list of builds into a List. I will send a pull request soon, after a bit more testing.
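To make the eager-vs-lazy concern concrete, here is a minimal, self-contained Java sketch. It is not Jenkins code: `lazyBuilds` is a hypothetical stand-in for a lazily loaded build history that counts every simulated build.xml read. Copying the whole iterable into a list (as a toList() call does) loads every build even though only maxEntries are displayed, while pulling from the iterator only until the page is full loads just that many.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo, not Jenkins code: lazyBuilds() stands in for a lazily loaded
// build history and counts every simulated build-record read in `loads`.
final class EagerVsLazyDemo {
    static int loads = 0;

    // Yields build numbers newest-first; each next() simulates reading one build.xml.
    static Iterable<Integer> lazyBuilds(int newest) {
        return () -> new Iterator<Integer>() {
            private int next = newest;

            @Override
            public boolean hasNext() {
                return next > 0;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                loads++;
                return next--;
            }
        };
    }

    // Eager: copies every build into a List first (like a toList() call), then keeps one page.
    static List<Integer> eagerPage(Iterable<Integer> builds, int maxEntries) {
        List<Integer> all = new ArrayList<>();
        for (Integer b : builds) {
            all.add(b);
        }
        return new ArrayList<>(all.subList(0, Math.min(maxEntries, all.size())));
    }

    // Lazy: stops pulling from the iterator as soon as the page is full.
    static List<Integer> lazyPage(Iterable<Integer> builds, int maxEntries) {
        List<Integer> page = new ArrayList<>();
        Iterator<Integer> it = builds.iterator();
        while (page.size() < maxEntries && it.hasNext()) {
            page.add(it.next());
        }
        return page;
    }

    public static void main(String[] args) {
        eagerPage(lazyBuilds(2000), 30);
        System.out.println("eager page: " + loads + " builds loaded");
        loads = 0;
        lazyPage(lazyBuilds(2000), 30);
        System.out.println("lazy page: " + loads + " builds loaded");
    }
}
```

Both methods return the same 30 newest builds, but running main reports 2000 loads for the eager version and 30 for the lazy one. That gap is what slow disk I/O turns into an unresponsive job page.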
Sent a pull request: https://github.com/jenkinsci/jenkins/pull/2456
Has anyone confirmed whether the MultiJob plugin plays a role in this performance problem? Based on a somewhat superficial review of the build records for jobs of this type, it appears that the plugin creates custom build records which recursively reference the child jobs of the parent. I also suspect that the sub-builds are loaded when querying the parent's build history as well, meaning there may be dozens of XML file accesses for each build item for jobs of this type.
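If that suspicion holds, the cost compounds quickly. As a rough, hypothetical model (assuming every build references the same number k of sub-builds at each of d nesting levels, and that every referenced sub-build record is read), rendering a history page of n builds would require R reads:

```latex
R(n, k, d) = n \sum_{i=0}^{d} k^{i}
```

For instance, with n = 30 (the typical maxEntries mentioned earlier), k = 5 and d = 1, R = 30 × 6 = 180 reads instead of 30; at d = 2 it grows to 30 × 31 = 930.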
If my assumptions prove correct, perhaps a related defect should be created to find some way to optimize the MultiJob plugin to avoid these nested queries.
Also, I would assume that the size of the build records (build.xml) for each build loaded by the HistoryWidget affects the overall impact of this problem. In our use case, I suspect that a seemingly unrelated defect (JENKINS-19022), combined with jobs of type MultiJob, exacerbates this problem even further.
For example, some of our jobs' build.xml files are growing to 5-10 MB or more per build. Closer examination of these files reveals that most of this data (~90%) is in the BuildData section described by JENKINS-19022. As if that weren't bad enough, MultiJob builds apparently have sub-sections for each child job they link to, and each of those children has its own copy of the Git BuildData as well. For MultiJobs with many sub-jobs, the impact is obvious. The behaviour is also recursive when MultiJobs are managed by other MultiJobs.
I believe these are independent problems that are subtly affecting one another, so the sooner they can be fixed the better.
Code changed in jenkins
User: DJ Lee
Path:
core/src/main/java/hudson/widgets/BuildHistoryWidget.java
core/src/main/java/hudson/widgets/HistoryWidget.java
core/src/main/java/jenkins/widgets/HistoryPageFilter.java
core/src/test/java/jenkins/widgets/HistoryPageFilterTest.java
http://jenkins-ci.org/commit/jenkins/55203ebeed1b7e182878d3e3c1184ac042f20473
Log:
[FIXED JENKINS-31791] Optimize BuildHistoryWidget (#2456)
Refactor HistoryPageFilter to lazily evaluate an iterable of previous
runs, instead of instantiating a super long list of builds.
Instantiating the whole list can be problematic with lots of
historical builds especially if disk IO is slow.
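The refactoring described in this commit log (lazily evaluating an iterable of previous runs) can be sketched roughly as follows. This is a hypothetical illustration, not the actual HistoryPageFilter code: the `olderThan` bound and `hasOlderPage` flag are made-up names, builds are plain build numbers iterated newest-first, and the page is filled by walking the iterator and stopping as soon as it has enough.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of lazy history paging; names are illustrative and are not
// the real jenkins.widgets.HistoryPageFilter API. Builds are identified by build
// number and iterated newest-first.
final class LazyHistoryPage {
    final List<Integer> entries = new ArrayList<>();
    boolean hasOlderPage = false;

    // Collects up to maxEntries builds strictly older than olderThan
    // (olderThan == null means "start at the newest build").
    static LazyHistoryPage of(Iterable<Integer> newestFirst, Integer olderThan, int maxEntries) {
        LazyHistoryPage page = new LazyHistoryPage();
        Iterator<Integer> it = newestFirst.iterator();
        while (it.hasNext()) {
            int build = it.next();
            if (olderThan != null && build >= olderThan) {
                continue; // belongs to a newer page; skip it
            }
            if (page.entries.size() == maxEntries) {
                // One extra qualifying build proves an older page exists; stop here
                // instead of walking the rest of the history.
                page.hasOlderPage = true;
                break;
            }
            page.entries.add(build);
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> builds = new ArrayList<>();
        for (int n = 2000; n >= 1; n--) {
            builds.add(n);
        }
        LazyHistoryPage page = of(builds, null, 30);
        System.out.println(page.entries.get(0) + ".." + page.entries.get(page.entries.size() - 1)
                + " hasOlderPage=" + page.hasOlderPage);
    }
}
```

Running main prints `2000..1971 hasOlderPage=true`; at most maxEntries + 1 qualifying builds are touched per page. Note that the skip branch in this sketch still walks past every build newer than olderThan; a history structure that can seek directly by build number would avoid even that.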
leedega in my company's Jenkins instance, we were having problems like this both with and without MultiJob, as long as there were a ton of builds in the history. Our Jenkins master runs on an AWS instance with an EBS disk, so I/O is a bit slower than on a physical machine, which is probably a contributing factor.
The fix for above will be included in Jenkins 2.17.
olivergondza May I ask why you removed the lts-candidate tag from this issue? My company is still waiting for an LTS build containing this fix. Is there anything I can do to help get this considered or merged to the LTS branch?
djlee, because the fix was naturally absorbed by the next LTS line based on 2.17. RC to be pushed today.
olivergondza aha - thanks for explaining!
[Originally duplicated by: JENKINS-31663]
[Originally duplicated by: JENKINS-34526]
[Originally duplicated by: JENKINS-31888]
Job pages will not load. The GET request is never fulfilled; the web browser just idles and eventually times out. A thread dump reveals that the issue is in HistoryWidget (see below). Reverting to Jenkins 1.620 fixes the problem.
THREAD DUMP:
Originally reported by mhulth, imported from: Cannot load job pages due to HistoryWidget hanging