timja / jenkins-gh-issues-poc-06-18

0 stars 0 forks source link

[JENKINS-12396] Jobs remain on executor for 20 minutes after they finish #7062

Open timja opened 12 years ago

timja commented 12 years ago

I recently upgraded Jenkins to 1.447 and enabled the hudson.upstreamCulprits setting. When my Maven2/3 job runs, it completes normally from the console perspective, but on the main build page, the executors show that it is still running. I used the monitoring plugin to inspect the threads and found that it seems to be trying to figure out which logs to keep. This process seems to run for about 20 minutes. I am guessing this may have something to do with the number of builds. Here's a sample stack trace of the executor:

Executor #0 for master : executing aat-tao-3.8 #805
java.util.ArrayList.size(ArrayList.java:177)
java.util.AbstractList$Itr.hasNext(AbstractList.java:339)
hudson.maven.MavenModuleSetBuild.getDownstreamRelationship(MavenModuleSetBuild.java:489)
hudson.model.AbstractBuild.getWhyKeepLog(AbstractBuild.java:986)
hudson.maven.MavenBuild.getWhyKeepLog(MavenBuild.java:264)
hudson.model.Run.isKeepLog(Run.java:470)
hudson.tasks.LogRotator.perform(LogRotator.java:140)
hudson.model.Job.logRotate(Job.java:326)
hudson.maven.MavenModuleSet.logRotate(MavenModuleSet.java:712)
hudson.model.Run.run(Run.java:1469)
hudson.maven.MavenModuleSetBuild.run(MavenModuleSetBuild.java:481)
hudson.model.ResourceController.execute(ResourceController.java:88)
hudson.model.Executor.run(Executor.java:238)


Originally reported by petehayes, imported from: Jobs remain on executor for 20 minutes after they finish
  • status: Reopened
  • priority: Critical
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 10 years ago

danielbeck:

After a build is finished, Jenkins runs the log rotator. It needs to determine which build logs to keep. What exactly is the bug?

timja commented 10 years ago

petehayes:

I think the performance of this activity is too slow which may be caused by some unnecessary activity in this logic - I'm not sure.

timja commented 10 years ago

danielbeck:

How many builds do you have? How many downstream projects are there? Did you select to keep upstream builds around in the downstream builds' Fingerprint post-build step?

timja commented 10 years ago

danielbeck:

No response in two weeks to question asking for additional information to aid investigation, so closing as Incomplete.

timja commented 9 years ago

marksyms:

I don't know if it's because our Jenkins Master is Windows based and NTFS is not good at handling deletes, or that we have a lot of jobs or that our jobs are quite large (many hundreds of files) or some combination of these but this stage can regularly take in excess of an hour to complete after the job has reported that it has finished.

It feels like it would be nicer if the build discard process occurred as a background housekeeping operation rather than as an inline part of the latest build, possibly also only having a single cleanup process rather than every job being able to trigger these tasks concurrently might be a good idea.

timja commented 9 years ago

marksyms:

As this was resolved as "incomplete" and I have added additional context to the commentary I believe it should be looked at again.

timja commented 9 years ago

danielbeck:

I don't know if it's because our Jenkins Master is Windows based and NTFS is not good at handling deletes, or that we have a lot of jobs or that our jobs are quite large (many hundreds of files) or some combination of these but this stage can regularly take in excess of an hour to complete after the job has reported that it has finished.

The stack trace shows Jenkins is busy with determining what builds may be deleted, not the actual deleting itself.

It feels like it would be nicer if the build discard process occurred as a background housekeeping operation rather than as an inline part of the latest build, possibly also only having a single cleanup process rather than every job being able to trigger these tasks concurrently might be a good idea.

Needs stronger justification than "it doesn't work on my instance".

Get a few more thread dumps for this thread to determine where it's spending the most time. Could well be too little RAM on the Jenkins master to keep builds in memory, resulting in repeated build loading and discarding from memory.

timja commented 9 years ago

marksyms:

In my particular case I can see the file deletes (on the master) happening as being the thing that takes the time, as I say this may be down to NTFS and Windows. I'll get some stack traces when I see it happening.

timja commented 9 years ago

marksyms:

Actually, how do I extract a stack trace from the Java processes on the Master and Build slaves when they're running as Windows Services?

Thanks

timja commented 9 years ago

marksyms:

This screen shot shows the delete operations occurring in the Master, whilst this is happening the latest build of the associated build on the build slave is spinning after the completion line.

timja commented 9 years ago

danielbeck:

This screen shot shows the delete operations occurring in the Master, whilst this is happening the latest build of the associated build on the build slave is spinning after the completion line.

I still don't see a bug here. After a build is finished, the same thread performs the log rotation. The build log file is not yet closed, and there's a progress indicator. If there are many files to delete, it will take some time to finish. If Jenkins needs to traverse a complex build graph to determine what can be deleted, that will also take time. This just is how it works, and not a bug. At this point in time, the build is considered finished, and the build result is final.

Or am I missing something here?