When a project is being deleted (with 1.12.28) via API call and there is bunch of suggester rebuilds queued and in progress, resulting in the project delete operation taking very long time to complete:
"OpenGrok-api_task-projects-37725" #37725 prio=5 os_prio=64 cpu=703154.41ms elapsed=3178188.44s tid=0x000000000269b000 nid=0x9857 waiting for monitor entry [0x00007ff312629000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.opengrok.suggest.Suggester.remove(Suggester.java:391)
- waiting to lock <0x00007ff609bd7db8> (a java.lang.Object)
at org.opengrok.web.api.v1.suggester.provider.service.impl.SuggesterServiceImpl.delete(SuggesterServiceImpl.java:233)
at org.opengrok.web.api.v1.controller.ProjectsController.deleteProjectDataWorkHorse(ProjectsController.java:267)
at org.opengrok.web.api.v1.controller.ProjectsController.deleteProjectWorkHorse(ProjectsController.java:197)
at org.opengrok.web.api.v1.controller.ProjectsController.lambda$deleteProject$3(ProjectsController.java:189)
at org.opengrok.web.api.v1.controller.ProjectsController$$Lambda$11076/0x00007feb1383b4b0.call(Unknown Source)
at org.opengrok.web.api.ApiTask.lambda$getCallable$0(ApiTask.java:202)
at org.opengrok.web.api.ApiTask$$Lambda$4215/0x00007ff297154458.call(Unknown Source)
at java.util.concurrent.FutureTask.run(java.base@11.0.7-internal/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.7-internal/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.7-internal/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.7-internal/Thread.java:834)
the thread holding the lock is probably this one:
"ForkJoinPool-13877-worker-19" #372875 daemon prio=5 os_prio=64 cpu=91891.56ms elapsed=92.65s tid=0x0000000010777000 nid=0x5c1f6 runnable [0x00007ff26eefc000]
java.lang.Thread.State: RUNNABLE
at org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.getFrame(SegmentTermsEnum.java:213)
at org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.pushFrame(SegmentTermsEnum.java:241)
at org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:494)
at org.apache.lucene.index.LeafReader.docFreq(LeafReader.java:81)
at org.apache.lucene.index.BaseCompositeReader.docFreq(BaseCompositeReader.java:196)
at org.opengrok.suggest.SuggesterUtils.computeNormalizedDocumentFrequency(SuggesterUtils.java:108)
at org.opengrok.suggest.SuggesterUtils.computeScore(SuggesterUtils.java:97)
at org.opengrok.suggest.SuggesterProjectData$WFSTInputIterator.weight(SuggesterProjectData.java:625)
at org.apache.lucene.search.suggest.SortedInputIterator.sort(SortedInputIterator.java:190)
at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:80)
at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:64)
at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup$WFSTInputIterator.<init>(WFSTCompletionLookup.java:269)
at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup.build(WFSTCompletionLookup.java:107)
at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:268)
at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:255)
at org.opengrok.suggest.SuggesterProjectData.rebuild(SuggesterProjectData.java:240)
at org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$2(Suggester.java:370)
at org.opengrok.suggest.Suggester$$Lambda$4234/0x00007fff878fc500.run(Unknown Source)
at java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(java.base@11.0.7-internal/ForkJoinTask.java:1407)
at java.util.concurrent.ForkJoinTask.doExec(java.base@11.0.7-internal/ForkJoinTask.java:290)
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(java.base@11.0.7-internal/ForkJoinPool.java:1020)
at java.util.concurrent.ForkJoinPool.scan(java.base@11.0.7-internal/ForkJoinPool.java:1656)
at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.7-internal/ForkJoinPool.java:1594)
at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.7-internal/ForkJoinWorkerThread.java:177)
Also, there is bunch of threads waiting with stacks like this:
"ForkJoinPool.commonPool-worker-39" #69874 daemon prio=5 os_prio=64 cpu=364.29ms elapsed=2889951.75s tid=0x000000000a95e800 nid=0x118f4 waiting for monitor entry [0x00007ff27c292000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.opengrok.suggest.Suggester.rebuild(Suggester.java:307)
- waiting to lock <0x00007ff609bd7db8> (a java.lang.Object)
at org.opengrok.web.api.v1.suggester.provider.service.impl.SuggesterServiceImpl.rebuild(SuggesterServiceImpl.java:213)
at org.opengrok.web.api.v1.controller.ProjectsController.lambda$markIndexed$7(ProjectsController.java:370)
at org.opengrok.web.api.v1.controller.ProjectsController$$Lambda$4233/0x00007fff878ff0b0.run(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.7-internal/CompletableFuture.java:1736)
at java.util.concurrent.CompletableFuture$AsyncRun.exec(java.base@11.0.7-internal/CompletableFuture.java:1728)
at java.util.concurrent.ForkJoinTask.doExec(java.base@11.0.7-internal/ForkJoinTask.java:290)
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(java.base@11.0.7-internal/ForkJoinPool.java:1020)
at java.util.concurrent.ForkJoinPool.scan(java.base@11.0.7-internal/ForkJoinPool.java:1656)
at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.7-internal/ForkJoinPool.java:1594)
at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.7-internal/ForkJoinWorkerThread.java:177)
It is not clear to me why the global lock is needed considering the projectData is ConcurrentHashMap. Also, each per project suggester data class SuggesterProjectData has its own lock that is used to enclose the bodies of rebuild() and remove().
Lastly, it seems that the scheduler allows the queued threads to grab the lock in the order they attempted to acquire it, so the ProjectsController thread that came late to the party has to wait for them all to complete, (even though its operation could be quicker than the rebuilds) leading to the observed behavior. In my case there was circa 30 scheduled rebuilds waiting, each takes between 5-8 minutes to complete, so the deletion request has to wait for more than 2 hours (one of the reasons for #4536).
When a project is being deleted (with 1.12.28) via API call and there is bunch of suggester rebuilds queued and in progress, resulting in the project delete operation taking very long time to complete:
the thread holding the lock is probably this one:
Also, there is bunch of threads waiting with stacks like this:
Source code wise the waiting thread is trying to acquire the per-class lock in
Suggester#remove()
: https://github.com/oracle/opengrok/blob/505f4300fc8be9540017e1c71e42daaf475c5f4e/suggester/src/main/java/org/opengrok/suggest/Suggester.java#L390 while it is being held by a thread inSuggester#rebuild()
: https://github.com/oracle/opengrok/blob/505f4300fc8be9540017e1c71e42daaf475c5f4e/suggester/src/main/java/org/opengrok/suggest/Suggester.java#L307It is not clear to me why the global lock is needed considering the
projectData
isConcurrentHashMap
. Also, each per project suggester data classSuggesterProjectData
has its own lock that is used to enclose the bodies ofrebuild()
andremove()
.Lastly, it seems that the scheduler allows the queued threads to grab the lock in the order they attempted to acquire it, so the
ProjectsController
thread that came late to the party has to wait for them all to complete, (even though its operation could be quicker than the rebuilds) leading to the observed behavior. In my case there was circa 30 scheduled rebuilds waiting, each takes between 5-8 minutes to complete, so the deletion request has to wait for more than 2 hours (one of the reasons for #4536).