Closed vladak closed 4 months ago
https://github.com/oracle/opengrok/wiki/Suggester will have to be updated with the new tunable.
Tested by running per-project indexing workflow (via opengrok-sync
), checked there are some rebuild threads running, waited for some of them to complete and then requested rebuild of each project separately:
curl -s http://localhost:8080/source/api/v1/projects | jq '.[]' | sed 's/"//g' | while read proj; do curl -X PUT http://localhost:8080/source/api/v1/suggest/rebuild/$proj; done
and again rechecked the running threads using prstat -L -m
and jstack
. Here's the prstat
output (the thread names are truncated so incomplete thread id is displayed):
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG LWPID PROCESS/LWPNAME
21563 webservd 94.90 0.318 0.014 0.000 1.680 3.026 0.000 0.063 104 33 470 22 12830 java/OpenGrok-suggester-rebuild-1208
21563 webservd 93.00 0.333 0.015 0.000 3.548 3.043 0.000 0.064 95 33 491 21 12847 java/OpenGrok-suggester-rebuild-1209
21563 webservd 90.55 0.656 0.015 0.000 5.665 3.036 0.000 0.075 119 31 682 23 12829 java/OpenGrok-suggester-rebuild-1208
21563 webservd 88.81 0.429 0.016 0.000 7.613 3.046 0.000 0.086 109 32 148 24 12823 java/OpenGrok-suggester-rebuild-1207
21563 webservd 86.84 1.258 0.027 0.000 8.732 3.041 0.000 0.098 151 32 1815 22 12838 java/OpenGrok-suggester-rebuild-1209
21563 webservd 37.83 1.321 0.007 0.000 7.142 52.27 1.334 0.091 150 18 1137 18 12831 java/OpenGrok-suggester-rebuild-1208
21563 webservd 36.11 2.198 0.008 0.000 7.996 52.86 0.636 0.196 384 14 1556 16 12817 java/OpenGrok-suggester-rebuild-1207
21563 webservd 36.96 1.334 0.007 0.000 6.035 53.13 2.170 0.364 685 17 1294 9 12841 java/OpenGrok-suggester-rebuild-1209
21563 webservd 34.63 3.024 0.011 0.000 9.124 52.34 0.736 0.139 229 16 2394 15 12822 java/OpenGrok-suggester-rebuild-1207
21563 webservd 34.09 2.371 0.007 0.000 7.062 53.55 2.530 0.383 703 15 1811 9 12828 java/OpenGrok-suggester-rebuild-1208
21563 webservd 32.05 3.065 0.011 0.000 6.470 53.75 4.279 0.384 687 17 2120 10 12842 java/OpenGrok-suggester-rebuild-1209
21563 webservd 32.46 2.555 0.011 0.000 9.073 55.20 0.528 0.169 257 16 1633 13 12832 java/OpenGrok-suggester-rebuild-1208
21563 webservd 30.42 3.487 0.007 0.000 8.334 54.03 3.434 0.292 524 14 2143 7 12825 java/OpenGrok-suggester-rebuild-1207
21563 webservd 30.98 2.364 0.007 0.000 8.261 55.35 2.704 0.327 579 17 1593 6 12852 java/OpenGrok-suggester-rebuild-1209
21563 webservd 30.78 2.364 0.008 0.000 8.191 55.34 2.999 0.325 576 17 1537 6 12835 java/OpenGrok-suggester-rebuild-1208
21563 webservd 27.06 4.833 0.007 0.000 8.399 54.94 4.401 0.364 622 16 2649 6 12833 java/OpenGrok-suggester-rebuild-1208
21563 webservd 26.04 4.663 0.008 0.000 8.196 55.90 4.860 0.332 612 14 2566 9 12844 java/OpenGrok-suggester-rebuild-1209
21563 webservd 21.91 1.479 0.006 0.000 6.158 70.33 0.059 0.053 96 9 1315 4 12827 java/OpenGrok-suggester-rebuild-1208
21563 webservd 7.490 1.846 0.003 0.000 0.981 88.49 1.028 0.158 293 4 1333 8 12845 java/OpenGrok-suggester-rebuild-1209
...
As for the jstack
output, it is important that there multiple rebuild threads in RUNNABLE
state with stacks like this:
"OpenGrok-suggester-rebuild-12077" #12077 prio=5 os_prio=64 cpu=963690.72ms elapsed=1704.64s tid=0x0000000007bf8000 nid=0x3217 runnable [0x00007fff535fd000]
java.lang.Thread.State: RUNNABLE
at java.nio.ByteBuffer.get(java.base@11.0.7-internal/ByteBuffer.java:737)
at java.nio.DirectByteBuffer.get(java.base@11.0.7-internal/DirectByteBuffer.java:319)
at org.apache.lucene.store.ByteBufferGuard.getBytes(ByteBufferGuard.java:94)
at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:138)
at org.apache.lucene.util.compress.LZ4.decompress(LZ4.java:106)
at org.apache.lucene.codecs.lucene90.blocktree.CompressionAlgorithm$3.read(CompressionAlgorithm.java:45)
at org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnumFrame.loadBlock(SegmentTermsEnumFrame.java:182)
at org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:539)
at org.apache.lucene.index.LeafReader.docFreq(LeafReader.java:81)
at org.apache.lucene.index.BaseCompositeReader.docFreq(BaseCompositeReader.java:197)
at org.opengrok.suggest.SuggesterUtils.computeNormalizedDocumentFrequency(SuggesterUtils.java:108)
at org.opengrok.suggest.SuggesterUtils.computeScore(SuggesterUtils.java:97)
at org.opengrok.suggest.SuggesterProjectData$WFSTInputIterator.weight(SuggesterProjectData.java:638)
at org.apache.lucene.search.suggest.SortedInputIterator.sort(SortedInputIterator.java:190)
at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:80)
at org.apache.lucene.search.suggest.SortedInputIterator.<init>(SortedInputIterator.java:64)
at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup$WFSTInputIterator.<init>(WFSTCompletionLookup.java:272)
at org.apache.lucene.search.suggest.fst.WFSTCompletionLookup.build(WFSTCompletionLookup.java:109)
at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:281)
at org.opengrok.suggest.SuggesterProjectData.build(SuggesterProjectData.java:268)
at org.opengrok.suggest.SuggesterProjectData.rebuild(SuggesterProjectData.java:244)
at org.opengrok.suggest.Suggester.lambda$getRebuildRunnable$5(Suggester.java:428)
at org.opengrok.suggest.Suggester$$Lambda$4928/0x00000008015da440.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.7-internal/Executors.java:515)
at java.util.concurrent.FutureTask.run(java.base@11.0.7-internal/FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.7-internal/ThreadPoolExecutor.java:1128)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.7-internal/ThreadPoolExecutor.java:628)
at java.lang.Thread.run(java.base@11.0.7-internal/Thread.java:834)
The projects for which the rebuild is already running will actually be serialized on the write lock in SuggesterProjectData#isInitialized()
(newly added method):
"ForkJoinPool.commonPool-worker-53" #12424 daemon prio=5 os_prio=64 cpu=0.53ms elapsed=47.20s tid=0x0000000003770000 nid=0x33c0 waiting on condition [0x00007fff3cafd000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.7-internal/Native Method)
- parking to wait for <0x0000000083f012a0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(java.base@11.0.7-internal/LockSupport.java:194)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.7-internal/AbstractQueuedSynchronizer.java:885)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11.0.7-internal/AbstractQueuedSynchronizer.java:917)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11.0.7-internal/AbstractQueuedSynchronizer.java:1240)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base@11.0.7-internal/ReentrantReadWriteLock.java:959)
at org.opengrok.suggest.SuggesterProjectData.isInitialized(SuggesterProjectData.java:257)
at org.opengrok.suggest.Suggester.rebuild(Suggester.java:369)
at org.opengrok.web.api.v1.suggester.provider.service.impl.SuggesterServiceImpl.rebuild(SuggesterServiceImpl.java:214)
at org.opengrok.web.api.v1.controller.SuggesterController.lambda$rebuild$1(SuggesterController.java:185)
at org.opengrok.web.api.v1.controller.SuggesterController$$Lambda$4951/0x00000008015d0840.run(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.7-internal/CompletableFuture.java:1736)
at java.util.concurrent.CompletableFuture$AsyncRun.exec(java.base@11.0.7-internal/CompletableFuture.java:1728)
at java.util.concurrent.ForkJoinTask.doExec(java.base@11.0.7-internal/ForkJoinTask.java:290)
at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(java.base@11.0.7-internal/ForkJoinPool.java:1020)
at java.util.concurrent.ForkJoinPool.scan(java.base@11.0.7-internal/ForkJoinPool.java:1656)
at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.7-internal/ForkJoinPool.java:1594)
at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.7-internal/ForkJoinWorkerThread.java:177)
This change addresses 2 problems with suggester:
Suggester
class for init/rebuild/remove operations prevented API calls to proceed for a long time. Together withSuggesterServiceImpl
andSuggesterProjectData
, there were 3 layers of locking, making this confusing.The locking part of the change is done with the assumption that the
lock
object based synchronization in theSuggester
class was primarily used to protect the insertions/queries to theprojectData
map. I addressed this usingcomputeIfAbsent
which is atomic for theConcurrentHashMap
used.The rebuild execution part of the change was done by introducing a thread pool that is reused for the init/rebuild operations.
In general this should improve the throughput of the suggester.
While there, I added tunable for the search pool parallelism level and addressed a few nits in the code.
Also, changed the latency meters to record per suggester data times rather than aggregates as the latter makes limited sense, I think.