Open rmuir opened 3 years ago
linux perf usage becomes a lot easier with jdk-16 early access build. We don't need to compile/install any java agent and can just pass JVM flags such as -XX:+UnlockDiagnosticVMOptions -XX:+DumpPerfMapAtExit -XX:+PreserveFramePointer
, and run the benchmark with perf record
. See https://bugs.openjdk.java.net/browse/JDK-8254723
And maybe, just maybe, we have some giant frogs boiling for years that nobody never noticed yet
LOL
I chatted w/ @rmuir and he suggested we start simpler here with Java Flight Recorder (JFR). We can later upgrade to perf
. Progress not perfection!
Some notes from our chat:
-Dtests.profile
: https://github.com/apache/lucene-solr/blob/master/gradle/testing/profiling.gradle-Dtests.profile
: https://github.com/apache/lucene-solr/blob/master/gradle/testing/profiling.jfcI have a start at this! Still iterating on it ... it produces output like this, after all 20 JVM iterations of a benchmark run, when running on wikimediumall
:
CPU merged search profile for comp:
PROFILE SUMMARY from 1060297 events (total: 1M)
tests.profile.mode=cpu
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT CPU SAMPLES STACK
8.32% 88215 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
8.03% 85123 org.apache.lucene.util.packed.DirectMonotonicReader#get()
6.41% 67966 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
6.14% 65140 org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
5.21% 55271 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
2.88% 30504 org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
2.86% 30302 java.nio.DirectByteBuffer#get()
2.72% 28800 jdk.internal.misc.Unsafe#convEndian()
2.14% 22713 org.apache.lucene.store.ByteBufferGuard#ensureValid()
2.07% 21976 java.nio.Buffer#checkIndex()
1.81% 19224 java.nio.ByteBuffer#get()
1.56% 16575 java.nio.Buffer#position()
1.45% 15367 org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
1.20% 12770 java.nio.Buffer#checkSegment()
1.20% 12756 org.apache.lucene.store.ByteBufferGuard#getBytes()
1.12% 11868 org.apache.lucene.store.ByteBufferIndexInput#setCurBuf()
1.02% 10787 java.nio.Buffer#nextGetIndex()
1.00% 10633 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.96% 10209 org.apache.lucene.search.PhraseScorer$1#matches()
0.94% 9916 org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek()
0.93% 9836 org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#nextPosition()
0.83% 8758 org.apache.lucene.index.SingletonSortedSetDocValues#nextOrd()
0.82% 8688 org.apache.lucene.codecs.lucene80.Lucene80NormsProducer$3#longValue()
0.80% 8465 jdk.internal.misc.Unsafe#copyMemory()
0.77% 8188 org.apache.lucene.search.ConjunctionDISI#doNext()
0.74% 7833 org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$DenseBinaryDocValues#advance()
0.71% 7481 org.apache.lucene.util.PriorityQueue#add()
0.66% 6981 org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#skipPositions()
0.65% 6865 org.apache.lucene.search.spans.TermSpans#nextStartPosition()
0.63% 6682 org.apache.lucene.search.spans.NearSpansOrdered#stretchToOrder()
HEAP merged search profile for comp:
PROFILE SUMMARY from 28436 events (total: 25291M)
tests.profile.mode=heap
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT HEAP SAMPLES STACK
13.62% 3445M org.apache.lucene.util.FixedBitSet#<init>()
6.46% 1634M org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame#<init>()
6.16% 1557M java.util.AbstractList#iterator()
5.14% 1299M org.apache.lucene.util.BytesRef#<init>()
5.12% 1294M org.apache.lucene.util.ArrayUtil#growExact()
4.15% 1048M org.apache.lucene.util.fst.ByteSequenceOutputs#read()
4.01% 1014M org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.84% 719M org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockDocsEnum#<init>()
2.78% 702M org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
2.37% 598M org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
2.00% 506M org.apache.lucene.util.DocIdSetBuilder$Buffer#<init>()
1.86% 470M java.nio.DirectByteBufferR#duplicate()
1.85% 468M jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.85% 467M org.apache.lucene.store.DataInput#skipBytes()
1.83% 463M org.apache.lucene.codecs.blocktree.IntersectTermsEnumFrame#load()
1.55% 392M org.apache.lucene.codecs.blocktree.SegmentTermsEnum#getFrame()
1.37% 347M org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsDocsEnum#<init>()
1.27% 320M org.apache.lucene.codecs.lucene84.Lucene84PostingsReader#newTermState()
1.23% 312M java.util.ArrayList#grow()
1.22% 308M org.apache.lucene.codecs.lucene84.ForUtil#<init>()
1.18% 298M java.util.ArrayList#iterator()
1.18% 298M org.apache.lucene.codecs.blocktree.SegmentTermsEnum#<init>()
1.07% 271M org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
1.07% 270M org.apache.lucene.codecs.blocktree.IntersectTermsEnumFrame#<init>()
0.93% 236M java.nio.DirectByteBufferR#asLongBuffer()
0.90% 226M org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsPostingsEnum#<init>()
0.89% 224M org.apache.lucene.store.ByteBufferIndexInput#newCloneInstance()
0.80% 201M java.util.Arrays#copyOf()
0.78% 196M java.nio.DirectByteBufferR#slice()
0.76% 191M org.apache.lucene.codecs.blocktree.SegmentTermsEnum#getArc()
Look at that 2.72% CPU in convEndian
and 2.15% in ByteBufferGuard#ensureValid
!
I pushed JFR for searching (still need to enable for indexing too), and it ran last night for the first time: https://home.apache.org/~mikemccand/lucenebench/2021.01.12.00.03.16.html
The results are ... curious. E.g. 39% of HEAP samples were for java.lang.Integer#valueOf()
!? Are we really boxing that many integers (where!).
6.4% of HEAP was jdk.internal.math.FloatingDecimal#readJavaFormatString()
!? I think this is parsing String to float/double, sheesh.
~2% of CPU is spent in java.nio.ByteBuffer#get()
-- are we really pulling single bytes so often!?
I will increase reported stack depth so we can get more details :)
OK I re-ran the *.jfr
aggregation, asking for stacksize=10
. I think we really must fix our grouping implementation to not box/unbox Integer
on every hit!
PERCENT HEAP SAMPLES STACK
39.19% 330167M java.lang.Integer#valueOf()
at org.apache.lucene.search.grouping.TermGroupSelector#advanceTo()
at org.apache.lucene.search.grouping.SecondPassGroupingCollector#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
6.38% 53776M jdk.internal.math.FloatingDecimal#readJavaFormatString()
at jdk.internal.math.FloatingDecimal#parseFloat()
at java.lang.Float#parseFloat()
at perf.VectorDictionary#parseLine()
at perf.VectorDictionary#<init>()
at perf.TaskParser#<init>()
at perf.SearchPerfTest#_main()
at perf.SearchPerfTest#main()
4.00% 33743M org.apache.lucene.util.FixedBitSet#<init>()
at org.apache.lucene.util.DocIdSetBuilder#upgradeToBitSet()
at org.apache.lucene.util.DocIdSetBuilder#grow()
at org.apache.lucene.util.DocIdSetBuilder#add()
at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1#rewrite()
at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1#bulkScorer()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#searchAfter()
3.16% 26630M java.lang.StringLatin1#newString()
at java.lang.String#substring()
at java.lang.String#split()
at java.lang.String#split()
at perf.VectorDictionary#parseLine()
at perf.VectorDictionary#<init>()
at perf.TaskParser#<init>()
at perf.SearchPerfTest#_main()
at perf.SearchPerfTest#main()
2.39% 20101M java.util.Arrays#copyOf()
at java.util.ArrayList#grow()
at java.util.ArrayList#grow()
at java.util.ArrayList#add()
at java.util.ArrayList#add()
at java.lang.String#split()
at java.lang.String#split()
at perf.VectorDictionary#parseLine()
at perf.VectorDictionary#<init>()
at perf.TaskParser#<init>()
2.28% 19242M java.util.Arrays#copyOfRange()
at java.lang.StringLatin1#newString()
at java.lang.String#substring()
at java.lang.String#split()
at java.lang.String#split()
at perf.VectorDictionary#parseLine()
at perf.VectorDictionary#<init>()
at perf.TaskParser#<init>()
at perf.SearchPerfTest#_main()
at perf.SearchPerfTest#main()
1.88% 15839M org.apache.lucene.util.FixedBitSet#<init>()
at org.apache.lucene.util.DocIdSetBuilder#upgradeToBitSet()
at org.apache.lucene.util.DocIdSetBuilder#grow()
at org.apache.lucene.facet.FacetsCollector#collect()
at org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
1.73% 14598M java.lang.ref.Finalizer#runFinalization()
at java.lang.ref.Reference$1#runFinalization()
at java.lang.Runtime#runFinalization()
at perf.SearchPerfTest#_main()
at perf.SearchPerfTest#main()
1.72% 14517M org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
at org.apache.lucene.search.ImpactsDISI#advanceTarget()
at org.apache.lucene.search.ImpactsDISI#advance()
at org.apache.lucene.search.ImpactsDISI#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
1.26% 10575M java.util.AbstractList#listIterator()
at java.util.AbstractList$SubList$1#<init>()
at java.util.AbstractList$SubList#listIterator()
at java.util.AbstractList#listIterator()
at java.util.AbstractList$SubList#iterator()
at org.apache.lucene.search.DisiPriorityQueue#iterator()
at org.apache.lucene.search.WANDScorer#updateMaxScores()
at org.apache.lucene.search.WANDScorer#updateMaxScoresIfNecessary()
at org.apache.lucene.search.WANDScorer#moveToNextCandidate()
at org.apache.lucene.search.WANDScorer$1#advance()
0.90% 7559M java.lang.String#split()
at java.lang.String#split()
at perf.VectorDictionary#parseLine()
at perf.VectorDictionary#<init>()
at perf.TaskParser#<init>()
at perf.SearchPerfTest#_main()
at perf.SearchPerfTest#main()
0.87% 7371M java.util.AbstractList#iterator()
at org.apache.lucene.search.MaxScoreCache#computeMaxScore()
at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
at org.apache.lucene.search.ImpactsDISI#advanceTarget()
at org.apache.lucene.search.ImpactsDISI#advance()
at org.apache.lucene.search.WANDScorer#advanceTail()
at org.apache.lucene.search.WANDScorer#advanceTail()
at org.apache.lucene.search.WANDScorer$2#matches()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
0.85% 7163M org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
at org.apache.lucene.search.MaxScoreCache#getSkipLevel()
at org.apache.lucene.search.MaxScoreCache#getSkipUpTo()
at org.apache.lucene.search.ImpactsDISI#advanceTarget()
at org.apache.lucene.search.ImpactsDISI#advance()
at org.apache.lucene.search.ImpactsDISI#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
0.82% 6901M org.apache.lucene.util.FixedBitSet#<init>()
at org.apache.lucene.search.PointRangeQuery$1$4#get()
at org.apache.lucene.search.PointRangeQuery$1#scorer()
at org.apache.lucene.search.Weight#bulkScorer()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#searchAfter()
at org.apache.lucene.search.IndexSearcher#search()
at perf.SearchTask#go()
0.78% 6565M org.apache.lucene.util.FixedBitSet#<init>()
at org.apache.lucene.util.DocIdSetBuilder#upgradeToBitSet()
at org.apache.lucene.util.DocIdSetBuilder#grow()
at org.apache.lucene.search.PointRangeQuery$1$1#grow()
at org.apache.lucene.util.bkd.BKDReader#addAll()
at org.apache.lucene.util.bkd.BKDReader#intersect()
at org.apache.lucene.util.bkd.BKDReader#intersect()
at org.apache.lucene.util.bkd.BKDReader#intersect()
at org.apache.lucene.util.bkd.BKDReader#intersect()
at org.apache.lucene.util.bkd.BKDReader#intersect()
0.75% 6356M java.lang.StringUTF16#compress()
at java.lang.String#<init>()
at java.lang.String#<init>()
at java.io.BufferedReader#readLine()
at java.io.BufferedReader#readLine()
at perf.VectorDictionary#<init>()
at perf.TaskParser#<init>()
at perf.SearchPerfTest#_main()
at perf.SearchPerfTest#main()
0.71% 6003M java.util.AbstractList#iterator()
at org.apache.lucene.search.MaxScoreCache#computeMaxScore()
at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
at org.apache.lucene.search.ImpactsDISI#getMaxScore()
at org.apache.lucene.search.TermScorer#getMaxScore()
at org.apache.lucene.search.WANDScorer#updateMaxScores()
at org.apache.lucene.search.WANDScorer#updateMaxScoresIfNecessary()
at org.apache.lucene.search.WANDScorer#moveToNextCandidate()
at org.apache.lucene.search.WANDScorer$1#advance()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#doNext()
0.66% 5520M java.util.ArrayList#grow()
at java.util.ArrayList#grow()
at java.util.ArrayList#add()
at java.util.ArrayList#add()
at org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
at org.apache.lucene.search.ImpactsDISI#advanceTarget()
at org.apache.lucene.search.ImpactsDISI#advance()
at org.apache.lucene.search.ImpactsDISI#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.61% 5154M java.util.concurrent.CopyOnWriteArrayList#iterator()
at jdk.jfr.internal.RequestEngine#run_requests()
at jdk.jfr.internal.RequestEngine#doPeriodic()
at jdk.jfr.internal.PlatformRecorder#periodicTask()
at jdk.jfr.internal.PlatformRecorder#lambda$startDiskMonitor$1()
at jdk.jfr.internal.PlatformRecorder$$Lambda$44+0x0000000800b74740.1164107853#run()
at java.lang.Thread#run()
0.58% 4906M org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockDocsEnum#<init>()
at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader#postings()
at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader#impacts()
at org.apache.lucene.codecs.blocktree.SegmentTermsEnum#impacts()
at org.apache.lucene.search.TermQuery$TermWeight#scorer()
at org.apache.lucene.search.Weight#scorerSupplier()
at org.apache.lucene.search.BooleanWeight#scorerSupplier()
at org.apache.lucene.search.BooleanWeight#scorer()
at org.apache.lucene.search.Weight#bulkScorer()
at org.apache.lucene.search.BooleanWeight#bulkScorer()
0.54% 4514M java.util.Arrays#asList()
at org.apache.lucene.search.DisiPriorityQueue#iterator()
at org.apache.lucene.search.WANDScorer#updateMaxScores()
at org.apache.lucene.search.WANDScorer#updateMaxScoresIfNecessary()
at org.apache.lucene.search.WANDScorer#moveToNextCandidate()
at org.apache.lucene.search.WANDScorer$1#advance()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#doNext()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#advance()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.53% 4494M java.util.AbstractList#iterator()
at org.apache.lucene.search.MaxScoreCache#computeMaxScore()
at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
at org.apache.lucene.search.ImpactsDISI#getMaxScore()
at org.apache.lucene.search.TermScorer#getMaxScore()
at org.apache.lucene.search.BlockMaxConjunctionScorer#getMaxScore()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#moveToNextBlock()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#advanceTarget()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#doNext()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#advance()
0.52% 4416M java.util.AbstractList#iterator()
at org.apache.lucene.search.MaxScoreCache#computeMaxScore()
at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
at org.apache.lucene.search.ImpactsDISI#advanceTarget()
at org.apache.lucene.search.ImpactsDISI#advance()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#doNext()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#advance()
at org.apache.lucene.search.BlockMaxConjunctionScorer$2#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
0.51% 4295M perf.VectorDictionary#parseLine()
at perf.VectorDictionary#<init>()
at perf.TaskParser#<init>()
at perf.SearchPerfTest#_main()
at perf.SearchPerfTest#main()
0.50% 4241M jdk.internal.misc.Unsafe#allocateUninitializedArray()
at java.lang.StringConcatHelper#newArray()
at java.lang.StringConcatHelper#newArrayWithSuffix()
at java.lang.invoke.DirectMethodHandle$Holder#invokeStatic()
at java.lang.invoke.LambdaForm$MH+0x0000000800bb3040.1462044018#invoke()
at java.lang.invoke.LambdaForm$MH+0x0000000800ba6440.1536471117#invoke()
at java.lang.invoke.Invokers$Holder#linkToTargetMethod()
at org.apache.lucene.store.IndexInput#getFullSliceDescription()
at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
at org.apache.lucene.store.ByteBufferIndexInput#slice()
0.50% 4186M org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
at org.apache.lucene.search.ImpactsDISI#advanceTarget()
at org.apache.lucene.search.ImpactsDISI#advance()
at org.apache.lucene.search.ImpactsDISI#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
0.50% 4180M java.util.concurrent.locks.AbstractQueuedSynchronizer#acquire()
at java.util.concurrent.locks.AbstractQueuedSynchronizer#acquireSharedInterruptibly()
at java.util.concurrent.CountDownLatch#await()
at perf.TaskThreads$TaskThread#run()
0.49% 4099M org.apache.lucene.util.DocIdSetBuilder$Buffer#<init>()
at org.apache.lucene.util.DocIdSetBuilder#addBuffer()
at org.apache.lucene.util.DocIdSetBuilder#ensureBufferCapacity()
at org.apache.lucene.util.DocIdSetBuilder#grow()
at org.apache.lucene.facet.FacetsCollector#collect()
at org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
0.48% 4072M perf.SearchPerfTest#_main()
at perf.SearchPerfTest#main()
0.44% 3723M org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
at org.apache.lucene.search.ImpactsDISI#advanceShallow()
at org.apache.lucene.search.ImpactsDISI#advanceTarget()
at org.apache.lucene.search.ImpactsDISI#advance()
at org.apache.lucene.search.ImpactsDISI#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
And CPU:
PERCENT CPU SAMPLES STACK
3.79% 232348 org.apache.lucene.util.packed.DirectMonotonicReader#get()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
3.55% 217460 org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
2.70% 165557 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
at org.apache.lucene.util.packed.DirectMonotonicReader#get()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
2.13% 130744 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
1.95% 119568 org.apache.lucene.search.grouping.TermGroupSelector#advanceTo()
at org.apache.lucene.search.grouping.SecondPassGroupingCollector#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
1.74% 106828 org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsPostingsEnum#advance()
at org.apache.lucene.search.ConjunctionDISI#doNext()
at org.apache.lucene.search.ConjunctionDISI#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#searchAfter()
1.54% 94213 java.nio.ByteBuffer#get()
at java.nio.DirectByteBuffer#get()
at org.apache.lucene.store.ByteBufferGuard#getBytes()
at org.apache.lucene.store.ByteBufferIndexInput#readBytes()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
1.03% 63224 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
1.02% 62342 org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
at org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
at org.apache.lucene.queries.intervals.IntervalScorer$1#matches()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#searchAfter()
1.02% 62230 org.apache.lucene.search.PhraseScorer$1#matches()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#searchAfter()
at org.apache.lucene.search.IndexSearcher#search()
at perf.SearchTask#go()
1.01% 61871 org.apache.lucene.index.SingletonSortedSetDocValues#nextDoc()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
0.93% 57187 org.apache.lucene.search.ConjunctionDISI#doNext()
at org.apache.lucene.search.ConjunctionDISI#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#searchAfter()
at org.apache.lucene.search.IndexSearcher#search()
0.82% 50023 org.apache.lucene.search.grouping.TermGroupSelector#advanceTo()
at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
0.63% 38777 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
0.62% 37725 jdk.internal.misc.Unsafe#convEndian()
at jdk.internal.misc.Unsafe#getShortUnaligned()
at java.nio.DirectByteBuffer#getShort()
at java.nio.DirectByteBuffer#getShort()
at org.apache.lucene.store.ByteBufferGuard#getShort()
at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readShort()
at org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
at org.apache.lucene.util.packed.DirectMonotonicReader#get()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
0.59% 36204 org.apache.lucene.search.TermScorer#score()
at org.apache.lucene.search.ScoreCachingWrappingScorer#score()
at org.apache.lucene.search.FieldComparator$RelevanceComparator#compareBottom()
at org.apache.lucene.search.grouping.FirstPassGroupingCollector#isCompetitive()
at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
0.56% 34042 org.apache.lucene.codecs.lucene84.Lucene84PostingsReader#findFirstGreater()
at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsPostingsEnum#advance()
at org.apache.lucene.search.ConjunctionDISI#doNext()
at org.apache.lucene.search.ConjunctionDISI#nextDoc()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
0.55% 33870 org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
at org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
0.54% 33294 org.apache.lucene.store.ByteBufferGuard#ensureValid()
at org.apache.lucene.store.ByteBufferGuard#getShort()
at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readShort()
at org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
at org.apache.lucene.util.packed.DirectMonotonicReader#get()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
0.53% 32211 org.apache.lucene.search.ScoreCachingWrappingScorer#score()
at org.apache.lucene.search.FieldComparator$RelevanceComparator#compareBottom()
at org.apache.lucene.search.grouping.FirstPassGroupingCollector#isCompetitive()
at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at perf.SearchTask#go()
0.50% 30812 org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#skipPositions()
at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#nextPosition()
at org.apache.lucene.queries.intervals.TermIntervalsSource$1#nextInterval()
at org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
at org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
at org.apache.lucene.queries.intervals.IntervalScorer$1#matches()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
0.50% 30709 org.apache.lucene.search.grouping.BlockGroupingCollector#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
0.49% 30235 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
at org.apache.lucene.search.LeafSimScorer#score()
at org.apache.lucene.search.TermScorer#score()
at org.apache.lucene.search.ScoreCachingWrappingScorer#score()
at org.apache.lucene.search.FieldComparator$RelevanceComparator#compareBottom()
at org.apache.lucene.search.grouping.FirstPassGroupingCollector#isCompetitive()
at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
0.48% 29554 org.apache.lucene.search.SloppyPhraseMatcher#maxFreq()
at org.apache.lucene.search.PhraseScorer$1#matches()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#searchAfter()
at org.apache.lucene.search.IndexSearcher#search()
0.48% 29119 jdk.internal.misc.Unsafe#convEndian()
at jdk.internal.misc.Unsafe#getIntUnaligned()
at java.nio.DirectByteBuffer#getInt()
at java.nio.DirectByteBuffer#getInt()
at org.apache.lucene.store.ByteBufferGuard#getInt()
at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readInt()
at org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$21#ordValue()
at org.apache.lucene.search.grouping.TermGroupSelector#advanceTo()
at org.apache.lucene.search.grouping.SecondPassGroupingCollector#collect()
0.44% 26883 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$21#ordValue()
at org.apache.lucene.index.SingletonSortedSetDocValues#nextDoc()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
0.44% 26770 java.nio.Buffer#position()
at java.nio.ByteBuffer#position()
at java.nio.MappedByteBuffer#position()
at java.nio.MappedByteBuffer#position()
at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek()
at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
0.43% 26243 org.apache.lucene.codecs.lucene80.Lucene80NormsProducer$3#longValue()
at org.apache.lucene.search.LeafSimScorer#getNormValue()
at org.apache.lucene.search.LeafSimScorer#score()
at org.apache.lucene.search.TermScorer#score()
at org.apache.lucene.search.BlockMaxConjunctionScorer#score()
at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
0.42% 25962 org.apache.lucene.index.SingletonSortedSetDocValues#nextOrd()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
0.41% 25051 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
at org.apache.lucene.search.BulkScorer#score()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#search()
at org.apache.lucene.search.IndexSearcher#searchAfter()
at org.apache.lucene.search.IndexSearcher#search()
at perf.SearchTask#go()
at perf.TaskThreads$TaskThread#run()
The VectorDictionary parsing should be a one-time startup cost that is part of the luceneutil test harness - probably it's not worth optimizing? Although we could certainly produce a more compact easy-to-read dictionary representation if we want to.
On Tue, Jan 12, 2021 at 8:57 AM Michael McCandless notifications@github.com wrote:
And CPU:
PERCENT CPU SAMPLES STACK 3.79% 232348 org.apache.lucene.util.packed.DirectMonotonicReader#get() at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#
() at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 3.55% 217460 org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 2.70% 165557 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() at org.apache.lucene.util.packed.DirectMonotonicReader#get() at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 2.13% 130744 org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 1.95% 119568 org.apache.lucene.search.grouping.TermGroupSelector#advanceTo() at org.apache.lucene.search.grouping.SecondPassGroupingCollector#collect() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 1.74% 106828 org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsPostingsEnum#advance() at org.apache.lucene.search.ConjunctionDISI#doNext() at org.apache.lucene.search.ConjunctionDISI#nextDoc() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#searchAfter() 1.54% 94213 java.nio.ByteBuffer#get() at java.nio.DirectByteBuffer#get() at org.apache.lucene.store.ByteBufferGuard#getBytes() at org.apache.lucene.store.ByteBufferIndexInput#readBytes() at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 1.03% 63224 org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts# () at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 1.02% 62342 org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval() at org.apache.lucene.queries.intervals.IntervalFilter#nextInterval() at org.apache.lucene.queries.intervals.IntervalScorer$1#matches() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#searchAfter() 1.02% 62230 org.apache.lucene.search.PhraseScorer$1#matches() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#searchAfter() at org.apache.lucene.search.IndexSearcher#search() at perf.SearchTask#go() 1.01% 61871 org.apache.lucene.index.SingletonSortedSetDocValues#nextDoc() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts# () at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 0.93% 57187 org.apache.lucene.search.ConjunctionDISI#doNext() at org.apache.lucene.search.ConjunctionDISI#nextDoc() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#searchAfter() at org.apache.lucene.search.IndexSearcher#search() 0.82% 50023 org.apache.lucene.search.grouping.TermGroupSelector#advanceTo() at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 0.63% 38777 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 0.62% 37725 jdk.internal.misc.Unsafe#convEndian() at jdk.internal.misc.Unsafe#getShortUnaligned() at java.nio.DirectByteBuffer#getShort() at java.nio.DirectByteBuffer#getShort() at org.apache.lucene.store.ByteBufferGuard#getShort() at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readShort() at org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() at org.apache.lucene.util.packed.DirectMonotonicReader#get() at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() 0.59% 36204 org.apache.lucene.search.TermScorer#score() at org.apache.lucene.search.ScoreCachingWrappingScorer#score() at org.apache.lucene.search.FieldComparator$RelevanceComparator#compareBottom() at org.apache.lucene.search.grouping.FirstPassGroupingCollector#isCompetitive() at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() 0.56% 34042 org.apache.lucene.codecs.lucene84.Lucene84PostingsReader#findFirstGreater() at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsPostingsEnum#advance() at org.apache.lucene.search.ConjunctionDISI#doNext() at org.apache.lucene.search.ConjunctionDISI#nextDoc() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() 0.55% 33870 org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() at org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 0.54% 33294 org.apache.lucene.store.ByteBufferGuard#ensureValid() at org.apache.lucene.store.ByteBufferGuard#getShort() at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readShort() at org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() at org.apache.lucene.util.packed.DirectMonotonicReader#get() at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 0.53% 32211 org.apache.lucene.search.ScoreCachingWrappingScorer#score() at org.apache.lucene.search.FieldComparator$RelevanceComparator#compareBottom() at org.apache.lucene.search.grouping.FirstPassGroupingCollector#isCompetitive() at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at perf.SearchTask#go() 0.50% 30812 org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#skipPositions() at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#nextPosition() at org.apache.lucene.queries.intervals.TermIntervalsSource$1#nextInterval() at org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval() at org.apache.lucene.queries.intervals.IntervalFilter#nextInterval() at org.apache.lucene.queries.intervals.IntervalScorer$1#matches() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() 0.50% 30709 org.apache.lucene.search.grouping.BlockGroupingCollector#collect() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 0.49% 30235 org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score() at org.apache.lucene.search.LeafSimScorer#score() at org.apache.lucene.search.TermScorer#score() at org.apache.lucene.search.ScoreCachingWrappingScorer#score() at org.apache.lucene.search.FieldComparator$RelevanceComparator#compareBottom() at org.apache.lucene.search.grouping.FirstPassGroupingCollector#isCompetitive() at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() 0.48% 29554 org.apache.lucene.search.SloppyPhraseMatcher#maxFreq() at org.apache.lucene.search.PhraseScorer$1#matches() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#searchAfter() at org.apache.lucene.search.IndexSearcher#search() 0.48% 29119 jdk.internal.misc.Unsafe#convEndian() at jdk.internal.misc.Unsafe#getIntUnaligned() at java.nio.DirectByteBuffer#getInt() at java.nio.DirectByteBuffer#getInt() at org.apache.lucene.store.ByteBufferGuard#getInt() at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readInt() at org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get() at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$21#ordValue() at org.apache.lucene.search.grouping.TermGroupSelector#advanceTo() at org.apache.lucene.search.grouping.SecondPassGroupingCollector#collect() 0.44% 26883 org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get() at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$21#ordValue() at org.apache.lucene.index.SingletonSortedSetDocValues#nextDoc() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts# () at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 0.44% 26770 java.nio.Buffer#position() at java.nio.ByteBuffer#position() at java.nio.MappedByteBuffer#position() at java.nio.MappedByteBuffer#position() at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek() at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll() at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 0.43% 26243 org.apache.lucene.codecs.lucene80.Lucene80NormsProducer$3#longValue() at org.apache.lucene.search.LeafSimScorer#getNormValue() at org.apache.lucene.search.LeafSimScorer#score() at org.apache.lucene.search.TermScorer#score() at org.apache.lucene.search.BlockMaxConjunctionScorer#score() at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect() at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() 0.42% 25962 org.apache.lucene.index.SingletonSortedSetDocValues#nextOrd() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll() at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts# () at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts# () at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() 0.41% 25051 org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll() at org.apache.lucene.search.Weight$DefaultBulkScorer#score() at org.apache.lucene.search.BulkScorer#score() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#search() at org.apache.lucene.search.IndexSearcher#searchAfter() at org.apache.lucene.search.IndexSearcher#search() at perf.SearchTask#go() at perf.TaskThreads$TaskThread#run() — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/mikemccand/luceneutil/issues/77#issuecomment-758671045, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHHUQMKUHMEZKUEHDQNOXLSZRILLANCNFSM4PRCRYEA .
Good point -- that is a one-time cost, amortized over the life of a service that then handles query traffic.
Can I somehow instruct JFR to NOT profile at startup, and then trigger (from within my Java program, not via external JMC or so) for JFR to start?
Mike, there are various ways to do this. For this case, maybe simply filter it out in the analysis? That way you have precise control. You can look at method / stacktrace / whatever to determine whatever logic you want, e.g. ignoring anything coming from perf.TaskParser#<init>()
or whatever is appropriate here to remove ghosts.
Similar stuff is already done in unit tests for the gradle epoll thread: https://github.com/apache/lucene-solr/blob/master/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java#L101
Another alternative (untested, based on my understanding): If you can start the thing programmatically in the spawned jvms, then don't pass all the options as -XXStartFlightRecording=
instead pass them as -XX:FlightRecorderOptions=
.
In order to "start it" you then need to issue a diagnostic command. Usually done with jcmd
executable, but i found some snippets here that show how to send the diagnostic with mbean: https://github.com/jmiettinen/jfr/blob/master/src/main/java/fi/eonwe/JFR.java
Last night's benchmarks ran with indexer profiling too:
Profiler for cpu
PROFILE SUMMARY from 471699 events (total: 471699)
tests.profile.mode=cpu
tests.profile.count=30
tests.profile.stacksize=5
tests.profile.linenumbers=false
PERCENT CPU SAMPLES STACK
6.72% 31682 org.apache.lucene.util.ByteBlockPool#setBytesRef()
at org.apache.lucene.util.BytesRefHash#equals()
at org.apache.lucene.util.BytesRefHash#findHash()
at org.apache.lucene.util.BytesRefHash#add()
at org.apache.lucene.index.TermsHashPerField#add()
6.63% 31278 org.apache.lucene.util.VectorUtil#dotProduct()
at org.apache.lucene.index.VectorValues$SearchStrategy#compare()
at org.apache.lucene.util.hnsw.HnswGraph#search()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
6.19% 29192 org.apache.lucene.analysis.standard.StandardTokenizerImpl#getNextToken()
at org.apache.lucene.analysis.standard.StandardTokenizer#incrementToken()
at org.apache.lucene.analysis.LowerCaseFilter#incrementToken()
at org.apache.lucene.analysis.FilteringTokenFilter#incrementToken()
at org.apache.lucene.index.IndexingChain$PerField#invert()
5.22% 24602 org.apache.lucene.index.TermsHashPerField#writeByte()
at org.apache.lucene.index.TermsHashPerField#writeVInt()
at org.apache.lucene.index.FreqProxTermsWriterPerField#addTerm()
at org.apache.lucene.index.TermsHashPerField#positionStreamSlice()
at org.apache.lucene.index.TermsHashPerField#add()
4.17% 19684 org.apache.lucene.util.BytesRefHash#findHash()
at org.apache.lucene.util.BytesRefHash#add()
at org.apache.lucene.index.TermsHashPerField#add()
at org.apache.lucene.index.IndexingChain$PerField#invert()
at org.apache.lucene.index.IndexingChain#processField()
2.75% 12994 java.io.BufferedOutputStream#write()
at org.apache.lucene.store.OutputStreamIndexOutput#writeByte()
at org.apache.lucene.store.RateLimitedIndexOutput#writeByte()
at org.apache.lucene.store.DataOutput#writeInt()
at org.apache.lucene.store.DataOutput#writeLong()
2.70% 12721 java.util.Arrays#binarySearch0()
at java.util.Arrays#binarySearch()
at org.apache.lucene.codecs.VectorWriter$VectorValuesMerger$MergerRandomAccess#vectorValue()
at org.apache.lucene.util.hnsw.HnswGraph#search()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
2.62% 12344 org.apache.lucene.analysis.standard.StandardAnalyzer#lambda$createComponents$0()
at org.apache.lucene.analysis.standard.StandardAnalyzer$$Lambda$229+0x0000000800c25f98.1329879585#accept()
at org.apache.lucene.analysis.Analyzer$TokenStreamComponents#setReader()
at org.apache.lucene.analysis.Analyzer#tokenStream()
at org.apache.lucene.document.Field#tokenStream()
2.42% 11399 sun.nio.ch.FileDispatcherImpl#write0()
at sun.nio.ch.FileDispatcherImpl#write()
at sun.nio.ch.IOUtil#writeFromNativeBuffer()
at sun.nio.ch.IOUtil#write()
at sun.nio.ch.FileChannelImpl#write()
2.18% 10286 org.apache.lucene.index.TermsHashPerField#writeByte()
at org.apache.lucene.index.TermsHashPerField#writeVInt()
at org.apache.lucene.index.FreqProxTermsWriterPerField#writeProx()
at org.apache.lucene.index.FreqProxTermsWriterPerField#addTerm()
at org.apache.lucene.index.TermsHashPerField#positionStreamSlice()
1.92% 9072 org.apache.lucene.util.LongHeap#push()
at org.apache.lucene.util.hnsw.NeighborQueue#add()
at org.apache.lucene.util.hnsw.HnswGraph#search()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
1.92% 9055 org.apache.lucene.index.FreqProxTermsWriterPerField#addTerm()
at org.apache.lucene.index.TermsHashPerField#positionStreamSlice()
at org.apache.lucene.index.TermsHashPerField#add()
at org.apache.lucene.index.IndexingChain$PerField#invert()
at org.apache.lucene.index.IndexingChain#processField()
1.65% 7762 java.io.BufferedOutputStream#write()
at org.apache.lucene.store.OutputStreamIndexOutput#writeByte()
at org.apache.lucene.store.DataOutput#writeInt()
at org.apache.lucene.store.DataOutput#writeLong()
at org.apache.lucene.codecs.lucene84.ForUtil#encode()
1.57% 7390 org.apache.lucene.index.VectorValues$SearchStrategy#compare()
at org.apache.lucene.util.hnsw.HnswGraph#search()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
at org.apache.lucene.codecs.lucene90.Lucene90VectorWriter#writeGraph()
1.56% 7358 org.apache.lucene.util.hnsw.HnswGraph#search()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
at org.apache.lucene.codecs.lucene90.Lucene90VectorWriter#writeGraph()
at org.apache.lucene.codecs.lucene90.Lucene90VectorWriter#writeField()
and:
Profiler for heap
PROFILE SUMMARY from 134203 events (total: 493457M)
tests.profile.mode=heap
tests.profile.count=30
tests.profile.stacksize=5
tests.profile.linenumbers=false
PERCENT HEAP SAMPLES STACK
27.11% 133792M org.apache.lucene.util.fst.BytesStore#writeByte()
at org.apache.lucene.util.fst.FST#<init>()
at org.apache.lucene.util.fst.FSTCompiler#<init>()
at org.apache.lucene.util.fst.FSTCompiler$Builder#build()
at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingBlock#compileIndex()
9.92% 48968M perf.LineFileDocs#nextDoc()
at perf.IndexThreads$IndexThread#run()
7.29% 35963M java.lang.String#<init>()
at java.lang.String#<init>()
at perf.LineFileDocs#nextDoc()
at perf.IndexThreads$IndexThread#run()
5.76% 28415M org.apache.lucene.util.SparseFixedBitSet#insertLong()
at org.apache.lucene.util.SparseFixedBitSet#set()
at org.apache.lucene.util.hnsw.HnswGraph#search()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
5.16% 25465M java.lang.StringUTF16#compress()
at java.lang.String#<init>()
at java.lang.String#<init>()
at perf.LineFileDocs#nextDoc()
at perf.IndexThreads$IndexThread#run()
5.08% 25073M perf.LineFileDocs#readDocs()
at perf.LineFileDocs$1#run()
4.45% 21942M org.apache.lucene.codecs.lucene84.Lucene84PostingsWriter#newTermState()
at org.apache.lucene.codecs.lucene84.Lucene84PostingsWriter#newTermState()
at org.apache.lucene.codecs.PushPostingsWriterBase#writeTerm()
at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter#write()
at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter#write()
4.10% 20237M org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#<init>()
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#newInstance()
at org.apache.lucene.index.ParallelPostingsArray#grow()
at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray#grow()
at org.apache.lucene.util.BytesRefHash#add()
2.75% 13555M org.apache.lucene.util.ArrayUtil#growExact()
at org.apache.lucene.util.ArrayUtil#grow()
at org.apache.lucene.util.LongHeap#push()
at org.apache.lucene.util.hnsw.NeighborQueue#add()
at org.apache.lucene.util.hnsw.HnswGraph#search()
2.68% 13211M org.apache.lucene.index.ParallelPostingsArray#<init>()
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#<init>()
at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#newInstance()
at org.apache.lucene.index.ParallelPostingsArray#grow()
at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray#grow()
2.02% 9959M org.apache.lucene.util.ByteBlockPool$DirectTrackingAllocator#getByteBlock()
at org.apache.lucene.util.ByteBlockPool#nextBuffer()
at org.apache.lucene.util.ByteBlockPool#allocSlice()
at org.apache.lucene.index.TermsHashPerField#writeByte()
at org.apache.lucene.index.TermsHashPerField#writeVInt()
1.27% 6283M org.apache.lucene.util.SparseFixedBitSet#insertBlock()
at org.apache.lucene.util.SparseFixedBitSet#set()
at org.apache.lucene.util.hnsw.HnswGraph#search()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
1.15% 5667M org.apache.lucene.util.BytesRef#<init>()
at org.apache.lucene.store.ByteBuffersDataOutput#writeString()
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter#writeField()
at org.apache.lucene.index.StoredFieldsConsumer#writeField()
at org.apache.lucene.index.IndexingChain#processField()
1.13% 5567M org.apache.lucene.codecs.lucene80.Lucene80NormsProducer#getNorms()
at org.apache.lucene.codecs.PushPostingsWriterBase#writeTerm()
at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter#write()
at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter#write()
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter#write()
1.02% 5030M org.apache.lucene.util.SparseFixedBitSet#<init>()
at org.apache.lucene.util.hnsw.HnswGraph#search()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
at org.apache.lucene.codecs.lucene90.Lucene90VectorWriter#writeGraph()
It's curious that building FSTs is the top allocator (by sampling)! And, that SparseFixedBitSet.get/insertLong
is such a hot spot!
I will fix nightly benchmark to include these profiler outputs in the accessible nightly log.
Hi, I just came across this issue, and as you were discussing JFR, I thought the JfrUnit project might be useful here. It lets you run assertions against JFR events, allowing you to identify regressions e.g. in terms of object allocations or I/O triggered by specific operations. Of course, you could also use it to assert custom JFR events (which in turn you could inject via the JMC Agent project, in case there's no suitable events matching your needs).
There are many interesting counters that you get from perf stat
, e.g. running perf stat -dd ls -l
:
Performance counter stats for 'ls -l':
3.23 msec task-clock:u # 0.795 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
133 page-faults:u # 41.163 K/sec
1,984,270 cycles:u # 0.614 GHz (2.09%)
693,680 stalled-cycles-frontend:u # 34.96% frontend cycles idle (79.96%)
2,309,178 instructions:u # 1.16 insn per cycle
# 0.30 stalled cycles per insn
459,923 branches:u # 142.344 M/sec
19,910 branch-misses:u # 4.33% of all branches
957,775 L1-dcache-loads:u # 296.427 M/sec
20,642 L1-dcache-load-misses:u # 2.16% of all L1-dcache accesses (97.91%)
<not supported> LLC-loads:u
<not supported> LLC-load-misses:u
647,031 L1-icache-loads:u # 200.253 M/sec (20.04%)
<not counted> L1-icache-load-misses:u (0.00%)
<not counted> dTLB-loads:u (0.00%)
<not counted> dTLB-load-misses:u (0.00%)
<not counted> iTLB-loads:u (0.00%)
<not counted> iTLB-load-misses:u (0.00%)
0.004062988 seconds time elapsed
0.000000000 seconds user
0.003514000 seconds sys
Brendan Gregg has lots of cool examples of how to do this, e.g.: https://www.brendangregg.com/perf.html ... and his Netflix blog posts.
OK I made a baby step here: if perf
seems to exist then we spawn the JVMs using perf stat -dd
prefix. It produces fun output like this, which is saved away into nightly logs (so we could back-process in the future) but otherwise not parsed yet:
Performance counter stats for '/usr/lib/jvm/java-21-openjdk/bin/java --add-modules jdk.incubator.vector -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp -XX:StartFlightRecording=dumponexit=true,maxsize=250M,settings=/l\
/util/src/python/profiling.jfc,filename=/l/logs/bench-search-atoa-base2-0.jfr -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -classpath /l/trunk/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar:/l/trunk/lucene/sandbox/build\
/classes/java/main:/l/trunk/lucene/misc/build/classes/java/main:/l/trunk/lucene/facet/build/classes/java/main:/l/trunk/lucene/analysis/common/build/classes/java/main:/l/trunk/lucene/analysis/icu/build/classes/java/main:/l/trunk/lucene/qu\
eryparser/build/classes/java/main:/l/trunk/lucene/grouping/build/classes/java/main:/l/trunk/lucene/suggest/build/classes/java/main:/l/trunk/lucene/highlighter/build/classes/java/main:/l/trunk/lucene/codecs/build/classes/java/main:/l/trun\
k/lucene/queries/build/classes/java/main:/l/trunk/lucene/join/build/classes/java/main:/l/util/lib/HdrHistogram.jar:/l/util/build perf.SearchPerfTest -dirImpl MMapDirectory -indexPath /l/indices/wikimediumall.trunk.facets.taxonomy:Date.ta\
xonomy:Month.taxonomy:DayOfYear.taxonomy:RandomLabel.taxonomy.sortedset:Month.sortedset:DayOfYear.sortedset:RandomLabel.sortedset.Lucene90.Lucene99.dvfields.nd27.625M -facets taxonomy:Date;Date -facets taxonomy:Month;Month -facets taxono\
my:DayOfYear;DayOfYear -facets taxonomy:RandomLabel.taxonomy;RandomLabel -facets sortedset:Month;Month -facets sortedset:DayOfYear;DayOfYear -facets sortedset:RandomLabel.sortedset;RandomLabel -analyzer StandardAnalyzerNoStopWords -taskS\
ource /l/util/3992009.tasks -numConcurrentQueries 6 -taskRepeatCount 200 -field body -tasksPerCat 1 -searchConcurrency 0 -staticSeed -6719702 -seed -2580077 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log \
/l/logs/atoa.base2.0 -topN 100':
30,700.97 msec task-clock:u # 5.749 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
71,689 page-faults:u # 2.335 K/sec
117,794,137,105 cycles:u # 3.837 GHz (47.51%)
12,006,779,240 stalled-cycles-frontend:u # 10.19% frontend cycles idle (47.57%)
139,806,618,209 instructions:u # 1.19 insn per cycle
# 0.09 stalled cycles per insn (47.59%)
29,683,678,719 branches:u # 966.865 M/sec (47.55%)
644,231,829 branch-misses:u # 2.17% of all branches (46.98%)
61,901,518,938 L1-dcache-loads:u # 2.016 G/sec (45.99%)
1,067,103,787 L1-dcache-load-misses:u # 1.72% of all L1-dcache accesses (45.68%)
<not supported> LLC-loads:u
<not supported> LLC-load-misses:u
6,984,642,008 L1-icache-loads:u # 227.506 M/sec (45.96%)
85,868,921 L1-icache-load-misses:u # 1.23% of all L1-icache accesses (46.19%)
211,352,608 dTLB-loads:u # 6.884 M/sec (46.31%)
46,297,044 dTLB-load-misses:u # 21.91% of all dTLB cache accesses (46.31%)
31,526,035 iTLB-loads:u # 1.027 M/sec (46.72%)
2,074,199 iTLB-load-misses:u # 6.58% of all iTLB cache accesses (47.25%)
5.340673185 seconds time elapsed
29.104880000 seconds user
1.328313000 seconds sys
I think the benchmark results would be more useful if we could see some basic analysis of the code under load. We could imagine summary tables, graphs, flame charts, whatever is needed to knock out the hotspots and keep them under control.
Instead of seeing just that performance dropped or improved, we might be able to reason about why: it could make things better for developers. And maybe, just maybe, we have some giant frogs boiling for years that nobody never noticed yet: you know, easy wins to improve perf that went undetected.
As a start I imagine breaking the problem down into two profiling mechamisms: