bbarani opened this issue 1 year ago
Here's a 3-month view with a cropped y-axis:
It's interesting that we saw a performance boost in early April that was never explained, and now it appears we may be returning to the previous baseline.
Are these graphs (and in general, perf runs) available somewhere (in public)?
@reta The above graphs were generated using internal runs. We are doing the final security review before surfacing the public performance dashboard and should have some updates this week. The public dashboard won't have historical data, though.
A little more context here from previous discussions:
On April 12 we saw a big jump in performance on 3.0:
The most closely correlated commit (based on the timeline) was this ImmutableOpenMap change from @nknize. However, that change was backported to the 2.x branch and we never saw the same performance gain there.
One thing to note about the current data is that the 3.0.0 distribution builds are currently broken; the last successful build is from May 17, at this commit. All the performance runs after May 15 in the above graphs were therefore built from that same commit.
> Are these graphs (and in general, perf runs) available somewhere (in public)?
@reta The performance benchmarking page is live now at http://opensearch.org/benchmarks
@bbarani this is awesome, thank you, worth an announcement on Slack! :rocket:
This is a really good addition, thanks @bbarani and team. I think it might be worthwhile to integrate https://github.com/async-profiler/async-profiler as well; it would help us better understand the code paths that burn additional CPU cycles or create more garbage.
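For reference, async-profiler can already be pointed at a node by hand. A minimal sketch, assuming async-profiler's `profiler.sh` is available on the node and the OpenSearch JVM PID is known (the output paths, duration, and `<opensearch-pid>` are placeholders):

```
# Sample CPU for 60s and write an HTML flame graph of hot code paths.
./profiler.sh -e cpu -d 60 -f /tmp/opensearch-cpu.html <opensearch-pid>

# Sample allocations instead, to see which code paths create the most garbage.
./profiler.sh -e alloc -d 60 -f /tmp/opensearch-alloc.html <opensearch-pid>
```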
@Bukhtawar Sure, we will look into it. @rishabh6788
In the meantime, we are also looking to add out-of-the-box telemetry devices as part of the OpenSearch Benchmark tool:
```
opensearch-benchmark list telemetry

[OpenSearch Benchmark ASCII-art banner]

Available telemetry devices:

Command                     Name                        Description
--------------------------  --------------------------  -----------------------------------------------------------------------------------
jit                         JIT Compiler Profiler       Enables JIT compiler logs.
gc                          GC log                      Enables GC logs.
jfr                         Flight Recorder             Enables Java Flight Recorder (requires an Oracle JDK or OpenJDK 11+)
heapdump                    Heap Dump                   Captures a heap dump.
node-stats                  Node Stats                  Regularly samples node stats
recovery-stats              Recovery Stats              Regularly samples shard recovery stats
ccr-stats                   CCR Stats                   Regularly samples Cross Cluster Replication (CCR) leader and follower(s) checkpoint at index level and calculates replication lag
segment-stats               Segment Stats               Determines segment stats at the end of the benchmark.
transform-stats             Transform Stats             Regularly samples transform stats
searchable-snapshots-stats  Searchable Snapshots Stats  Regularly samples searchable snapshots stats

Keep in mind that each telemetry device may incur a runtime overhead which can skew results.
```
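As a rough usage sketch (assuming the `--telemetry` flag behaves as in the upstream tool; the workload name here is just an example), a device from the list above can be attached to a run like so:

```
# Hypothetical invocation: run the http_logs workload with GC logging
# and Java Flight Recorder telemetry devices attached.
opensearch-benchmark execute-test --workload=http_logs --telemetry=gc,jfr
```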
@bbarani - From https://opensearch.org/benchmarks, I see the mean indexing throughput for arm64 has been above 100k for the last month. Does this issue still exist?
**Describe the bug**
I noticed a relatively significant drop (~10%) in indexing performance on the main branch for the HTTP Logs workload, possibly due to changes introduced between May 16, 2023 and May 18, 2023. The mean indexing throughput was around 100k before the change but has come down to ~90k.