Understand/Improve the performance of range queries

sandeshkr419 commented 7 months ago

I have been trying to understand and improve the performance of range queries in OpenSearch. For this purpose, I set up a single-node cluster and ingested the nyc_taxis dataset.

Setup:

Single-node 2.11 domain hosted via opensearch-cluster-cdk
Ran the http_logs workload via OSB once to index the data

Ran the query tasks while simultaneously running async-profiler on the local node

opensearch-benchmark execute-test --pipeline=benchmark-only --workload=http_logs --target-host=$dom  --kill-running-processes --include-tasks=range

While running the query load above, I collected the following CPU flame graph:

I am trying to further understand how we can reduce the visitDocIDs() cost for range query requests.
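To make the visitDocIDs() cost concrete, here is a toy Python sketch of how a points-based range query walks leaf cells. It is only an illustration under simplifying assumptions: `Leaf` and `range_matches` are hypothetical names, and the real Lucene BKD reader is considerably more involved. The point it tries to show is that a cell lying fully inside the query range has all of its doc IDs decoded and visited, so that work grows with the number of matching documents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    """Hypothetical leaf cell: parallel lists of doc IDs and point values."""
    doc_ids: List[int]
    values: List[int]


def range_matches(leaves: List[Leaf], lo: int, hi: int) -> List[int]:
    """Collect doc IDs whose value lies in the inclusive range [lo, hi]."""
    hits: List[int] = []
    for leaf in leaves:
        cell_min, cell_max = min(leaf.values), max(leaf.values)
        if cell_max < lo or cell_min > hi:
            # Cell is outside the query: skipped, no doc IDs are decoded.
            continue
        if lo <= cell_min and cell_max <= hi:
            # Cell is fully inside the query: every doc ID is visited
            # without comparing values (the bulk-visit path).
            hits.extend(leaf.doc_ids)
        else:
            # Cell crosses a query boundary: each value must be compared.
            hits.extend(d for d, v in zip(leaf.doc_ids, leaf.values)
                        if lo <= v <= hi)
    return hits


leaves = [
    Leaf([0, 1, 2], [10, 11, 12]),
    Leaf([3, 4, 5], [20, 21, 22]),
    Leaf([6, 7, 8], [30, 31, 32]),
]
print(range_matches(leaves, 11, 22))
```

In this toy model a wide range makes most cells fall on the bulk-visit path, which would be consistent with doc ID decoding showing up prominently in a profile.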

Other OSB results/metrics for further comparison:

| Metric | Task | Value | Unit |
|---|---|---|---|
| Segment count | | 41 | |
| Min Throughput | range | 1 | ops/s |
| Mean Throughput | range | 1 | ops/s |
| Median Throughput | range | 1 | ops/s |
| Max Throughput | range | 1 | ops/s |
| 50th percentile latency | range | 321.006 | ms |
| 90th percentile latency | range | 324.813 | ms |
| 99th percentile latency | range | 356.034 | ms |
| 100th percentile latency | range | 368.324 | ms |
| 50th percentile service time | range | 318.727 | ms |
| 90th percentile service time | range | 321.749 | ms |
| 99th percentile service time | range | 354.215 | ms |
| 100th percentile service time | range | 365.401 | ms |
| error rate | range | 0 | % |

sandeshkr419 commented 6 months ago

Analyzing the profiling results, it looks like readInts24 and readDelta16 are the major contributors to CPU cycles.
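For readers unfamiliar with these two frames, the sketch below illustrates the general idea behind the two encodings: storing each doc ID in 24 bits, or storing a base value plus 16-bit deltas. The function names mirror the profile frames for readability, but the byte layout here is an assumption made for illustration and is not Lucene's actual on-disk format, which batches reads differently.

```python
from typing import List


def pack_ints24(doc_ids: List[int]) -> bytes:
    """Store each doc ID in 3 bytes; only valid for IDs below 2**24."""
    out = bytearray()
    for doc_id in doc_ids:
        if not 0 <= doc_id < (1 << 24):
            raise ValueError("doc ID does not fit in 24 bits")
        out += doc_id.to_bytes(3, "big")
    return bytes(out)


def read_ints24(data: bytes) -> List[int]:
    """Decode 3-byte doc IDs; one decode step per matching document."""
    return [int.from_bytes(data[i:i + 3], "big")
            for i in range(0, len(data), 3)]


def pack_delta16(doc_ids: List[int]) -> bytes:
    """Store a 4-byte base (the minimum ID) followed by 2-byte deltas."""
    base = min(doc_ids)
    if max(doc_ids) - base > 0xFFFF:
        raise ValueError("doc ID spread does not fit in 16 bits")
    out = bytearray(base.to_bytes(4, "big"))
    for doc_id in doc_ids:
        out += (doc_id - base).to_bytes(2, "big")
    return bytes(out)


def read_delta16(data: bytes) -> List[int]:
    """Decode the base, then add each 2-byte delta back onto it."""
    base = int.from_bytes(data[:4], "big")
    return [base + int.from_bytes(data[i:i + 2], "big")
            for i in range(4, len(data), 2)]


ids = [70000, 70001, 70500, 131000]
assert read_ints24(pack_ints24(ids)) == ids
assert read_delta16(pack_delta16(ids)) == ids
```

Either way, decoding is per-document work, so in this simplified picture the cost scales with the number of doc IDs in the visited cells rather than with the number of cells.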

One optimization to reduce the number of reads in readInts24() was tried in https://github.com/opensearch-project/OpenSearch/issues/9541, but a run with the nyc_taxis workload did not yield a significant improvement.

For the time being, it looks like improvements in range queries will depend predominantly on improvements in Lucene, specifically in its readInt(s) methods if possible, unless we find other areas for improvement.

I will update further if I discover more actionable optimizations. Feedback from the OpenSearch community is welcome as well.

getsaurabh02 commented 4 months ago

@sandeshkr419 do we have more actionable learnings to call out or discuss here?

Understand/Improve the performance of range queries #11251