opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Understand/Improve the performance of range queries #11251

Open sandeshkr419 opened 7 months ago

sandeshkr419 commented 7 months ago

I have been trying to understand and improve the performance of range queries in OpenSearch. For this purpose, I have set up a single-node cluster and ingested the nyc_taxis dataset.

Setup:

While running the query load above, I collected the following CPU flame graph:

[Image: Screenshot 2023-11-16 at 11 21 54 PM]

I am trying to understand how we can reduce the cost of visitDocIDs() for range query requests.
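To make the visitDocIDs() cost more concrete, here is a minimal Python sketch of how a range query can walk the leaf cells of a one-dimensional point index. It is a simplified illustration, not Lucene's code: the `Leaf` class, the `intersect` function, and the `stats` counters are hypothetical names. The assumption, based on my reading of the BKD reader, is that cells lying entirely inside the query range have their doc IDs collected in bulk without any value comparison (the path that shows up as visitDocIDs()), while cells that only partially overlap need a per-value check.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Leaf:
    """Hypothetical leaf cell: values plus the doc ID holding each value."""
    min_value: int
    max_value: int
    values: List[int]
    doc_ids: List[int]


def intersect(leaves: List[Leaf], lo: int, hi: int) -> Tuple[List[int], Dict[str, int]]:
    """Collect doc IDs whose value lies in [lo, hi] and count how each was collected."""
    matched: List[int] = []
    stats = {"bulk_doc_ids": 0, "value_checked": 0}
    for leaf in leaves:
        if leaf.max_value < lo or leaf.min_value > hi:
            # Cell is outside the query range: skip it entirely.
            continue
        if lo <= leaf.min_value and leaf.max_value <= hi:
            # Cell is fully inside the range: every doc matches, so only
            # the doc IDs are decoded (the bulk path).
            matched.extend(leaf.doc_ids)
            stats["bulk_doc_ids"] += len(leaf.doc_ids)
        else:
            # Cell crosses the range boundary: each value must be compared.
            for value, doc_id in zip(leaf.values, leaf.doc_ids):
                stats["value_checked"] += 1
                if lo <= value <= hi:
                    matched.append(doc_id)
    return matched, stats


leaves = [
    Leaf(0, 4, [0, 2, 4], [10, 11, 12]),
    Leaf(5, 9, [5, 7, 9], [20, 21, 22]),
    Leaf(10, 14, [10, 12, 14], [30, 31, 32]),
    Leaf(15, 19, [15, 17, 19], [40, 41, 42]),
]
matched, stats = intersect(leaves, 4, 14)
print(matched, stats)
```

In this toy example, two of the three overlapping cells are fully inside the range, so most matches come from the bulk path. If the same holds for a wide range over a large index, doc ID decoding rather than value comparison would dominate the profile.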

Other OSB results/metrics for further comparison:

|                                                         Metric |   Task |       Value |   Unit |
|---------------------------------------------------------------:|-------:|------------:|-------:|
|                                                  Segment count |        |          41 |        |
|                                                 Min Throughput |  range |           1 |  ops/s |
|                                                Mean Throughput |  range |           1 |  ops/s |
|                                              Median Throughput |  range |           1 |  ops/s |
|                                                 Max Throughput |  range |           1 |  ops/s |
|                                        50th percentile latency |  range |     321.006 |     ms |
|                                        90th percentile latency |  range |     324.813 |     ms |
|                                        99th percentile latency |  range |     356.034 |     ms |
|                                       100th percentile latency |  range |     368.324 |     ms |
|                                   50th percentile service time |  range |     318.727 |     ms |
|                                   90th percentile service time |  range |     321.749 |     ms |
|                                   99th percentile service time |  range |     354.215 |     ms |
|                                  100th percentile service time |  range |     365.401 |     ms |
|                                                     error rate |  range |           0 |      % |

sandeshkr419 commented 6 months ago

The profiling results suggest that readInts24 and readDelta16 are the major contributors to CPU cycles.
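For readers unfamiliar with these two methods, the Python sketch below shows the general idea behind the encodings they decode. The assumptions are that readInts24 reads doc IDs stored in 24 bits each and that readDelta16 reads 16-bit offsets from a block minimum. This is a simplified byte-at-a-time illustration with hypothetical function names. Lucene's actual implementation packs values into longs and decodes them in batches, so treat this as a conceptual model only.

```python
from typing import List


def encode_ints24(doc_ids: List[int]) -> bytes:
    """Store each doc ID in exactly 3 bytes (values must fit in 24 bits)."""
    out = bytearray()
    for doc_id in doc_ids:
        if not 0 <= doc_id < (1 << 24):
            raise ValueError("doc ID does not fit in 24 bits")
        out += doc_id.to_bytes(3, "big")
    return bytes(out)


def decode_ints24(data: bytes) -> List[int]:
    """Rebuild doc IDs from consecutive 3-byte groups."""
    return [int.from_bytes(data[i:i + 3], "big") for i in range(0, len(data), 3)]


def encode_delta16(doc_ids: List[int]) -> bytes:
    """Store the block minimum once, then a 2-byte offset per doc ID.

    Requires a non-empty block whose max - min fits in 16 bits.
    """
    if not doc_ids:
        raise ValueError("block must not be empty")
    base = min(doc_ids)
    if max(doc_ids) - base > 0xFFFF:
        raise ValueError("doc ID spread does not fit in 16 bits")
    out = bytearray(base.to_bytes(4, "big"))
    for doc_id in doc_ids:
        out += (doc_id - base).to_bytes(2, "big")
    return bytes(out)


def decode_delta16(data: bytes) -> List[int]:
    """Read the 4-byte minimum, then add each 2-byte offset back to it."""
    base = int.from_bytes(data[:4], "big")
    return [base + int.from_bytes(data[i:i + 2], "big") for i in range(4, len(data), 2)]


ids = [70000, 70001, 70250, 71999]
packed24 = encode_ints24(ids)
packed16 = encode_delta16(ids)
print(len(packed24), len(packed16))
print(decode_ints24(packed24) == ids, decode_delta16(packed16) == ids)
```

In both schemes, decoding costs a fixed amount of shifting and adding per doc ID. A query that matches many documents therefore pays that cost for every match, which would be consistent with these methods dominating the flame graph.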

One optimization that reduces the number of reads in readInts24() was tried in https://github.com/opensearch-project/OpenSearch/issues/9541, but the run with the nyc_taxis workload did not yield a significant improvement.

For the time being, it looks like improvements in range queries will depend mainly on improvements in Lucene, specifically in its readInt(s) methods if possible, unless we identify other areas to optimize.

Will update further if I discover more actionable optimizations. Welcoming feedback from the OpenSearch community as well.

getsaurabh02 commented 4 months ago

@sandeshkr419 do we have more actionable learnings to call out or discuss here?