opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Improve search performance for numeric sort queries #10867

Open arjunkumargiri opened 10 months ago

arjunkumargiri commented 10 months ago

Numeric sorting is one of the key query mechanisms used across multiple OpenSearch clusters. It is critical to understand performance characteristics of numeric sorting queries and identify mechanisms to reduce query latency and reduce performance overhead.

To understand the query characteristics of numeric sorting, a simple performance test was run with the following settings:

- Benchmark tool: opensearch-benchmark
- Workload: geonames
- Task: desc_sort_population
- Nodes: 1 node
- JVM size: 4 GB

Benchmark result:

| Metric | Value | Unit |
| --- | --- | --- |
| Min Throughput | 70.09 | ops/s |
| Mean Throughput | 74.27 | ops/s |
| Median Throughput | 74.59 | ops/s |
| Max Throughput | 74.79 | ops/s |
| 50th percentile latency | 6.32264 | ms |
| 90th percentile latency | 6.81483 | ms |
| 99th percentile latency | 7.45008 | ms |
| 99.9th percentile latency | 15.4041 | ms |
| 100th percentile latency | 16.5634 | ms |
| 50th percentile service time | 5.55759 | ms |
| 90th percentile service time | 5.73615 | ms |
| 99th percentile service time | 6.43827 | ms |
| 99.9th percentile service time | 14.8174 | ms |
| 100th percentile service time | 15.4417 | ms |
| error rate | 0 | % |

CPU profile:

Numeric sorting CPU profile

As expected, most CPU cycles for numeric sorting are spent in the Long comparator performing the sort. Those cycles are equally distributed between the PointValues operations estimatePointCount and intersect.

Opening this issue to brainstorm and identify potential improvements to numeric sorting.

msfroh commented 10 months ago

I was brainstorming with @harshavamsi on this one briefly last week.

I think there might be some trickery that we can do especially for the special case where a segment has no deletes.

Specifically, I'm wondering if we can inspect the BKD tree to find the leftmost/rightmost (depending on sort order) smallest range with at least N hits, where N is the size parameter (or the track_total_hits limit). Then we could implicitly attach a range query filter.
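
To make the idea concrete, here is a minimal sketch of the "rightmost smallest range with at least N hits" search for a descending sort. It is not Lucene's BKD implementation: `Node`, `build`, and `lower_bound_for_top_n` are hypothetical names, the tree is a toy one-dimensional structure built from a non-empty sorted list, and it assumes a match_all query on a segment with no deletes, so that per-node point counts equal hit counts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Toy stand-in for a BKD tree node over a single numeric dimension."""
    min_value: int
    max_value: int
    count: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(sorted_values: List[int], leaf_size: int = 2) -> Node:
    """Build a balanced tree over already-sorted, non-empty values."""
    node = Node(sorted_values[0], sorted_values[-1], len(sorted_values))
    if len(sorted_values) > leaf_size:
        mid = len(sorted_values) // 2
        node.left = build(sorted_values[:mid], leaf_size)
        node.right = build(sorted_values[mid:], leaf_size)
    return node


def lower_bound_for_top_n(root: Node, n: int) -> Optional[int]:
    """Return a bound such that `value >= bound` keeps at least n points.

    Descends into the right child for as long as that child alone still
    holds at least n points, then returns the minimum value of the node it
    stopped at. Returns None when the tree holds fewer than n points.
    """
    if root.count < n:
        return None
    node = root
    while node.right is not None and node.right.count >= n:
        node = node.right
    return node.min_value


tree = build([1, 3, 5, 7, 9, 11, 13, 15])
bound = lower_bound_for_top_n(tree, 3)  # lower bound usable as a range filter
```

The returned bound is conservative: it can keep more than N points (a whole subtree), but never fewer, which is what an implicit range filter would need.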

I don't know if it would ultimately help, or if it's essentially what happens in the PointValues estimate/intersect methods anyway.

harshavamsi commented 10 months ago

@msfroh thanks for the inputs. Tagging @rishabhmaurya here as well.

@rishabhmaurya had the idea of essentially trying to help match_all queries that use a descending sort on a numeric field. Rather than going through the entire BKD tree like you mentioned, we could essentially look through the min/max value that makes the most sense for us and then attach a range filter on that node assuming other attributes like the number of hits and the number of docs to be returned are all taken care of first.
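
For illustration only, the effect of "attaching a range filter" can be pictured as the request-level rewrite below. The thread proposes doing this implicitly inside Lucene/OpenSearch, not in the client, so this sketch only shows the equivalent query shape; `with_range_filter` is a hypothetical helper and the bound `1000000` is a made-up value, not one derived from the geonames data.

```python
import copy
from typing import Any, Dict


def with_range_filter(body: Dict[str, Any], field: str, gte: int) -> Dict[str, Any]:
    """Return a copy of a search body whose query is restricted to field >= gte."""
    rewritten = copy.deepcopy(body)
    original_query = rewritten.get("query", {"match_all": {}})
    rewritten["query"] = {
        "bool": {
            # Keep the original query semantics unchanged.
            "must": [original_query],
            # The filter clause restricts candidates without affecting scoring.
            "filter": [{"range": {field: {"gte": gte}}}],
        }
    }
    return rewritten


original = {"query": {"match_all": {}}, "sort": [{"population": "desc"}]}
rewritten = with_range_filter(original, "population", 1000000)
```

The rewrite is only safe when the bound is known to leave at least as many matching documents as the request needs, which is why the comment conditions it on the hit count and the number of documents to return.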

I don't think I did a great job of explaining, but I will put up an RFC with my thought process and how we could prune the tree.

rishabhmaurya commented 10 months ago

Thanks @harshavamsi for working on it.

I have a working version of it in Lucene; details of the optimization are in https://github.com/apache/lucene/issues/12534 and the PR https://github.com/rishabhmaurya/lucene/pull/2. I discussed it with @msfroh and we agreed on its utility. We can take early feedback from @nknize, as he understands this part of the code very well.

I started making changes in OpenSearch as well, because the Lucene community may not accept it, as it works only for MatchAllQuery with a descending sort on a numeric field and no deletions. You can find the OpenSearch changes here; they are still a work in progress - https://github.com/rishabhmaurya/OpenSearch/commit/f261cb380dce345c4ce3671a814a12e67258fff5

getsaurabh02 commented 10 months ago

Should we pull in https://github.com/rishabhmaurya/OpenSearch/commit/f261cb380dce345c4ce3671a814a12e67258fff5 and run a benchmark along with a profile to identify the early improvements? cc: @sandeshkr419

rishabhmaurya commented 10 months ago

https://github.com/rishabhmaurya/OpenSearch/commit/f261cb380dce345c4ce3671a814a12e67258fff5 is still a work in progress, so it can't be used directly. However, we can build a custom Lucene jar using https://github.com/apache/lucene/issues/12534, where I have the changes working, and check the estimated gains. We may have to tweak the entry condition here - https://github.com/rishabhmaurya/lucene/pull/2/files#diff-79c6a57519ecd1ef504629e62e13d17859a4ffedc58f4602e583ce758a15adc8R294 to find the sweet spot for this optimization.

harshavamsi commented 10 months ago

Current steps on this:

harshavamsi commented 10 months ago

Preliminary benchmarking results:

Without optimization:

| Metric | Value | Unit |
| --- | --- | --- |
| Min Throughput | 1.5 | ops/s |
| Mean Throughput | 1.51 | ops/s |
| Median Throughput | 1.51 | ops/s |
| Max Throughput | 1.51 | ops/s |
| 50th percentile latency | 6.23599 | ms |
| 90th percentile latency | 6.81445 | ms |
| 99th percentile latency | 7.21335 | ms |
| 100th percentile latency | 7.22365 | ms |
| 50th percentile service time | 4.63105 | ms |
| 90th percentile service time | 5.02198 | ms |
| 99th percentile service time | 5.20355 | ms |
| 100th percentile service time | 5.24069 | ms |
| error rate | 0 | % |

With optimization:

| Metric | Value | Unit |
| --- | --- | --- |
| Min Throughput | 1.5 | ops/s |
| Mean Throughput | 1.5 | ops/s |
| Median Throughput | 1.5 | ops/s |
| Max Throughput | 1.5 | ops/s |
| 50th percentile latency | 8.20805 | ms |
| 90th percentile latency | 8.61225 | ms |
| 99th percentile latency | 8.91156 | ms |
| 100th percentile latency | 9.02062 | ms |
| 50th percentile service time | 6.5675 | ms |
| 90th percentile service time | 6.76763 | ms |
| 99th percentile service time | 7.00944 | ms |
| 100th percentile service time | 7.10608 | ms |
| error rate | 0 | % |

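
To read the two runs side by side, the relative change per metric can be computed as below. This is a small sketch using only the p50/p90 figures quoted in the results above; `percent_change` and the dictionary keys are names chosen here for illustration.

```python
def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Values copied from the "without optimization" and "with optimization" runs.
baseline = {
    "p50_latency_ms": 6.23599,
    "p90_latency_ms": 6.81445,
    "p50_service_time_ms": 4.63105,
    "p90_service_time_ms": 5.02198,
}
optimized = {
    "p50_latency_ms": 8.20805,
    "p90_latency_ms": 8.61225,
    "p50_service_time_ms": 6.5675,
    "p90_service_time_ms": 6.76763,
}

changes = {
    metric: percent_change(baseline[metric], optimized[metric])
    for metric in baseline
}
for metric, change in changes.items():
    # A positive number means the optimized run was slower.
    print(f"{metric}: {change:+.1f}%")
```

On these numbers the optimized run comes out slower on every row, at roughly 32% higher median latency and roughly 42% higher median service time. The two runs used different clusters, so the gap is not necessarily caused by the patch alone.
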
rishabhmaurya commented 10 months ago

Thanks @harshavamsi for running the benchmark. Could you provide more details on the workload and queries you ran?

harshavamsi commented 10 months ago

@rishabhmaurya

I ran this workload and this task:

- Workload: geonames
- Task: desc_sort_population

I used an r5.2xlarge cluster for both benchmarks. The non-optimized run used a regular cluster I had set up for keyword benchmarking. The optimized cluster ran a custom build of OpenSearch with a patched Lucene version that included the optimization.

This is the query:

    {
      "name": "desc_sort_population",
      "operation-type": "search",
      "body": {
        "query": {
          "match_all": {}
        },
        "sort" : [
          {"population" : "desc"}
        ]
      }
    }

harshavamsi commented 10 months ago

Re-running the benchmark on the optimized cluster:

| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| Min Throughput | desc_sort_population | 1.5 | ops/s |
| Mean Throughput | desc_sort_population | 1.5 | ops/s |
| Median Throughput | desc_sort_population | 1.5 | ops/s |
| Max Throughput | desc_sort_population | 1.5 | ops/s |
| 50th percentile latency | desc_sort_population | 6.71526 | ms |
| 90th percentile latency | desc_sort_population | 7.17203 | ms |
| 99th percentile latency | desc_sort_population | 7.40734 | ms |
| 100th percentile latency | desc_sort_population | 7.46786 | ms |
| 50th percentile service time | desc_sort_population | 5.15482 | ms |
| 90th percentile service time | desc_sort_population | 5.38515 | ms |
| 99th percentile service time | desc_sort_population | 5.79006 | ms |
| 100th percentile service time | desc_sort_population | 5.89911 | ms |
| error rate | desc_sort_population | 0 | % |

rishabhmaurya commented 10 months ago

Can you also post the segment stats and the overall index size here? Given that the latency is already pretty low, this may not be the right workload to test against.
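
A segments response can be long, so a per-shard summary helps when judging whether the workload is large enough and whether the no-deletes precondition holds. The sketch below assumes the response shape of the index segments output (`indices` → index name → `shards` → list of shard copies → `segments`); `summarize_segments` is a hypothetical helper, and the sample input is trimmed to two segments for brevity.

```python
from typing import Any, Dict


def summarize_segments(stats: Dict[str, Any], index: str) -> Dict[str, Dict[str, int]]:
    """Aggregate segment count, docs, deletes and size per primary shard."""
    summary: Dict[str, Dict[str, int]] = {}
    for shard_id, shard_copies in stats["indices"][index]["shards"].items():
        for shard_copy in shard_copies:
            if not shard_copy["routing"]["primary"]:
                continue  # count each shard once, via its primary copy
            segments = list(shard_copy["segments"].values())
            summary[shard_id] = {
                "segments": len(segments),
                "docs": sum(s["num_docs"] for s in segments),
                "deleted_docs": sum(s["deleted_docs"] for s in segments),
                "size_in_bytes": sum(s["size_in_bytes"] for s in segments),
            }
    return summary


# Trimmed sample in the same shape as a segments response.
sample = {
    "indices": {
        "geonames": {
            "shards": {
                "0": [
                    {
                        "routing": {"state": "STARTED", "primary": True},
                        "segments": {
                            "_0": {"num_docs": 10535, "deleted_docs": 0, "size_in_bytes": 3746696},
                            "_1": {"num_docs": 10070, "deleted_docs": 0, "size_in_bytes": 3459472},
                        },
                    }
                ]
            }
        }
    }
}
summary = summarize_segments(sample, "geonames")
```

A `deleted_docs` total of zero on every shard is the case the proposed optimization targets.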

harshavamsi commented 10 months ago

@rishabhmaurya here are the segment stats:

{
    "_shards": {
        "total": 7,
        "successful": 6,
        "failed": 0
    },
    "indices": {
        "geonames": {
            "shards": {
                "0": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 16,
                        "num_search_segments": 16,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 10535,
                                "deleted_docs": 0,
                                "size_in_bytes": 3746696,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 10070,
                                "deleted_docs": 0,
                                "size_in_bytes": 3459472,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 47250,
                                "deleted_docs": 0,
                                "size_in_bytes": 15707588,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 23435,
                                "deleted_docs": 0,
                                "size_in_bytes": 8124605,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 111695,
                                "deleted_docs": 0,
                                "size_in_bytes": 31826890,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 59986,
                                "deleted_docs": 0,
                                "size_in_bytes": 18805519,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 18956,
                                "deleted_docs": 0,
                                "size_in_bytes": 6143059,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 488811,
                                "deleted_docs": 0,
                                "size_in_bytes": 127081465,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 545075,
                                "deleted_docs": 0,
                                "size_in_bytes": 139838084,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 175324,
                                "deleted_docs": 0,
                                "size_in_bytes": 48162652,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 10813,
                                "deleted_docs": 0,
                                "size_in_bytes": 2925647,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 161960,
                                "deleted_docs": 0,
                                "size_in_bytes": 38022913,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 52153,
                                "deleted_docs": 0,
                                "size_in_bytes": 13055539,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 223779,
                                "deleted_docs": 0,
                                "size_in_bytes": 55202061,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_e": {
                                "generation": 14,
                                "num_docs": 272247,
                                "deleted_docs": 0,
                                "size_in_bytes": 66167512,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_f": {
                                "generation": 15,
                                "num_docs": 66286,
                                "deleted_docs": 0,
                                "size_in_bytes": 17067734,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "1": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 17,
                        "num_search_segments": 17,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 16775,
                                "deleted_docs": 0,
                                "size_in_bytes": 5516801,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 7823,
                                "deleted_docs": 0,
                                "size_in_bytes": 2994373,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 1479,
                                "deleted_docs": 0,
                                "size_in_bytes": 612107,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 42308,
                                "deleted_docs": 0,
                                "size_in_bytes": 13547068,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 40185,
                                "deleted_docs": 0,
                                "size_in_bytes": 13661412,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 6463,
                                "deleted_docs": 0,
                                "size_in_bytes": 2599313,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 73610,
                                "deleted_docs": 0,
                                "size_in_bytes": 21328234,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 120275,
                                "deleted_docs": 0,
                                "size_in_bytes": 34799549,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 23483,
                                "deleted_docs": 0,
                                "size_in_bytes": 7484546,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 496505,
                                "deleted_docs": 0,
                                "size_in_bytes": 129677362,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 431367,
                                "deleted_docs": 0,
                                "size_in_bytes": 112317590,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 153711,
                                "deleted_docs": 0,
                                "size_in_bytes": 42394841,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 64727,
                                "deleted_docs": 0,
                                "size_in_bytes": 15216055,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 3895,
                                "deleted_docs": 0,
                                "size_in_bytes": 1048412,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_e": {
                                "generation": 14,
                                "num_docs": 214024,
                                "deleted_docs": 0,
                                "size_in_bytes": 53305902,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_f": {
                                "generation": 15,
                                "num_docs": 500718,
                                "deleted_docs": 0,
                                "size_in_bytes": 118285309,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_g": {
                                "generation": 16,
                                "num_docs": 84258,
                                "deleted_docs": 0,
                                "size_in_bytes": 21819490,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "2": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 17,
                        "num_search_segments": 17,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 18219,
                                "deleted_docs": 0,
                                "size_in_bytes": 6406307,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 14097,
                                "deleted_docs": 0,
                                "size_in_bytes": 4702834,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 1766,
                                "deleted_docs": 0,
                                "size_in_bytes": 801728,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 24677,
                                "deleted_docs": 0,
                                "size_in_bytes": 8677890,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 66197,
                                "deleted_docs": 0,
                                "size_in_bytes": 20670999,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 8773,
                                "deleted_docs": 0,
                                "size_in_bytes": 3353062,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 140084,
                                "deleted_docs": 0,
                                "size_in_bytes": 38727079,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 102668,
                                "deleted_docs": 0,
                                "size_in_bytes": 29354176,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 11886,
                                "deleted_docs": 0,
                                "size_in_bytes": 3646252,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 481359,
                                "deleted_docs": 0,
                                "size_in_bytes": 124498298,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 420947,
                                "deleted_docs": 0,
                                "size_in_bytes": 110980771,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 122864,
                                "deleted_docs": 0,
                                "size_in_bytes": 33941196,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 55618,
                                "deleted_docs": 0,
                                "size_in_bytes": 13156027,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 28840,
                                "deleted_docs": 0,
                                "size_in_bytes": 7174693,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_e": {
                                "generation": 14,
                                "num_docs": 493797,
                                "deleted_docs": 0,
                                "size_in_bytes": 116969201,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_f": {
                                "generation": 15,
                                "num_docs": 237488,
                                "deleted_docs": 0,
                                "size_in_bytes": 58809773,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_g": {
                                "generation": 16,
                                "num_docs": 47355,
                                "deleted_docs": 0,
                                "size_in_bytes": 12706968,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "3": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 14,
                        "num_search_segments": 14,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 24750,
                                "deleted_docs": 0,
                                "size_in_bytes": 7997829,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 15401,
                                "deleted_docs": 0,
                                "size_in_bytes": 5526373,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 4274,
                                "deleted_docs": 0,
                                "size_in_bytes": 1670168,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 74714,
                                "deleted_docs": 0,
                                "size_in_bytes": 23282221,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 49504,
                                "deleted_docs": 0,
                                "size_in_bytes": 16640256,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 1152,
                                "deleted_docs": 0,
                                "size_in_bytes": 425884,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 723774,
                                "deleted_docs": 0,
                                "size_in_bytes": 185696573,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 374910,
                                "deleted_docs": 0,
                                "size_in_bytes": 102114378,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 144763,
                                "deleted_docs": 0,
                                "size_in_bytes": 40681188,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 72038,
                                "deleted_docs": 0,
                                "size_in_bytes": 16996726,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 60729,
                                "deleted_docs": 0,
                                "size_in_bytes": 14683267,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 434765,
                                "deleted_docs": 0,
                                "size_in_bytes": 102456131,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 87455,
                                "deleted_docs": 0,
                                "size_in_bytes": 23013397,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_d": {
                                "generation": 13,
                                "num_docs": 211170,
                                "deleted_docs": 0,
                                "size_in_bytes": 53235342,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ],
                "4": [
                    {
                        "routing": {
                            "state": "STARTED",
                            "primary": true,
                            "node": "modFRW6URWi6fqeFypw2fg"
                        },
                        "num_committed_segments": 13,
                        "num_search_segments": 13,
                        "segments": {
                            "_0": {
                                "generation": 0,
                                "num_docs": 29157,
                                "deleted_docs": 0,
                                "size_in_bytes": 9596750,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_1": {
                                "generation": 1,
                                "num_docs": 25289,
                                "deleted_docs": 0,
                                "size_in_bytes": 9046435,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_2": {
                                "generation": 2,
                                "num_docs": 101070,
                                "deleted_docs": 0,
                                "size_in_bytes": 30725411,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_3": {
                                "generation": 3,
                                "num_docs": 36789,
                                "deleted_docs": 0,
                                "size_in_bytes": 12505981,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_4": {
                                "generation": 4,
                                "num_docs": 16360,
                                "deleted_docs": 0,
                                "size_in_bytes": 5759995,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_5": {
                                "generation": 5,
                                "num_docs": 627072,
                                "deleted_docs": 0,
                                "size_in_bytes": 161563718,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_6": {
                                "generation": 6,
                                "num_docs": 447053,
                                "deleted_docs": 0,
                                "size_in_bytes": 118268120,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_7": {
                                "generation": 7,
                                "num_docs": 131989,
                                "deleted_docs": 0,
                                "size_in_bytes": 37526334,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_8": {
                                "generation": 8,
                                "num_docs": 50138,
                                "deleted_docs": 0,
                                "size_in_bytes": 12468992,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_9": {
                                "generation": 9,
                                "num_docs": 54350,
                                "deleted_docs": 0,
                                "size_in_bytes": 12764422,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_a": {
                                "generation": 10,
                                "num_docs": 245263,
                                "deleted_docs": 0,
                                "size_in_bytes": 62303574,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_b": {
                                "generation": 11,
                                "num_docs": 449658,
                                "deleted_docs": 0,
                                "size_in_bytes": 105290863,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            },
                            "_c": {
                                "generation": 12,
                                "num_docs": 66300,
                                "deleted_docs": 0,
                                "size_in_bytes": 17125599,
                                "memory_in_bytes": 0,
                                "committed": true,
                                "search": true,
                                "version": "9.8.1",
                                "compound": true,
                                "attributes": {
                                    "Lucene90StoredFieldsFormat.mode": "BEST_SPEED"
                                }
                            }
                        }
                    }
                ]
            }
        }
    }
}

Index size:

"store": {
    "size_in_bytes": 2975896540,
    "reserved_in_bytes": 0
},

rishabhmaurya commented 10 months ago

The segment sizes are too small to show any noticeable difference. I can work with you on it next week.
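
For a sense of scale, here is a minimal sketch that converts the byte counts quoted above into MiB/GiB. It uses only the four segments visible at the end of the listing (`_9` is assumed from `"generation": 9`; the helper names are illustrative) plus the reported store size.

```python
# Byte counts copied from the segment listing and "store" block above.
# Only the last four segments are included; "_9" is inferred from generation 9.
segment_bytes = {
    "_9": 12764422,
    "_a": 62303574,
    "_b": 105290863,
    "_c": 17125599,
}
store_bytes = 2975896540

MIB = 1024 ** 2
GIB = 1024 ** 3

def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB

def to_gib(n_bytes):
    """Convert a byte count to gibibytes."""
    return n_bytes / GIB

segment_mib = {name: to_mib(size) for name, size in segment_bytes.items()}
store_gib = to_gib(store_bytes)

for name, mib in segment_mib.items():
    print(f"{name}: {mib:.1f} MiB")
print(f"store: {store_gib:.2f} GiB")
```

Each of these segments is on the order of tens of MiB, well below the multi-GB segment sizes discussed later in the thread.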

gashutos commented 10 months ago

@rishabhmaurya The POC you tried would only work for MatchAllQuery. I tried exactly the same thing a couple of months back, but a matchAllDocs query combined with plain (vanilla) sorting is rarely used IMO, so I skipped prototyping it.

backslasht commented 10 months ago

+1 on @gashutos's point. @rishabhmaurya - Do you have a specific use case where this will be useful?

rishabhmaurya commented 10 months ago

@gashutos thanks for looking. Yes, I mentioned in the POC that it is supposed to work only for MatchAllQuery with no doc deletions. This will be helpful in 2 cases -

  1. Desc numeric sort on any numeric field - This should make iteration over bigger segments faster, assuming there is no index sort on this numeric field and the Lucene index size is significant (in GBs). Since such queries usually span all segments, it should theoretically speed things up. I think this is a common use case, and we capture this query type in most of our benchmarks.
  2. Desc sort on the @timestamp field with LogByteSize as the merge policy - After a force merge, the smallest segment could be big enough to make the desc sort query slow. This would help in such cases too.

Can you point me to your POC/issue and also explain why you think it's a rare case? Thank you.

gashutos commented 10 months ago

@rishabhmaurya The question of why desc order sort is slower compared to asc order can be divided into two parts.

  1. Timeseries indices are nearly sorted in asc order (which will be the case with the LogByteSize merge policy as well). RFC in Lucene -> https://github.com/apache/lucene/issues/12448

  2. For non-timeseries workloads, our docId-based disjoint iterator with the BKD-based competitive iterator works only in asc order of docIds. Reverse BKD-based iteration -> https://github.com/opensearch-project/OpenSearch/issues/7680

We think it is a rare scenario because, in production, we generally don't see a sort on just a single field without any filtering clause wrapping it. Again, this observation is based on the user use cases I have seen.
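
A toy sketch of the first point, assuming a plain bounded priority queue rather than Lucene's actual collector or BKD-based iterator: when values arrive in ascending order by doc ID, an asc top-k sort fills its queue with the first k docs and nothing afterwards is competitive, whereas a desc top-k sort finds every later doc competitive. The function name `count_competitive_hits` and the synthetic data are illustrative only.

```python
import heapq

def count_competitive_hits(values, k, descending):
    """Visit values in doc ID order, keep the top k, and count queue replacements.

    A replacement means the visited doc beat the weakest hit currently
    retained, so it could not have been skipped.
    """
    heap = []  # min-heap on the sort key; heap[0] is the weakest retained hit
    replacements = 0
    for value in values:
        # For desc sort larger values win; for asc sort smaller values win.
        key = value if descending else -value
        if len(heap) < k:
            heapq.heappush(heap, key)
        elif key > heap[0]:
            heapq.heapreplace(heap, key)
            replacements += 1
    return replacements

# Synthetic "timeseries" field: values strictly increase with doc ID.
values = list(range(100_000))

asc_replacements = count_competitive_hits(values, k=10, descending=False)
desc_replacements = count_competitive_hits(values, k=10, descending=True)

print("asc sort replacements: ", asc_replacements)
print("desc sort replacements:", desc_replacements)
```

In this simplified model the asc sort never updates its queue after the first k docs, while the desc sort updates it for every remaining doc, so there is nothing to skip when iterating in doc ID order.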

harshavamsi commented 10 months ago

Posting some more numbers here: same workload and instance, but this time force merged into 1 large segment and running on a single primary shard, to see if that has any impact:

Non optimized cluster:

| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| 50th percentile latency | desc_sort_population | 9.38135 | ms |
| 90th percentile latency | desc_sort_population | 10.1048 | ms |
| 99th percentile latency | desc_sort_population | 10.3617 | ms |
| 100th percentile latency | desc_sort_population | 10.7949 | ms |
| 50th percentile service time | desc_sort_population | 7.83975 | ms |
| 90th percentile service time | desc_sort_population | 8.14815 | ms |
| 99th percentile service time | desc_sort_population | 8.64486 | ms |
| 100th percentile service time | desc_sort_population | 8.80505 | ms |
| error rate | desc_sort_population | 0 | % |

Optimized cluster:

| Metric | Task | Value | Unit |
| --- | --- | --- | --- |
| 50th percentile latency | desc_sort_population | 13.4777 | ms |
| 90th percentile latency | desc_sort_population | 14.0544 | ms |
| 99th percentile latency | desc_sort_population | 14.372 | ms |
| 100th percentile latency | desc_sort_population | 15.1146 | ms |
| 50th percentile service time | desc_sort_population | 11.8186 | ms |
| 90th percentile service time | desc_sort_population | 12.006 | ms |
| 99th percentile service time | desc_sort_population | 12.4779 | ms |
| 100th percentile service time | desc_sort_population | 12.48 | ms |
| error rate | desc_sort_population | 0 | % |

I will dive into the Lucene code path to understand where we're spending time when running this workload.
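
To put the two runs side by side, here is a small sketch that computes the ratio between them for each metric. The numbers are copied from the tables above; the dictionary layout and the `ratio` helper are just for illustration.

```python
# Values (ms) copied from the two result tables above.
non_optimized = {
    "p50 latency": 9.38135,
    "p90 latency": 10.1048,
    "p99 latency": 10.3617,
    "p50 service time": 7.83975,
    "p90 service time": 8.14815,
    "p99 service time": 8.64486,
}
optimized = {
    "p50 latency": 13.4777,
    "p90 latency": 14.0544,
    "p99 latency": 14.372,
    "p50 service time": 11.8186,
    "p90 service time": 12.006,
    "p99 service time": 12.4779,
}

def ratio(metric):
    """Return the optimized / non-optimized ratio for one metric."""
    return optimized[metric] / non_optimized[metric]

ratios = {metric: ratio(metric) for metric in non_optimized}

for metric, r in ratios.items():
    # A ratio above 1.0 means the "Optimized cluster" run was slower.
    print(f"{metric}: {r:.2f}x ({(r - 1) * 100:+.1f}%)")
```

With these figures, every ratio comes out above 1.0, i.e. the run labeled "Optimized cluster" reports higher latency and service time than the "Non optimized cluster" run.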

hdhalter commented 8 months ago

Hi @harshavamsi - will documentation be required for this feature in 2.12?

msfroh commented 8 months ago

> will documentation be required for this feature in 2.12?

This is purely an internal optimization task. It should not require any documentation.

kiranprakash154 commented 7 months ago

Hi, are we on track for this to be released in 2.12?

getsaurabh02 commented 7 months ago

Pushing this out to v2.13, since this optimization is still in the investigation stage. Although the benchmark numbers look promising, it requires a further deep dive into the Lucene code path to understand where we're spending time and to come up with improvement opportunities.

bbarani commented 6 months ago

Moved it to 2.14.0 as per the discussion with @harshavamsi.

msfroh commented 6 months ago

We should try benchmarking numeric sort queries with https://github.com/apache/lucene/pull/13149.

Based on the explanation at https://blunders.io/posts/es-benchmark-4-inlining, we may see significant improvement to numeric sorting.

bbarani commented 6 months ago

Tagging @opensearch-project/benchmark-core team