opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Incomplete results with search_after and multiple shards #14824

Open TatianaNeuer opened 1 month ago

TatianaNeuer commented 1 month ago

### Describe the bug

In some cases, a search request with "search_after" and "track_total_hits": false does not return all expected documents: some documents are missing from the results.

### Related component

Search

### To Reproduce

  1. Create an index with 10 shards (a different number of shards might not trigger the bug):

    PUT /test_index
    {
    "settings": {
        "index": {
            "number_of_shards": 10
        }
    }
    }
  2. Index some documents (using different _id values might not trigger the bug):

    POST /_bulk
    { "index": { "_index": "test_index", "_id": "test_index-id-doc1" } }
    {"docNb": "doc1","name": "bob"}
    { "index": { "_index": "test_index", "_id": "test_index-id-doc2" } }
    {"docNb": "doc2","name": ""}
    { "index": { "_index": "test_index", "_id": "test_index-id-doc3" } }
    {"docNb": "doc3"}
    { "index": { "_index": "test_index", "_id": "test_index-id-doc4" } }
    {"docNb": "doc4","name": "ana"}
    { "index": { "_index": "test_index", "_id": "test_index-id-doc5" } }
    {"docNb": "doc5","name": ""}
    { "index": { "_index": "test_index", "_id": "test_index-id-doc6" } }
    {"docNb": "doc6"}
    { "index": { "_index": "test_index", "_id": "test_index-id-doc7" } }
    {"docNb": "doc7"}
    { "index": { "_index": "test_index", "_id": "test_index-id-doc8" } }
    {"docNb": "doc8"}
    { "index": { "_index": "test_index", "_id": "test_index-id-doc9" } }
    {"docNb": "doc9","name": ""}
    { "index": { "_index": "test_index", "_id": "test_index-id-doc10" } }
    {"docNb": "doc10","name": ""}
  3. Search documents:

    GET /test_index/_search
    {
        "size": 20,
        "track_total_hits": false,
        "sort": [
            { "name.keyword": { "order": "desc" } },
            { "docNb.keyword": { "order": "asc" } }
        ],
        "search_after": [ "ana", "doc4" ]
    }

  4. The result contains 7 documents instead of 8. Executing the same request with "track_total_hits": true returns the correct number of documents.
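
To see why 8 documents are expected, the ordering can be reproduced offline. The following is a minimal Python sketch, not OpenSearch code: it assumes that documents without a `name` sort last (the default `missing` behaviour for sorts), that plain string comparison is close enough to keyword ordering for these ASCII values, and that the `search_after` cursor matches an existing document. The helper names `expected_order` and `hits_after` are hypothetical.

```python
# Documents from step 2; an absent "name" key models a document without that field.
DOCS = [
    {"docNb": "doc1", "name": "bob"},
    {"docNb": "doc2", "name": ""},
    {"docNb": "doc3"},
    {"docNb": "doc4", "name": "ana"},
    {"docNb": "doc5", "name": ""},
    {"docNb": "doc6"},
    {"docNb": "doc7"},
    {"docNb": "doc8"},
    {"docNb": "doc9", "name": ""},
    {"docNb": "doc10", "name": ""},
]


def expected_order(docs):
    """Sort by name descending (missing values last), then by docNb ascending."""
    present = [d for d in docs if "name" in d]
    missing = [d for d in docs if "name" not in d]
    # Two stable passes: secondary key first, then the primary key in reverse.
    present.sort(key=lambda d: d["docNb"])
    present.sort(key=lambda d: d["name"], reverse=True)
    missing.sort(key=lambda d: d["docNb"])
    return present + missing


def hits_after(docs, name, doc_nb):
    """Return the documents that sort strictly after the cursor (name, doc_nb)."""
    ordered = expected_order(docs)
    # Simplification: the cursor is assumed to be one of the documents.
    cursor = next(
        i for i, d in enumerate(ordered)
        if d.get("name") == name and d["docNb"] == doc_nb
    )
    return ordered[cursor + 1:]


if __name__ == "__main__":
    hits = hits_after(DOCS, "ana", "doc4")
    print(len(hits), [d["docNb"] for d in hits])
```

Under these assumptions, the four documents with an empty name and the four documents without a name all sort after the cursor, which gives the 8 hits the report expects.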

### Expected behavior

A request with "search_after" and "track_total_hits:false" should return the correct number of documents. 

### Additional Details

**Host/Environment (please complete the following information):**
 - OS: Windows 10 with WSL2 and docker
 - Version : docker image: opensearchproject/opensearch:2.15.0
 - 1 OpenSearch node, run with the following Docker Compose file:

    version: '3'
    services:
      opensearch:
        image: opensearchproject/opensearch:2.15.0
        container_name: opensearch
        environment:

    volumes:
      opensearch:

    networks:
      opensearch-net:

bugmakerrrrrr commented 1 month ago

The docs test_index-id-doc1 and test_index-id-doc6 are on the same shard, so the MinAndMax value of the field name on this shard is bob (doc6 has no name field). If we set track_total_hits=false, the search_after param is taken into consideration during the can-match phase. Because the MinAndMax of the shard is larger than the search_after value ana, this shard is filtered out as unable to match. I think we should take missing values into consideration during the can-match phase. I can help fix this issue.
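
The pruning described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual OpenSearch implementation: the function `shard_can_match` and its `account_for_missing` flag are hypothetical, the sort is descending on `name`, and `None` stands for a document without the field.

```python
from typing import Optional, Sequence


def shard_can_match(
    names: Sequence[Optional[str]],
    search_after: str,
    account_for_missing: bool,
) -> bool:
    """Decide whether a shard may hold hits after `search_after` (descending sort)."""
    present = [n for n in names if n is not None]
    has_missing = len(present) < len(names)

    if account_for_missing and has_missing:
        # Missing values sort last, so they always come after the cursor.
        return True
    if not present:
        # No min/max available: assume the shard cannot be pruned.
        return True

    # Descending order: only values <= search_after can follow the cursor,
    # so the shard is skipped when even its minimum is larger than the cursor.
    return min(present) <= search_after


if __name__ == "__main__":
    shard = ["bob", None]  # doc1 and doc6, which share a shard per the comment above
    print(shard_can_match(shard, "ana", account_for_missing=False))
    print(shard_can_match(shard, "ana", account_for_missing=True))
```

With the min/max check alone, the shard holding doc1 and doc6 is skipped because its minimum, bob, is greater than ana, and doc6 is lost. Treating the presence of missing values as a reason to keep the shard is one possible way to express the proposed fix.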