opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.83k stars 1.83k forks source link

Sort on multi-value unsigned long field is incorrect #16698

Open bugmakerrrrrr opened 7 hours ago

bugmakerrrrrr commented 7 hours ago

Describe the bug

Recently, I worked on supporting the reuse of doc values to reconstruct the source field and avoid storing it. Then, I found that the number fields are ordered when fetching from sorted numeric doc values except the unsigned long field. This is because Lucene treats it as a signed number. IMO, this is not a problem in many situations, but if an unsigned long field has multiple values per doc, the sort result may be wrong, because in the current implementation, the sort mode uses signed long to pick the value.

Related component

Search

To Reproduce

  1. create index
    {
    "mappings": {
    "properties": {
      "field1": {
        "type": "unsigned_long"
      }
    }
    }
    }
  2. index some docs
    {"index": {"_id": 1}}
    {"field1": [13835058055282163712, 3]}
    {"index": {"_id": 2}}
    {"field1": [13835058055282163713, 2]}
    {"index": {"_id": 3}}
    {"field1": [13835058055282163714, 1]}
  3. search with sort
    {
    "query": {
    "match_all": {}
    },
    "sort": [
    {
      "field1": {
        "order": "desc",
        "mode": "max"
      }
    }
    ]
    }

Expected behavior

The sort result should be correct.

Possible solution: create a dedicated UnsignedLongMultiValueMode for unsigned long, then convert the normal mode to unsigned long mode and use the converted mode to pick the sort value in UnsignedLongValuesComparatorSource.

Additional Details

Plugins Please list all plugins currently enabled.

Screenshots If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

Additional context Add any other context about the problem here.

bugmakerrrrrr commented 7 hours ago

@reta You may be interested in this. Any thoughts?

bugmakerrrrrr commented 7 hours ago

And if we want to make values ordered in one doc, maybe we can flip the sign bit before sending the value to Lucene, and re-flip the sign bit when reading from sorted numeric doc value, this can benefit max & min multi-value mode. But this is a breaking change, we can do it according to the index version.

reta commented 5 hours ago

Thanks @bugmakerrrrrr , I should be able to get into it next week, unless you want to pick it up? :-)