opensearch-project / opensearch-java

Java Client for OpenSearch
Apache License 2.0

[BUG] Sort values in the Hit type are always wrongly deserialized as strings #1128

Open: djivko opened this issue 1 month ago

djivko commented 1 month ago

What is the bug?

Executing a search request that includes a sort produces SearchHits in which each Hit has its sort array populated. However, the client always deserializes the sort array as an array of strings. If you then repeat the request with search_after set to the returned sort values, the server can reject it when the underlying field is not actually a string (for example, a date field).

How can one reproduce the bug?

Create an index with the following mapping:

"creationDate": {
    "type": "date",
    "format": "date_time"
}

Java field definition

@Field(name = "creationDate", type = FieldType.Date, format = DateFormat.date_time)
private Date creationDate;

// Setters/Getters

Index a few documents, then execute a search request followed by another search that uses search_after. For example:

NativeQueryBuilder queryBuilder = NativeQuery.builder();
queryBuilder.withQuery(q -> q.matchAll(m -> m));
queryBuilder.withSort(List.of(
    new SortOptions.Builder().field(fb -> fb.field("creationDate").order(SortOrder.Asc))
        .build()));
queryBuilder.withPageable(PageRequest.of(0, 1));

List<Object> searchAfter = null; // no search_after on the first request
boolean done = false;
while (!done) {
  queryBuilder.withSearchAfter(searchAfter);

  SearchHits<T> hits = operations.search(queryBuilder.build(), entityClass, getIndexCoordinates());
  List<SearchHit<T>> searchHits = hits.getSearchHits();

  boolean hasResults = searchHits != null && !searchHits.isEmpty();
  if (hasResults) {
    // Sort values of the last hit; the client currently returns every element as a String
    searchAfter = searchHits.getLast().getSortValues();
  }

  // Stop when a page is empty or the last hit carries no usable sort values
  done = !hasResults
      || searchAfter == null
      || searchAfter.stream().allMatch(Objects::isNull);
}

When the code above runs, the server returns something like:

{
    // metadata
    "hits": {
        // metadata
        "hits": [
            {
                // _source, _index, _id etc
                "sort": [
                    1720256355885
                ]
            }
        ]
    }
}

However, the client deserializes the sort field to:

"sort": [
    "1720256355885"
]
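The difference matters once the values are sent back. The stdlib-only sketch below mimics how a list of sort values ends up in the search_after array of the next request. `SearchAfterJson.toJsonArray` is a hypothetical helper, not the client's actual serializer; it simply follows the usual JSON rule that strings are quoted while numbers are not.

```java
import java.util.List;
import java.util.stream.Collectors;

public class SearchAfterJson {
    // Hypothetical helper: renders sort values as a JSON array the way a
    // typical JSON serializer would (strings quoted, numbers and booleans bare).
    static String toJsonArray(List<?> values) {
        return values.stream()
                .map(SearchAfterJson::toJsonScalar)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    static String toJsonScalar(Object v) {
        if (v == null) {
            return "null";
        }
        if (v instanceof Number || v instanceof Boolean) {
            return v.toString();
        }
        // Anything else is treated as a string; only backslashes and quotes are escaped
        String s = v.toString().replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + s + "\"";
    }

    public static void main(String[] args) {
        // What the server returned: a JSON number
        System.out.println(toJsonArray(List.of(1720256355885L)));  // [1720256355885]
        // What the client currently hands back: a string
        System.out.println(toJsonArray(List.of("1720256355885"))); // ["1720256355885"]
    }
}
```

Assuming the client serializes each search_after value according to its Java type, the server receives a JSON string rather than a number for the date field.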

As a result, repeating the request with these values in search_after produces an error such as:

"root_cause": [
    {
        "type": "parse_exception",
        "reason": "failed to parse date field [1720256355885] with format [date_time]: [failed to parse date field [1720256355885] with format [date_time]]"
    }
]
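For context, 1720256355885 is an epoch-milliseconds timestamp, which is how OpenSearch reports sort values for date fields. The date_time format expects text of the form yyyy-MM-dd'T'HH:mm:ss.SSSZ, so a string made only of digits fails to parse with it. The short java.time sketch below (for inspection only; `toIsoUtc` is a hypothetical helper) shows which instant the number stands for.

```java
import java.time.Instant;

public class EpochMillisDemo {
    // Hypothetical helper: converts a date field's raw sort value (epoch milliseconds)
    // into an ISO-8601 UTC timestamp so it can be read by a human.
    static String toIsoUtc(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        long sortValue = 1720256355885L; // value from the "sort" array above
        System.out.println(toIsoUtc(sortValue)); // 2024-07-06T08:59:15.885Z
    }
}
```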

What is the expected behavior?

The sort values should keep the types the server returned (for example, a JSON number should stay a number), so they can be passed back unchanged in search_after.
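Until the client preserves these types, one possible caller-side workaround is to turn integer-looking strings back into numbers before passing them to withSearchAfter. The sketch below is hypothetical (`restoreNumbers` is not part of opensearch-java or Spring Data) and assumes that no sort field on the index legitimately holds digit-only strings: a keyword value such as "00123" would be wrongly converted to 123. Floating-point sort values (for example _score) are left as strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SortValueWorkaround {
    // Integers with an optional leading minus sign, e.g. epoch-millis date sort values
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");

    // Hypothetical workaround: converts digit-only strings back into Long values and
    // leaves everything else (including nulls) untouched.
    // Assumption: no sort field on the index holds digit-only keyword values.
    static List<Object> restoreNumbers(List<?> sortValues) {
        List<Object> restored = new ArrayList<>(sortValues.size());
        for (Object v : sortValues) {
            if (v instanceof String s && INTEGER.matcher(s).matches()) {
                try {
                    restored.add(Long.parseLong(s));
                } catch (NumberFormatException e) {
                    restored.add(s); // outside the long range; keep the original string
                }
            } else {
                restored.add(v);
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        List<Object> fromClient = List.of("1720256355885", "abc");
        System.out.println(restoreNumbers(fromClient)); // [1720256355885, abc]
    }
}
```

In the reproduction loop, this would be applied as `searchAfter = restoreNumbers(searchHits.getLast().getSortValues());`.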

What is your host/environment?

Spring Boot application

Do you have any additional context?

Xtansia commented 1 month ago

Thanks for the report @djivko! This is related to #755, and will require similar fixes:

Is this something you'd be interested in possibly contributing yourself?

djivko commented 1 month ago

@Xtansia Sure, I can give it a shot. It might take a bit more time, though, as I will be doing it in my spare time. Thanks for all the pointers!