opensearch-project / opensearch-java

Java Client for OpenSearch
Apache License 2.0

[BUG] Deserializing MatchQuery ZeroTermsQuery field fails if the source JSON comes from OpenSearch MatchQuery #1150

Open dbwiddis opened 1 month ago

dbwiddis commented 1 month ago

What is the bug?

The enum defined for the client uses lower case values for the ZeroTermsQuery enum: https://github.com/opensearch-project/opensearch-java/blob/08e7e6504d4e6029640940f4bb4670e5e183c700/java-client/src/main/java/org/opensearch/client/opensearch/_types/query_dsl/ZeroTermsQuery.java#L38-L42

However, this enum is defined on OpenSearch with traditional all-caps enum names:

public enum ZeroTermsQuery implements Writeable {
    NONE(0),
    ALL(1),

As a result, the JSON of a search query generated on OpenSearch cannot be deserialized directly into a client search request.

How can one reproduce the bug?

  1. Generate a Search Query on OpenSearch, for example:
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.must(QueryBuilders.matchQuery(CONNECTOR_ID_FIELD, connectorId));
  2. Add that query into a SearchSourceBuilder:
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.query(boolQueryBuilder);
  3. Transform that SearchSourceBuilder to JSON:
    String json = sourceBuilder.toString();

    Note the value of zero_terms_query is all upper case.

    {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "connector_id": {
                  "query": "Jm_4dpEBnn49655wiz2Y",
                  "operator": "OR",
                  "prefix_length": 0,
                  "max_expansions": 50,
                  "fuzzy_transpositions": true,
                  "lenient": false,
                  "zero_terms_query": "NONE",
                  "auto_generate_synonyms_phrase_query": true,
                  "boost": 1
                }
              }
            },
            {
              "ids": {
                "values": [],
                "boost": 1
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1
        }
      }
    }
  4. Create a parser with that JSON:
    JsonpMapper mapper = openSearchClient._transport().jsonpMapper();
    JsonParser parser = mapper.jsonProvider().createParser(new StringReader(json));
  5. Attempt to deserialize that JSON into an OpenSearch Java Client SearchRequest object:
    SearchRequest searchRequest = SearchRequest._DESERIALIZER.deserialize(parser, mapper);
  6. Observe exception:
    2024-08-21 15:04:46 jakarta.json.stream.JsonParsingException: Invalid enum 'NONE'
    2024-08-21 15:04:46     at org.opensearch.client.json.JsonEnum$Deserializer.deserialize(JsonEnum.java:116) ~[?:?]
    2024-08-21 15:04:46     at org.opensearch.client.json.JsonEnum$Deserializer.deserialize(JsonEnum.java:102) ~[?:?]
    2024-08-21 15:04:46     at org.opensearch.client.json.JsonEnum$Deserializer.deserialize(JsonEnum.java:61) ~[?:?]
    2024-08-21 15:04:46     at org.opensearch.client.json.JsonpDeserializer.deserialize(JsonpDeserializer.java:87) ~[?:?]
    2024-08-21 15:04:46     at org.opensearch.client.json.ObjectDeserializer$FieldObjectDeserializer.deserialize(ObjectDeserializer.java:81) ~[?:?]
    2024-08-21 15:04:46     at org.opensearch.client.json.ObjectDeserializer.deserialize(ObjectDeserializer.java:185) ~[?:?]
    2024-08-21 15:04:46     at org.opensearch.client.json.ObjectDeserializer.deserialize(ObjectDeserializer.java:146) ~[?:?]
    2024-08-21 15:04:46     at org.opensearch.client.json.JsonpDeserializer.deserialize(JsonpDeserializer.java:87) ~[?:?]
    2024-08-21 15:04:46     at org.opensearch.client.json.ObjectBuilderDeserializer.deserialize(ObjectBuilderDeserializer.java:91) ~[?:?]
    <snip>

What is the expected behavior?

In general, an OpenSearch SearchSourceBuilder can be serialized into JSON and then deserialized into an OpenSearch Java Client SearchRequest.

In this particular case, the all-caps enum name should match case-insensitively rather than throwing an exception.

See, for example, how logical operators (such as the OR in this query) accept either all-upper or all-lower case: https://github.com/opensearch-project/opensearch-java/blob/08e7e6504d4e6029640940f4bb4670e5e183c700/java-client/src/main/java/org/opensearch/client/opensearch/_types/query_dsl/Operator.java#L39-L42
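
To make the requested behavior concrete, here is a minimal, self-contained sketch of case-insensitive enum matching. The names ZeroTermsQuerySketch and fromJson are hypothetical stand-ins for illustration only; this is not the client's actual JsonEnum deserializer, just one way the lookup could normalize the incoming value before comparing.

```java
import java.util.Locale;

// Demo entry point; placed first so single-file launch (java File.java) finds main.
class CaseInsensitiveEnumDemo {
    public static void main(String[] args) {
        // Both the server-style and client-style spellings resolve to a constant.
        System.out.println(ZeroTermsQuerySketch.fromJson("NONE"));
        System.out.println(ZeroTermsQuerySketch.fromJson("all"));
    }
}

// Hypothetical stand-in for the client's enum (assumption: lower-case JSON values).
enum ZeroTermsQuerySketch {
    ALL("all"),
    NONE("none");

    private final String jsonValue;

    ZeroTermsQuerySketch(String jsonValue) {
        this.jsonValue = jsonValue;
    }

    String jsonValue() {
        return jsonValue;
    }

    // Lower-case the input with a fixed locale, then compare against each JSON value.
    static ZeroTermsQuerySketch fromJson(String value) {
        String normalized = value.toLowerCase(Locale.ROOT);
        for (ZeroTermsQuerySketch candidate : values()) {
            if (candidate.jsonValue.equals(normalized)) {
                return candidate;
            }
        }
        // Mirrors the wording of the exception in the report; the type is an assumption.
        throw new IllegalArgumentException("Invalid enum '" + value + "'");
    }
}
```

Using Locale.ROOT avoids locale-specific case mapping surprises (for example, the Turkish dotless i) when normalizing the value.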

What is your host/environment?

Running this against OpenSearch 2.15 code, but the enum has not changed since before the fork.

Do you have any additional context?

Relevant code block causing the issue on a feature branch: https://github.com/opensearch-project/ml-commons/blob/feature/multi_tenancy/plugin/src/main/java/org/opensearch/ml/sdkclient/RemoteClusterIndicesClient.java#L230-L232

Xtansia commented 3 weeks ago

As you've noted, for cases like Operator the JsonEnum type does have an affordance for aliases for given enum values. So this should be an easy fix: amend ZeroTermsQuery to add the all-caps variants as aliases.
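
As a rough illustration of the alias idea, the sketch below models an enum whose constants carry a primary JSON value plus optional aliases, with a lookup table built from both. It is self-contained and only loosely modeled on the pattern described for Operator; ZeroTermsQueryWithAliases, LOOKUP, and fromJson are hypothetical names, and the real JsonEnum machinery in the client may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Demo entry point; placed first so single-file launch (java File.java) finds main.
class AliasLookupDemo {
    public static void main(String[] args) {
        System.out.println(ZeroTermsQueryWithAliases.fromJson("NONE"));
        System.out.println(ZeroTermsQueryWithAliases.fromJson("none"));
    }
}

// Hypothetical enum: each constant has one canonical JSON value and any number of aliases.
enum ZeroTermsQueryWithAliases {
    All("all", "ALL"),
    None("none", "NONE");

    // Maps every accepted spelling (canonical value and aliases) to its constant.
    private static final Map<String, ZeroTermsQueryWithAliases> LOOKUP = new HashMap<>();

    static {
        for (ZeroTermsQueryWithAliases value : values()) {
            LOOKUP.put(value.jsonValue, value);
            for (String alias : value.aliases) {
                LOOKUP.put(alias, value);
            }
        }
    }

    private final String jsonValue;
    private final String[] aliases;

    ZeroTermsQueryWithAliases(String jsonValue, String... aliases) {
        this.jsonValue = jsonValue;
        this.aliases = aliases;
    }

    // Serialization would always emit the canonical (lower-case) value.
    String jsonValue() {
        return jsonValue;
    }

    // Exact-match lookup: only the listed spellings are accepted.
    static ZeroTermsQueryWithAliases fromJson(String value) {
        ZeroTermsQueryWithAliases result = LOOKUP.get(value);
        if (result == null) {
            throw new IllegalArgumentException("Invalid enum '" + value + "'");
        }
        return result;
    }
}
```

Unlike a fully case-insensitive match, this accepts only the spellings that are explicitly listed, so a mixed-case value such as "None" would still be rejected.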

This is something we should consider how to represent in the spec: https://github.com/opensearch-project/opensearch-api-specification/blob/19421f502740967e4d6df102f1c5765c53eaa010/spec/schemas/_common.query_dsl.yaml#L943

dbwiddis commented 3 weeks ago

This is something we should consider how to represent in the spec: https://github.com/opensearch-project/opensearch-api-specification/blob/19421f502740967e4d6df102f1c5765c53eaa010/spec/schemas/_common.query_dsl.yaml#L943

This SO answer says regex.

I love/hate regex. :)

dbwiddis commented 3 weeks ago

I agree with you, though: this should likely be handled in the spec somehow, but I'm not sure there's a standard for that, and I'm not sure we want to hand-edit exceptions. It's easy enough to work around if documented, but let's at least discuss options.
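
For reference, one possible shape of such a workaround is to lower-case the zero_terms_query value in the JSON string before handing it to the client's deserializer. This is only a sketch under assumptions: ZeroTermsQueryNormalizer and normalize are hypothetical names, and a regex rewrite of raw JSON is a blunt tool that could also match the same text inside a string value.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroTermsQueryNormalizer {
    // Captures: (1) the key, colon, and opening quote, (2) the enum value, (3) the closing quote.
    private static final Pattern ZERO_TERMS =
        Pattern.compile("(\"zero_terms_query\"\\s*:\\s*\")([A-Za-z]+)(\")");

    // Returns the JSON with every zero_terms_query value lower-cased; other content is untouched.
    static String normalize(String json) {
        Matcher matcher = ZERO_TERMS.matcher(json);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group(1)
                + matcher.group(2).toLowerCase(Locale.ROOT)
                + matcher.group(3);
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String json = "{\"match\":{\"connector_id\":{\"query\":\"abc\","
            + "\"operator\":\"OR\",\"zero_terms_query\":\"NONE\"}}}";
        // The normalized string is what would be passed to the JSON parser in step 4.
        System.out.println(normalize(json));
    }
}
```

The operator value is deliberately left alone here, since the report notes that the client already accepts the upper-case form for it.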

dbwiddis commented 3 weeks ago

It looks like Jackson JsonP does support this with appropriate annotations. https://stackoverflow.com/questions/26058854/case-insensitive-json-to-pojo-mapping-without-changing-the-pojo