opensearch-project / opensearch-java

Java Client for OpenSearch
Apache License 2.0
118 stars, 182 forks

[BUG] OpenSearchException: Request failed: [mapper_parsing_exception] failed to parse #362

Open yash025 opened 1 year ago

yash025 commented 1 year ago

What is the bug?

When we push the document as a JSON string using IndexRequest, the API fails with the error below. The same document works if we pass it as a Java map.

Caused by: org.opensearch.client.opensearch._types.OpenSearchException: Request failed: [mapper_parsing_exception] failed to parse
    at org.opensearch.client.transport.aws.AwsSdk2Transport.parseResponse(AwsSdk2Transport.java:530)
    at org.opensearch.client.transport.aws.AwsSdk2Transport.executeSync(AwsSdk2Transport.java:438)
    at org.opensearch.client.transport.aws.AwsSdk2Transport.performRequest(AwsSdk2Transport.java:241)
    at org.opensearch.client.opensearch.OpenSearchClient.index(OpenSearchClient.java:764)

Are there any working examples where a JSON string is indexed as a document instead of a Java POJO?

I'm trying this in Scala.

  val openSearchClient: OpenSearchClient = new OpenSearchClient(
    new AwsSdk2Transport(httpClient,
      host,
      "aoss",
      region,
      AwsSdk2TransportOptions.builder().setCredentials(credentials).build()))
  val documentJsonStr = "{'name': 'yashwanth', 'age': '23'}"
  val documentJson    = JsonData.of[String](documentJsonStr)
  val request         = IndexRequest.of[JsonData](f => f.index(index).document(documentJson))
  openSearchClient.index(request)
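A likely explanation, not confirmed in this thread: `documentJsonStr` uses single quotes, which RFC 8259 JSON does not allow (the valid form is `{"name": "yashwanth", "age": "23"}`), and `JsonData.of[String]` wraps the Java `String` itself, so the mapper presumably writes it as one quoted JSON string rather than as an object, and OpenSearch cannot parse a bare string as a document. The JDK-only sketch below imitates that string serialization with a hand-written escaper; `asJsonStringValue` is a hypothetical stand-in, not the client's code.

```java
public class JsonStringValueDemo {
    // Hypothetical stand-in for how a JSON mapper writes a java.lang.String:
    // the whole text becomes a single JSON string value wrapped in double quotes.
    public static String asJsonStringValue(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters get a four-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String documentJsonStr = "{'name': 'yashwanth', 'age': '23'}";
        // The output starts with a double quote: a JSON string, not a JSON object.
        System.out.println(asJsonStringValue(documentJsonStr));
    }
}
```

Nothing in the single-quoted text needs escaping, so the "document" is simply the original text surrounded by double quotes.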

The code below works; here the document is passed as a Java map to the same index:

  val openSearchClient: OpenSearchClient = new OpenSearchClient(
    new AwsSdk2Transport(httpClient,
      host,
      "aoss",
      region,
      AwsSdk2TransportOptions.builder().setCredentials(credentials).build()))
  val temp2 = new util.HashMap[String, String]()
  temp2.put("name", "yashwanth")
  temp2.put("age", "23")

  val documentIndexRequest =
    new IndexRequest.Builder().id("1").index(index).document(temp2).build

  openSearchClient.index(documentIndexRequest)
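Since a `Map` document works, one possible workaround (an assumption, not something tested in this thread) is to parse the JSON string into a `Map` with Jackson's `ObjectMapper` and index that. This sketch assumes `jackson-databind` and opensearch-java on the classpath; `MapDocumentIndexer` and its methods are hypothetical names. The input must be valid double-quoted JSON: by default Jackson rejects the single-quoted string from the failing example.

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch.core.IndexRequest;
import org.opensearch.client.opensearch.core.IndexResponse;

public class MapDocumentIndexer {
    private static final ObjectMapper JSON = new ObjectMapper();

    // Parses a JSON object string into the same kind of Map that already indexes correctly.
    public static Map<String, Object> parseObject(String json) {
        try {
            return JSON.readValue(json, new TypeReference<Map<String, Object>>() {});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Indexes the parsed Map, mirroring the working Scala example above.
    public static IndexResponse indexJson(OpenSearchClient client, String index, String id, String json)
            throws IOException {
        IndexRequest<Map<String, Object>> request = new IndexRequest.Builder<Map<String, Object>>()
                .index(index)
                .id(id)
                .document(parseObject(json))
                .build();
        return client.index(request);
    }
}
```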
dblock commented 1 year ago

This is similar to https://github.com/opensearch-project/opensearch-java/issues/297, and there are some hints on how to do that in https://github.com/opensearch-project/opensearch-java/issues/297#issuecomment-1362157933. I don't have working code to share, though; let's try and work through it? Maybe @owaiskazi19 or @Xtansia have an example?

Elasticsearch-java has since added a withJson method, so ultimately I think we want to be able to write new IndexRequest.Builder().id("1").index(index).withJson(json).build.

Finally, maybe make a sample like https://github.com/dblock/opensearch-java-client-demo in Scala, so we have something to start with?

yash025 commented 1 year ago

Thanks @dblock for the quick response. Based on the hints, I was able to add a document and query it.

Here's a demo project in Scala: https://github.com/yash025/opensearch-scala-demo

dblock commented 1 year ago

Awesome, thanks @yash025. So you have a document type and you serialize it to JSON with CirceToJava? Originally you wanted to make raw JSON work; I think we'd still be interested in that. Same for Java.

Xtansia commented 1 year ago

Based on this test case https://github.com/opensearch-project/opensearch-java/blob/main/java-client/src/test/java/org/opensearch/client/opensearch/json/JsonDataTest.java#L51-L63 I think the correct way to get from a JSON string to a JsonData (in Java) is something like:

JsonpMapper mapper = openSearchClient._transport().jsonpMapper();
JsonParser parser = mapper.jsonProvider().createParser(new StringReader(jsonString));
JsonData data = JsonData.from(parser, mapper);
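Combined with an index call, Xtansia's approach might look like the Java sketch below. It assumes opensearch-java 2.x (which uses `jakarta.json`), an already configured `OpenSearchClient` (for example the `AwsSdk2Transport` one from the original report), and valid double-quoted JSON; `RawJsonIndexer` and its methods are hypothetical names. Because `JsonData.from` holds the parsed JSON value, the document should be sent as a JSON object rather than as a quoted string.

```java
import jakarta.json.stream.JsonParser;
import java.io.IOException;
import java.io.StringReader;
import org.opensearch.client.json.JsonData;
import org.opensearch.client.json.JsonpMapper;
import org.opensearch.client.opensearch.OpenSearchClient;
import org.opensearch.client.opensearch.core.IndexRequest;
import org.opensearch.client.opensearch.core.IndexResponse;

public class RawJsonIndexer {
    // Parses a JSON string into JsonData backed by the parsed JsonValue.
    // The parser can be closed afterwards because the value is fully materialized.
    public static JsonData toJsonData(String json, JsonpMapper mapper) {
        try (JsonParser parser = mapper.jsonProvider().createParser(new StringReader(json))) {
            return JsonData.from(parser, mapper);
        }
    }

    // Indexes a raw JSON string, reusing the transport's own mapper as in the snippet above.
    public static IndexResponse indexJson(OpenSearchClient client, String index, String id, String json)
            throws IOException {
        JsonpMapper mapper = client._transport().jsonpMapper();
        IndexRequest<JsonData> request = IndexRequest.of(b -> b
                .index(index)
                .id(id)
                .document(toJsonData(json, mapper)));
        return client.index(request);
    }
}
```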
dblock commented 1 year ago

I think this is a feature request to add json (or withJson) everywhere we support document. Does anyone want to give it a try?

yash025 commented 1 year ago

@dblock I will try to squeeze in some time this week and try this.

yash025 commented 1 year ago

@dblock Question: what should the TDocument be for withJson? Should we create a class similar to CirceToJava, something like RawJson, so that whenever someone wants to use raw JSON as the document, RawJson is the TDocument, i.e. the request type is IndexRequest[RawJson]? I don't see much use of withJson in the Java world. I also checked withJson in the latest version of the Elasticsearch Java client, and it won't work for complex JSON: they've written a simple JSON mapper that tries to deserialize the JSON back into TDocument, ignoring unknown fields, so for complex (multi-level nested) JSON the user needs to specify the parser and mapper explicitly.

dblock commented 1 year ago

@yash025 I am not sure, but I'm really thinking from the POV of a developer who has a bunch of documents/queries and just wants to send them, without stuffing the JSON into well-defined structures. This is particularly useful in IndexRequest.Builder().id("1").index(index).withJson(json).build() because the document being indexed can be any JSON, and it would similarly be useful in search, but I agree that it's probably not more useful than that. In that case, I think your suggestion works!

owaiskazi19 commented 1 year ago

Hey @yash025! You can refer to https://github.com/opensearch-project/opensearch-java/issues/257, which has sample Java code to create an index. Currently we don't have withJson support, but you can pass the mapping file for the index, similar to:


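import com.google.common.base.Charsets;
import com.google.common.io.Resources;
import java.io.IOException;
import java.net.URL;
private String getAnomalyDetectorMappings() throws IOException {
    URL url = AnomalyDetectionIndices.class.getClassLoader().getResource(ANOMALY_DETECTORS_INDEX_MAPPING_FILE);
    return Resources.toString(url, Charsets.UTF_8);
}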
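If Guava is not on the classpath, the JDK alone can load a JSON file (an index mapping, or a document to index) from the classpath, as Guava's `Resources` does in the snippet above. A minimal sketch, assuming Java 9+ for `InputStream.readAllBytes`; the class name and any resource path passed to it are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class JsonResources {
    // Reads a classpath resource (e.g. a hypothetical "mappings/my-index.json") as UTF-8 text.
    public static String readResource(String path) {
        InputStream in = JsonResources.class.getClassLoader().getResourceAsStream(path);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found: " + path);
        }
        return readUtf8(in);
    }

    // Drains a stream into a String and closes the stream afterwards.
    public static String readUtf8(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unlike `Resources.toString`, which receives a `null` URL when the file is missing, this version fails with an explicit message naming the path.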
yash025 commented 1 year ago

Hi @owaiskazi19, thanks! I found a workaround, mentioned in the comment above, that works for me.

yash025 commented 1 year ago


@dblock So should I add a class similar to CirceToJava, something like RawJSON, that anyone who wants to use raw JSON would use? For example: IndexRequest.Builder[RawJSON]().id("1").index(index).document(new RawJSON().withJsonStr(<jsonString>)).build()

dblock commented 1 year ago

I think document(new RawJSON().withJsonStr(<jsonString>)) is really ugly and should be wrapped as jsonDocument() or .withJson instead of .document. WDYT?

yash025 commented 1 year ago

Yes, that would look nicer, but we need to provide .withJson() or jsonDocument() only when the IndexRequest is of type RawJSON, right? How should we handle IndexRequest[<any other class>].index(index).jsonDocument()?
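In Java, a builder method cannot be offered only for one type argument, so one way around the question above is a static factory whose return type already fixes `TDocument`. The sketch below is hypothetical and not part of opensearch-java: `JsonDocuments.indexRequest` plays the role of the proposed `withJson`/`jsonDocument`, and the existing `JsonData` type (parsed as in Xtansia's snippet) plays the role of `RawJSON`. It assumes opensearch-java 2.x with `jakarta.json`.

```java
import jakarta.json.stream.JsonParser;
import java.io.StringReader;
import org.opensearch.client.json.JsonData;
import org.opensearch.client.json.JsonpMapper;
import org.opensearch.client.opensearch.core.IndexRequest;

// Hypothetical helper, not an opensearch-java API.
public final class JsonDocuments {
    private JsonDocuments() {}

    // Stands in for the proposed RawJSON type: an already-parsed JSON value.
    public static JsonData parse(String json, JsonpMapper mapper) {
        try (JsonParser parser = mapper.jsonProvider().createParser(new StringReader(json))) {
            return JsonData.from(parser, mapper);
        }
    }

    // Stands in for withJson()/jsonDocument(): the return type pins TDocument to JsonData,
    // so raw JSON can never be attached to an IndexRequest of some other document class.
    public static IndexRequest<JsonData> indexRequest(String index, String id, String json,
                                                      JsonpMapper mapper) {
        JsonData document = parse(json, mapper);
        return IndexRequest.of(b -> b.index(index).id(id).document(document));
    }
}
```

Usage would then be along the lines of `client.index(JsonDocuments.indexRequest("my-index", "1", json, client._transport().jsonpMapper()))`, with `"my-index"` as a placeholder. In Scala, an implicit evidence parameter such as `TDocument =:= RawJSON` on a builder method could express a similar restriction.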

dblock commented 1 year ago

🤔 @yash025 I am not sure. Give it a try? Let's look at code?