opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.65k stars 1.78k forks source link

[BUG] Arrays of `geo_shape` are broken after v2.8.0 #14193

Open kurtmckee opened 4 months ago

kurtmckee commented 4 months ago

Describe the bug

While investigating whether a product I work on could migrate from ElasticSearch to OpenSearch, I found that OpenSearch support for ingesting arrays of geo_shape objects is broken after v2.8.0.

Specifically, the error message I get back during ingest is:

{
  "error" : {
    "type" : "illegal_argument_exception",
    "reason" : "DocValuesField \"location\" appears more than once in this document (only one value is allowed per field)"
  }
}

I created a YAML test and used git bisect between v2.8.0 and v2.9.0 commits to track the issue to commit ebe6f2c2878f036cb4c1147683c30091747d7632, introduced in PRs #4266 and #8301.

Related component

Other

To Reproduce

[!NOTE]

I've opened PR #14196 using a branch containing the YAML test file that demonstrates this issue. An OpenSearch maintainer can pull the changes, add fix commits, and push back to the associated PR.

I'm running the test suite using the following command with an additional YAML file that demonstrates the problem.

./gradlew :modules:geo:yamlRestTest

YAML file location:

modules/geo/src/yamlRestTest/resources/rest-api-spec/test/geo_shape/300_array_of_geo_shapes.yml

YAML file contents (with comments added only here):

Details

```yaml setup: - do: indices.create: index: test body: settings: number_of_replicas: 0 mappings: properties: location: type: "geo_shape" --- # This works across all git commits in 2.8.0..2.9.0 "Single shape": - do: bulk: refresh: true body: - index: _index: test _id: 1 - location: type: "polygon" coordinates: [ [ [ -87.0, 41.0 ], [ -88.0, 41.0 ], [ -88.0, 42.0 ], [ -87.0, 42.0 ], [ -87.0, 41.0 ], ] ] - match: {errors: false} --- # This fails starting at commit ebe6f2c2878f036cb4c1147683c30091747d7632 "Array of shapes": - do: bulk: refresh: true body: - index: _index: test _id: 1 - location: - type: "polygon" coordinates: [ [ [ -87.0, 41.0 ], [ -88.0, 41.0 ], [ -88.0, 42.0 ], [ -87.0, 42.0 ], [ -87.0, 41.0 ], ] ] - type: "polygon" coordinates: [ [ [ -97.0, 41.0 ], [ -98.0, 41.0 ], [ -98.0, 42.0 ], [ -97.0, 42.0 ], [ -97.0, 41.0 ], ] ] - match: {errors: false} ```

Expected behavior

I expected that I could creating a "location" field with a geo_shape mapping and then ingest an array of geo_shape objects. The product I work on uses ElasticSearch 7.10 and arrays of geo_shape objects can be ingested, and this also is how arrays are documented to work:

There is no dedicated array field type in OpenSearch. Instead, you can pass an array of values into any field.

Additional Details

None. I'm just running the test suite with an additional YAML file.

dblock commented 4 months ago

Thank you for a very detailed report, a YAML REST test, and a bisect, this is super helpful @kurtmckee. Are you trying to fix it? /cc: @heemin32 @navneet1v

kurtmckee commented 4 months ago

@dblock You're welcome! Unfortunately I won't be able to dive in to fix this (I lack experience with Java). I'm grateful you double-checked, though, to avoid possible duplication of effort!

andrross commented 4 months ago

[Triage - attendees 1 2 3 4]

Thanks for filing. Looks like a clear bug. @kurtmckee thanks for the investigation and detail provided!

heemin32 commented 4 months ago

@navneet1v Could you take a look?

navneet1v commented 4 months ago

@kurtmckee on which version of Opensearch you are facing the issue? Can you switch to 2.11/2.13 version of Opensearch?

kurtmckee commented 4 months ago

@navneet1v

which version of Opensearch you are facing the issue

This bug is present in all versions of OpenSearch after v2.8.0. I opened a PR, #14196, based off main, that demonstrates this issue.

The Jenkins run for that PR shows that STDOUT contains the error I described in this bug report.

I backtracked through the Docker images provided by OpenSearch in our test suite until I found that 2.8.0 worked; I then cloned the OpenSearch repo, created the YAML test case, and used git bisect to find the commit that introduced the regression.

navneet1v commented 4 months ago

@navneet1v

which version of Opensearch you are facing the issue

This bug is present in all versions of OpenSearch after v2.8.0. I opened a PR, #14196, based off main, that demonstrates this issue.

The Jenkins run for that PR shows that STDOUT contains the error I described in this bug report.

I backtracked through the Docker images provided by OpenSearch in our test suite until I found that 2.8.0 worked; I then cloned the OpenSearch repo, created the YAML test case, and used git bisect to find the commit that introduced the regression.

Can you try one thing, for the geo_shapes field can you try disabling the doc_value using "doc_values": false in mappings.

I will go ahead and look into what part of code is creating the issue, because I do know that we can index more than 1 geo_shape in 1 field.

kurtmckee commented 4 months ago

@navneet1v Thanks for that suggestion. It appears that if doc_values: false is set in the mapping then arrays of geo_shape can be ingested.

I've added an additional commit that also tests doc_values: false to PR #14196 to both demonstrate that this avoids the problem, and to ensure that the test suite guards against future regressions. You should have permissions to push commits to that PR branch.

# Using the GitHub CLI
gh co 14196

# Using git
git remote add kurtmckee git@github.com:kurtmckee/pr-OpenSearch.git
git fetch kurtmckee
git checkout -b fix-arrays-of-geo_shape-issue-14193 kurtmckee/fix-arrays-of-geo_shape-issue-14193