opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
8.89k stars 1.63k forks source link

Fix match_phrase_prefix_query not working on text field with multiple values and index_prefixes #10959

Open gaobinlong opened 7 months ago

gaobinlong commented 7 months ago

Description

When executing match_phrase_prefix_query on a text field with multiple values and setting index_prefixes to true, the parameter position_increment_gap which defaults to 100 is not used by the sub-field {text_field}._index_prefix, this results in the spanNearQuery not working.

This change makes the sub-filed {text_field}._index_prefix using the value of position_increment_gap when indexing documents, so that a fake gap between multiple values are added. Some unit tests and yml test are added for this change.

Related Issues

https://github.com/opensearch-project/OpenSearch/issues/9203

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions[bot] commented 7 months ago

Gradle Check (Jenkins) Run Completed with:

github-actions[bot] commented 7 months ago

Compatibility status:

Checks if related components are compatible with change 349a406

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/performance-analyzer.git]

github-actions[bot] commented 7 months ago

Gradle Check (Jenkins) Run Completed with:

github-actions[bot] commented 7 months ago

Gradle Check (Jenkins) Run Completed with:

github-actions[bot] commented 7 months ago

Gradle Check (Jenkins) Run Completed with:

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 71.46%. Comparing base (b15cb0c) to head (3c5bd2f). Report is 291 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #10959 +/- ## ============================================ + Coverage 71.42% 71.46% +0.04% - Complexity 59978 60691 +713 ============================================ Files 4985 5040 +55 Lines 282275 285458 +3183 Branches 40946 41338 +392 ============================================ + Hits 201603 203997 +2394 - Misses 63999 64643 +644 - Partials 16673 16818 +145 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

github-actions[bot] commented 7 months ago

Gradle Check (Jenkins) Run Completed with:

github-actions[bot] commented 7 months ago

Gradle Check (Jenkins) Run Completed with:

reta commented 7 months ago

@msfroh mind to back me up here as well by taking a look? thank you

github-actions[bot] commented 6 months ago

Gradle Check (Jenkins) Run Completed with:

msfroh commented 6 months ago

@msfroh mind to back me up here as well by taking a look? thank you

Eep! Sorry, I missed the notification. Taking a look now!

gaobinlong commented 6 months ago

It feels like the key fix is passing the correct analyzer to PrefixWrappedAnalyzer.

If we really need to override getPositionIncrementGap, I would suspect that we would need to fix PhraseWrappedAnalyzer too. Do you know if fast phrase fields have the same bug?

I've tested fast phrase query, it doesn't have this bug, I think the reason is that fast match_phrase_query is not like match_phrase_prefix query which will be rewritten to SpanNearQuery:

"type": "SpanNearQuery",
"description": "spanNear([applications.name:b, mask(applications.name._index_prefix:12) as applications.name], 0, true)"

, for fast match_phrase_query, only one termQuery is genereated:

"type": "TermQuery",
"description": "applications.name._index_phrase:b 12",
github-actions[bot] commented 6 months ago

:white_check_mark: Gradle check result for 4d4484d52813cb568700a1c2a35f8b9e243e89f5: SUCCESS

gaobinlong commented 6 months ago

@msfroh, there are some update in this PR, could you please help to take a look?

opensearch-trigger-bot[bot] commented 5 months ago

This PR is stalled because it has been open for 30 days with no activity.

github-actions[bot] commented 4 months ago

:x: Gradle check result for 763099aca8aefa839d7e9b07da2cfc9da134ee2c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 4 months ago

:x: Gradle check result for 8874093313031ddab3bca0a5b21ec87c8bfda11c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 4 months ago

:x: Gradle check result for 251bd44eee068d66f400ac6362ffa836fbff563b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 4 months ago

:grey_exclamation: Gradle check result for 91d61f6635457269750cca71269f973bb843c6b6: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

gaobinlong commented 4 months ago

Hi @msfroh , sorry to bother you, do you have any concern about this PR?

opensearch-trigger-bot[bot] commented 3 months ago

This PR is stalled because it has been open for 30 days with no activity.

opensearch-trigger-bot[bot] commented 2 months ago

This PR is stalled because it has been open for 30 days with no activity.

gaobinlong commented 1 month ago

Could any maintainers help to review and approve this PR? It has been stalled for a long time and all comments from the previous reviewer have been addressed yet.

gaobinlong commented 1 month ago

Can you explain what setAnalyzer does and why it moved? Double-checking that there aren't public API breaking changes here, too.

Thanks for your review, setAnalyzer() here is used to set a indexing analyzer for the prefix field {text_field}._index_prefix, which will add a fake gap between multiple values, but the gap position_increment_gap only exists in the parent field's indexAnalyzer, so we need to get the gap by parent.indexAnalyzer().name(), because we've passed the parent field to the construction method of PrefixFieldType: https://github.com/opensearch-project/OpenSearch/blob/6bc04b49a0031e37466001469abcfd593e454813/server/src/main/java/org/opensearch/index/mapper/TextFieldMapper.java#L597 , so I rewrite and move the setAnalyzer() to make the code more clean.

github-actions[bot] commented 1 month ago

:white_check_mark: Gradle check result for 349a4069d7400d7347c17b0f04e684a5225942dd: SUCCESS

github-actions[bot] commented 1 month ago

:x: Gradle check result for f510415a8b37d7d9d3950a946647f22a173d41d0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 1 month ago

:white_check_mark: Gradle check result for 3c5bd2f723f3ecff4fab4efdc04b49f82e5ca6a2: SUCCESS

opensearch-trigger-bot[bot] commented 2 weeks ago

This PR is stalled because it has been open for 30 days with no activity.