opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosytem
Apache License 2.0
57 stars 58 forks source link

fix map type validation issue in processors #687

Closed zane-neo closed 1 month ago

zane-neo commented 3 months ago

Description

When user uses map type configuration in processors, the validation will validate all other fields instead of only the configured fields, sometimes if other fields has unsupported types, user will get exception which is not expected.

Multiple processors uses similar validation logic but code are duplicated, in this PR a new Util is been added to extract the duplicated code to a common place and reused by different processors.

Issues Resolved

https://github.com/opensearch-project/ml-commons/issues/2309

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

yuye-aws commented 2 months ago

Happy to see that this PR extracts all validation function into file src/main/java/org/opensearch/neuralsearch/util/ProcessorDocumentUtils.java? Currently, all the functions except getMaxDepth is about validation, can we simply rename this file to DocumentValidator?

yuye-aws commented 2 months ago

We still have parsing methods like buildMapWithTargetKeyAndOriginalValue and createInferenceList. Can we also extract them into another class named "DocumentParser"?

zane-neo commented 2 months ago

Happy to see that this PR extracts all validation function into file src/main/java/org/opensearch/neuralsearch/util/ProcessorDocumentUtils.java? Currently, all the functions except getMaxDepth is about validation, can we simply rename this file to DocumentValidator?

No, this file will be adding more and more common methods like parsing data structure etc, so will name as what it is now.

zane-neo commented 2 months ago

We still have parsing methods like buildMapWithTargetKeyAndOriginalValue and createInferenceList. Can we also extract them into another class named "DocumentParser"?

We will supports more complex cases in the future and do this extraction that time, we'll put these common code in the new class added this time instead of creating another class.

yuye-aws commented 2 months ago

We still have parsing methods like buildMapWithTargetKeyAndOriginalValue and createInferenceList. Can we also extract them into another class named "DocumentParser"?

We will supports more complex cases in the future and do this extraction that time, we'll put these common code in the new class added this time instead of creating another class.

Makes sense.

zane-neo commented 2 months ago

The BWC test failure is not caused by this change and there's already an issue tracking it: https://github.com/opensearch-project/neural-search/issues/688

vibrantvarun commented 2 months ago

Why BWC tests are failing in the build with the same issue?

yuye-aws commented 2 months ago

This PR looks good to me

vibrantvarun commented 2 months ago

Why are the BWC tests failing @yuye-aws ?

yuye-aws commented 2 months ago

Hi @vibrantvarun ! The BWC tests now get passed. What's our next step plan to merge this PR?