opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosytem
Apache License 2.0
58 stars 59 forks source link

[FEATURE] Add z-score for the normalization processor #376

Open samuel-oci opened 9 months ago

samuel-oci commented 9 months ago

Is your feature request related to a problem?

Currently the normalization processor doesn't support z-score which is a popular technique and according in some instances produces superior results to min-max see blog

What solution would you like?

Allow to specify z-score as a normalization technique in the normalization processor

What alternatives have you considered?

Not much at the moment but suggestions are welcome.

Do you have any additional context?

see blog mentioned earlier

heemin32 commented 9 months ago

@samuel-oci Thanks for the feature request. Also welcome to take it and work on it.

samuel-oci commented 9 months ago

Sounds good, I'll take it, feel free to assign to me.

samuel-oci commented 9 months ago

Hi @heemin32 @vamshin @navneet1v I don't see any IT that tests normalization and was thinking to add those also as part of this PR. If I missed those or you already have those worked on somewhere else just let me know and we can consolidate.

navneet1v commented 9 months ago

@martin-gaievski can you please provide the reference for the integration tests that has been added.

I am able to find this: https://github.com/opensearch-project/neural-search/blob/main/src/test/java/org/opensearch/neuralsearch/query/HybridQueryIT.java#L32

samuel-oci commented 9 months ago

@navneet1v this one test hybrid but for some reason when I run it in my IDE to debug I can't see the normalization processor being triggered. Could also be something wrong with my setup.. if it is then feel free to ignore.

navneet1v commented 9 months ago

@navneet1v this one test hybrid but for some reason when I run it in my IDE to debug I can't see the normalization processor being triggered. Could also be something wrong with my setup.. if it is then feel free to ignore.

If you are using debugger it might not hit the code. I would recommend adding logs and check it. Debugger is tricky with IT tests, if you are just running ./gradlew integTest

samuel-oci commented 9 months ago

@navneet1v I added the following lines build.gradle under IntegTest task and also created a corresponding log4j2-test.xml:

    systemProperty 'log4j2.configurationFile', "${projectDir}/src/test/resources/log4j2-test.xml"

    // Set this to true this if you want to see the logs in the terminal test output.
    // note: if left false the log output will still show in your IDE
    testLogging.showStandardStreams = true

I do see logs of IT test and other core open search but don't see any logs showing from the normalization processor or NormalizationProcessorWorkflow. which is why I later run with debugger to double check. Is there a better way to add logging and making them to show in integTest task?

EDIT: regarding debugger, I see similar problem to what is described here https://forum.opensearch.org/t/debugging-test-cluster-provided-by-opensearchintegtestcase/13447/3 Let me know if there is a setup that works to attach debugger to test cluster that RestIntegTest spins?

Update: regarding debugger thanks to @martin-gaievski I now able to get it to attach. Also updated the forum with the answer to make sure in the future folks can google it ;)