opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosytem
Apache License 2.0
61 stars 65 forks source link

[FEATURE] Add rescoreContext for neuralsparse query's two-phase speed up. #695

Closed conggguan closed 4 months ago

conggguan commented 5 months ago

Description

This change implement for #646

Feature support query

Issues Resolved

Resolve #646

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

navneet1v commented 5 months ago

Intro

Now implement two-phase feature by adding rescoreContext in the begin of HybridQueryPhaseSearcher's. It is harmless but little hack, so there is a long term plan to replace this implementation.

RFC in opensearch core

What we need to implement this feature is the whole query and searchContext, so if OpenSearch Core can open a API which can help me get it before the query performed, we can move the rescoreContext function to a more general interface of core.

RoadMap of this optimizing

  1. RFC in OpenSearch Core.
  2. A Incremental update in NeuralSearch plugin.

Please create a RFC on the opensearch core side and provide details why we cannot open up an interface from core to implement this functionality?

navneet1v commented 5 months ago

I was not able to complete the full review of the PR as the PR contains more than 1 feature. The description of the PR says it has these 2 feature:

Enhance the speed of neuralsparse query by two-phase. Now support top-level, boolean and boost compound query; for other compound query will degrade into origin logic and speed.

But in the PR we have code add the feature to use tokens in the NeuralSparseQuery clause.

zhichao-aws commented 5 months ago

I was not able to complete the full review of the PR as the PR contains more than 1 feature. The description of the PR says it has these 2 feature:

Enhance the speed of neuralsparse query by two-phase. Now support top-level, boolean and boost compound query; for other compound query will degrade into origin logic and speed.

But in the PR we have code add the feature to use tokens in the NeuralSparseQuery clause.

Yes we merge #693 manually in advance because the query by tokens help us to write IT cases more flexible. Will update this PR after #693 get merged (will be merged soon)

zhichao-aws commented 5 months ago

I was not able to complete the full review of the PR as the PR contains more than 1 feature. The description of the PR says it has these 2 feature:

Enhance the speed of neuralsparse query by two-phase. Now support top-level, boolean and boost compound query; for other compound query will degrade into origin logic and speed.

But in the PR we have code add the feature to use tokens in the NeuralSparseQuery clause.

Updated now.

conggguan commented 5 months ago

@vibrantvarun Some BWC test failed because of there is a version check of stream.

// stream in 
if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) {
    this.neuralSparseTwoPhaseParameters = in.readOptionalWriteable(NeuralSparseTwoPhaseParameters::new);
}
// stream out
if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) {
    out.writeOptionalWriteable(this.neuralSparseTwoPhaseParameters);
}

But at now the PR hasn't been backport to V_2_14_0, so after merge will success. Or we can set Version.CURRENT to MinReqVersion and make a further PR to change it to 2.14 again. If a extra pr required or just keep is okay?

zhichao-aws commented 5 months ago

Lets ensure that all BWC tests are successful before this change can be merged.

Some BWC test failed because of there is a version check of stream.

// stream in if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) { this.neuralSparseTwoPhaseParameters = in.readOptionalWriteable(NeuralSparseTwoPhaseParameters::new); } // stream out if (NeuralSparseTwoPhaseParameters.isClusterOnOrAfterMinReqVersionForTwoPhaseSearchSupport()) { out.writeOptionalWriteable(this.neuralSparseTwoPhaseParameters); } But at now the PR hasn't been backport to V_2_14_0, so the 2.14 node can not deserialize the stream correctly. It can only work after backport success. One workaround is using Version.Current, but we need to modify it to 2.14 after backport finished.

vibrantvarun commented 5 months ago

@conggguan please add BWC test for the change. It gives more confidence. Also I see you have significant changes in constructors that too version specific. BWC test is a must for these type of changes.

cc: @navneet1v