opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0

[PROPOSAL] Optimize embedding processors for update scenario #793

Open martin-gaievski opened 2 weeks ago

martin-gaievski commented 2 weeks ago

What/Why

What are you proposing?

Text embedding processors can be smarter and skip unnecessary calls to the model in update scenarios.

What users have asked for this feature?

I have one customer case where they are doing the following:

Our text embedding processor does not analyze the document state or what exactly changed; it just calls the model again. That adds no value, because for unchanged text it gets back the same embeddings and sets them on the new document. That model call can be avoided by simply copying the embeddings from the original document.
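A minimal sketch of the copy step for a single field, assuming the processor can see the previously indexed `_source` as a map (the proposal does not say how it would obtain it). The class `EmbeddingReuse`, the method `copyEmbeddingIfUnchanged`, and the field names `passage_text` / `passage_embedding` are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper: reuse an embedding from the previously indexed document when its text is unchanged. */
public class EmbeddingReuse {

    /**
     * Copies oldSource[embeddingField] into newSource when textField holds identical text in both
     * versions and the old document already has an embedding. Returns true if the embedding was
     * copied, meaning no model call is needed for this field.
     */
    public static boolean copyEmbeddingIfUnchanged(
            Map<String, Object> oldSource,
            Map<String, Object> newSource,
            String textField,
            String embeddingField) {
        if (oldSource == null) {
            return false; // plain insert: there is no previous version to reuse from
        }
        Object oldText = oldSource.get(textField);
        Object newText = newSource.get(textField);
        Object oldEmbedding = oldSource.get(embeddingField);
        if (newText == null || oldEmbedding == null || !Objects.equals(oldText, newText)) {
            return false; // text changed, removed, or never embedded: the model must be called
        }
        newSource.put(embeddingField, oldEmbedding); // same text, so the same embedding applies
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> oldDoc = new HashMap<>();
        oldDoc.put("passage_text", "hello world");
        oldDoc.put("passage_embedding", List.of(0.1f, 0.2f, 0.3f));

        Map<String, Object> newDoc = new HashMap<>();
        newDoc.put("passage_text", "hello world");
        newDoc.put("title", "updated title"); // only a field without embeddings changed

        boolean reused = copyEmbeddingIfUnchanged(oldDoc, newDoc, "passage_text", "passage_embedding");
        System.out.println(reused + " " + newDoc.get("passage_embedding"));
    }
}
```

Comparing the raw text rather than the embeddings is what makes the saving possible: the check costs a string comparison instead of a model round trip.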

What problems are you trying to solve?

We can reduce the number of calls to the remote model. For the end customer that means:

What is the developer experience going to be?

We need a way to make this behavior configurable. In some cases you need to regenerate embeddings even if they are already in the document, for example after you've deployed an updated version of the model. I suggest the following logic:
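One possible shape for that decision, sketched as an assumption rather than the author's exact rule: a boolean processor parameter (called `skipExisting` here, a hypothetical name) that defaults to `false`, so the processor keeps calling the model every time as it does today. Only when the flag is set does it skip the call for fields whose text is unchanged and already embedded.

```java
/** Hypothetical decision rule; skipExisting is an illustrative name, not an existing parameter. */
public class InferenceDecision {

    /**
     * @param skipExisting    hypothetical flag; false (the default) keeps today's always-call behavior
     * @param oldText         field text in the previously indexed document, or null on a plain insert
     * @param newText         field text in the incoming document, or null if the field is absent
     * @param hasOldEmbedding whether the previously indexed document already holds an embedding for the field
     * @return true if the model must be called for this field
     */
    public static boolean shouldCallModel(
            boolean skipExisting, String oldText, String newText, boolean hasOldEmbedding) {
        if (newText == null) {
            return false; // in this sketch, an absent field is never sent to the model
        }
        if (!skipExisting) {
            return true; // default: always regenerate, e.g. after deploying a new model version
        }
        if (oldText == null || !hasOldEmbedding) {
            return true; // nothing reusable in the previous version of the document
        }
        return !oldText.equals(newText); // regenerate only when the text actually changed
    }

    public static void main(String[] args) {
        System.out.println(shouldCallModel(false, "a", "a", true)); // flag off
        System.out.println(shouldCallModel(true, "a", "a", true));  // flag on, text unchanged
        System.out.println(shouldCallModel(true, "a", "b", true));  // flag on, text changed
        System.out.println(shouldCallModel(true, null, "a", false)); // flag on, new document
    }
}
```

Keeping the flag off by default is what keeps the change non-breaking: existing pipelines see no difference unless they opt in.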

Are there any security considerations?

There are no security concerns regarding making calls to the model; this functionality exists today. The new flag that regulates processor behavior should be added in a safe manner.

Are there any breaking changes to the API

No. Under the suggested logic, today's behavior remains the same.

What is the user experience going to be?

If users want to fine-tune the processor behavior, they will need to set a new processor parameter.

Are there breaking changes to the User Experience?

No

Why should it be built? Any reason not to?

The main reason is saving cost on model calls and lowering the chance of requests being throttled. Customers who have such problems today may not even realize the system is making unnecessary calls to the model.

What will it take to execute?

This will be a code change in the plugin for every processor that we want to onboard to this feature. The change is something like:
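A hypothetical sketch of such a per-processor change, not the plugin's actual code. It assumes the processor has a map from text fields to embedding fields, a hypothetical `skipExisting` flag, and access to the previously indexed `_source`; `ModelClient` is an invented stand-in for whatever inference client the processor really uses. Unchanged, already-embedded fields are copied over, and only the remaining texts go to the model in a single batched call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical processor sketch; ModelClient and skipExisting are illustrative, not real plugin APIs. */
public class ReusingEmbeddingProcessor {

    /** Hypothetical model client: one call embeds a batch of texts, returning vectors in the same order. */
    public interface ModelClient {
        List<List<Float>> embed(List<String> texts);
    }

    private final Map<String, String> fieldMap; // text field -> embedding field
    private final boolean skipExisting;
    private final ModelClient client;

    public ReusingEmbeddingProcessor(Map<String, String> fieldMap, boolean skipExisting, ModelClient client) {
        this.fieldMap = fieldMap;
        this.skipExisting = skipExisting;
        this.client = client;
    }

    /** Fills embeddings in newSource; oldSource is the previously indexed _source, or null for a new document. */
    public void execute(Map<String, Object> oldSource, Map<String, Object> newSource) {
        List<String> pendingTextFields = new ArrayList<>();
        List<String> pendingTexts = new ArrayList<>();
        for (Map.Entry<String, String> e : fieldMap.entrySet()) {
            Object newText = newSource.get(e.getKey());
            if (!(newText instanceof String)) {
                continue; // field absent or not text: nothing to embed in this sketch
            }
            Object oldEmbedding = oldSource == null ? null : oldSource.get(e.getValue());
            boolean unchanged = oldSource != null && Objects.equals(oldSource.get(e.getKey()), newText);
            if (skipExisting && unchanged && oldEmbedding != null) {
                newSource.put(e.getValue(), oldEmbedding); // reuse: no model call for this field
            } else {
                pendingTextFields.add(e.getKey());
                pendingTexts.add((String) newText);
            }
        }
        if (pendingTexts.isEmpty()) {
            return; // every embedding was reused, so the model is not called at all
        }
        List<List<Float>> vectors = client.embed(pendingTexts); // one batched call for the remaining fields
        for (int i = 0; i < pendingTextFields.size(); i++) {
            newSource.put(fieldMap.get(pendingTextFields.get(i)), vectors.get(i));
        }
    }
}
```

Batching the remaining texts into one call means an update that changes one of several embedded fields still costs a single request, and an update that touches no embedded field costs none.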

Any remaining open questions?

Some edge cases may be:

dblock commented 1 day ago

[Catch All Triage - Attendees 1, 2, 3, 4, 5]