[PROPOSAL] Optimize embedding processors for update scenario

What/Why

What are you proposing?

Text embedding processors can be smarter and skip unnecessary calls to the model in update scenarios.

What users have asked for this feature?

I have one customer case where they do the following:

configure a remote model
configure a text embedding processor and attach it to an ingest pipeline that is set as the default at the index level
ingest a document, which creates embeddings
update some document fields that are unrelated to the embedding field or to the original field the embeddings are generated from

Our text embedding processor does not analyze the document state or what exactly changed; it just calls the model again. That adds no value, because the model returns the same embeddings, which are then set on the new document. The model call can be avoided by simply copying the embeddings from the original document.

What problems are you trying to solve?

We can reduce the number of calls to the remote model. For the end customer that means:

a lower bill, as they typically pay per call
better stability, as the system can otherwise hit the model's rate limit and have other calls throttled

What is the developer experience going to be?

We need a way to make this behavior configurable. In some cases you need to regenerate embeddings even if they were already in the document, for example after deploying an updated version of the model. I suggest the following logic:

if embeddings in the current document are empty, always call the model
if embeddings are not empty, check the flag. If the flag says 'update', call the model. Otherwise copy the embeddings from the original document.
the default behavior should be 'always update'. This ensures backward compatibility with today's code
the update/do-not-update behavior should be configurable at the processor level, so different processors can be configured differently in one cluster.
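
The decision described above can be sketched as a small pure function. This is only an illustration: the enum `EmbeddingUpdatePolicy`, its values, and the class `EmbeddingDecision` are hypothetical names that do not exist in the plugin, and the proposal does not fix what the flag is called.

```java
import java.util.List;

/** Hypothetical flag values; the proposal does not define the actual names. */
enum EmbeddingUpdatePolicy {
    ALWAYS_UPDATE,  // proposed default, matches today's behavior
    REUSE_EXISTING  // copy embeddings from the original document when present
}

/** Illustrative helper, not part of the neural-search plugin. */
final class EmbeddingDecision {
    private EmbeddingDecision() {}

    /**
     * Returns true when the processor should call the model.
     * existingEmbedding is the vector found in the original document,
     * or null/empty when there is none.
     */
    static boolean shouldCallModel(List<Float> existingEmbedding, EmbeddingUpdatePolicy policy) {
        if (existingEmbedding == null || existingEmbedding.isEmpty()) {
            return true; // nothing to reuse, so the model must be called
        }
        // Embeddings exist: call the model only if the flag says "update".
        return policy == EmbeddingUpdatePolicy.ALWAYS_UPDATE;
    }
}
```

Keeping the decision separate from the model call would make the default (`ALWAYS_UPDATE` in this sketch) easy to verify against today's behavior.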

Are there any security considerations?

No security concerns regarding calls to the model; this functionality exists today. The new flag that regulates processor behavior should be added in a safe manner.

Are there any breaking changes to the API?

No, with the suggested logic today's behavior remains the same.

What is the user experience going to be?

If users want to fine-tune the processor behavior, they will need to set a new processor parameter.

Are there breaking changes to the User Experience?

Why should it be built? Any reason not to?

The main reason is saving the cost of model calls and lowering the chance of requests being throttled. Customers who have such problems today may not even realize that the system is making unnecessary calls to the model.

What will it take to execute?

This will be a code change in the plugin, for every processor that we want to onboard to this feature. The change is something like:

add a parameter to the processor factory so the new behavior can be enabled
modify the processor as per the suggested logic: check whether the embedding field exists in the original document. If it does not, call the model (today's logic). If it does, check the parameter: if it says not to update, return; otherwise call the model.
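
A rough sketch of the two steps above, under several assumptions: `SketchEmbeddingProcessor` is a simplified stand-in and not the plugin's real processor class, the config keys `source_field`, `embedding_field`, and `reuse_existing_embeddings` are hypothetical, the model is represented by a plain `Function`, and the original document is assumed to be available to the processor (which is one of the open questions in this proposal).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Simplified stand-in for an embedding processor; not the plugin's real class. */
final class SketchEmbeddingProcessor {
    // Hypothetical config key; the proposal does not name the parameter.
    static final String REUSE_KEY = "reuse_existing_embeddings";

    private final String sourceField;
    private final String embeddingField;
    private final boolean reuseExisting;
    private final Function<String, List<Float>> model; // stand-in for the remote model call

    private SketchEmbeddingProcessor(String sourceField, String embeddingField,
                                     boolean reuseExisting, Function<String, List<Float>> model) {
        this.sourceField = sourceField;
        this.embeddingField = embeddingField;
        this.reuseExisting = reuseExisting;
        this.model = model;
    }

    /** Factory step: read the flag from the processor config, defaulting to today's behavior. */
    static SketchEmbeddingProcessor create(Map<String, Object> config,
                                           Function<String, List<Float>> model) {
        String source = (String) config.get("source_field");       // hypothetical key
        String target = (String) config.get("embedding_field");    // hypothetical key
        boolean reuse = Boolean.TRUE.equals(config.getOrDefault(REUSE_KEY, Boolean.FALSE));
        return new SketchEmbeddingProcessor(source, target, reuse, model);
    }

    /**
     * original: the stored document before the update, or null on first ingest.
     * incoming: the document being written.
     * Returns a copy of incoming with the embedding field set.
     */
    Map<String, Object> execute(Map<String, Object> original, Map<String, Object> incoming) {
        Map<String, Object> result = new HashMap<>(incoming);
        Object existing = (original == null) ? null : original.get(embeddingField);
        if (existing != null && reuseExisting) {
            result.put(embeddingField, existing); // copy the embedding, no model call
            return result;
        }
        // Today's logic: always call the model.
        result.put(embeddingField, model.apply((String) incoming.get(sourceField)));
        return result;
    }
}
```

The sketch follows the logic exactly as stated and does not compare the source text between the original and incoming documents. A real implementation would probably need that check as well, since otherwise an update that changes the source field would keep a stale embedding.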

Any remaining open questions?

Some edge cases may be:

what if the field with embeddings is marked as "excluded" in the index mapping? I suggest we call the model in that case.

Some open questions related to implementation:

does the document passed to the processor in the "update_by_query" case contain the original document? Need to check what the input is in https://github.com/opensearch-project/neural-search/blob/main/src/main/java/org/opensearch/neuralsearch/processor/InferenceProcessor.java#L120
do "update" and "update_by_query" behave the same? They probably do, but that needs verification
