opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means, or linear regression, to help developers build ML related features within OpenSearch.
Apache License 2.0

[FEATURE] Conversational search (RAG): Allow missing fields in context documents #2277

Open reuschling opened 7 months ago

reuschling commented 7 months ago

In my index, most documents have the field 'body', and some also have 'title' and 'description'. Because the data is crawled, we cannot guarantee that every document has valid data for each field. Nevertheless, it would be nice if, for example, 'description' were considered when generating the answer whenever a document has one.

Currently, every field listed in the "context_field_list" of the RAG pipeline must exist in each search hit. If one is missing, I get this error:

[ERROR][o.o.s.q.g.GenerativeQAResponseProcessor] [port-4106] Context description not found in search hit { "_index" : "exampleIndex", "_id" : "docId_0", "_score" : 0.7, "_source" : { "body" : " ....someText" ....
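For readers who want to reproduce the setup, here is a minimal sketch of the kind of search pipeline body that leads to this error, written as a Python dict. The tag, the description, and the model ID placeholder are assumptions for illustration; only "context_field_list" and the three field names come from this report.

```python
import json

# Hypothetical RAG search pipeline body. "example_rag" and "<your_model_id>"
# are placeholders, not values taken from this issue.
rag_pipeline = {
    "response_processors": [
        {
            "retrieval_augmented_generation": {
                "tag": "example_rag",
                "description": "RAG pipeline over crawled documents",
                "model_id": "<your_model_id>",
                # Every hit is currently expected to contain all of these
                # fields; a hit with only 'body' triggers the error above.
                "context_field_list": ["body", "title", "description"],
            }
        }
    ]
}

# Serialize to the JSON that would be sent when creating the pipeline.
print(json.dumps(rag_pipeline, indent=2))
```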

I know I could add empty fields to my documents, but one of the key concepts in OpenSearch/Lucene is that not all documents have to follow the same 'data schema'. The same holds for search, where only documents with matching fields are returned.

So, for consistency and robustness, please allow fields in "context_field_list" that do not have to appear in every result document.

austintlee commented 1 month ago

@reuschling I will introduce ignoreMissing and throw an error only if ignoreMissing is false and none of the fields in the context_field_list is present. How does that sound?
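To make the proposed semantics concrete, here is a small Python sketch of one literal reading of the rule above. It is not the ml-commons implementation (which is Java); the function name `collect_context` and the `ignore_missing` parameter are hypothetical stand-ins for the processor logic and the proposed ignoreMissing option.

```python
from typing import Any, Dict, List


def collect_context(
    hit_source: Dict[str, Any],
    context_field_list: List[str],
    ignore_missing: bool = False,
) -> List[str]:
    """Gather context strings from the _source of a single search hit.

    Fields absent from the hit are skipped. An error is raised only when
    ignore_missing is False and none of the listed fields is present.
    """
    contexts = [
        str(hit_source[field])
        for field in context_field_list
        if field in hit_source
    ]
    if not contexts and not ignore_missing:
        raise ValueError(
            f"None of the context fields {context_field_list} found in search hit"
        )
    return contexts


# A crawled document that has only 'body' still contributes its context.
hit = {"body": "someText"}
print(collect_context(hit, ["body", "title", "description"]))
```

Under this reading, a hit that carries at least one listed field never raises, regardless of the flag; the flag only matters for hits that carry none of them.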

reuschling commented 1 month ago

This sounds like a great solution, thanks a lot.