yuye-aws opened this issue 2 weeks ago
Neural search with inner_hits is not working, and I could not find a workaround.
@yuye-aws Inner hits are not supported in the hybrid query. There is a feature request for this (https://github.com/opensearch-project/neural-search/issues/718), but at the moment there is no path forward.
I'm not using hybrid query, just a plain neural query.
Are both features not supported due to the same blocking issue?
Sorry, my bad. Neural query is different; I'm not sure why nested doesn't work. In the neural code we delegate execution to the knn query, so you may want to check how it's done in knn. An easy test would be to try whether a plain knn query supports the "nested" clause.
An easy test would be to try whether a plain knn query supports the "nested" clause.
I already tried that in my fifth step.
In step 5 you do have a neural query. I mean the knn query, something like the following example, but with nested:
"query": {
"knn": {
"embedding_field": {
"vector": [
5.0,
4.0,
....
3.8
],
"k": 12
}
}
}
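The nested variant of that test might look like the sketch below. The `chunks` path and `chunks.embedding` field are hypothetical placeholders; substitute the actual nested path and vector field from your mapping:

```json
GET /my-index/_search
{
  "query": {
    "nested": {
      "path": "chunks",
      "query": {
        "knn": {
          "chunks.embedding": {
            "vector": [5.0, 4.0, 3.8],
            "k": 12
          }
        }
      },
      "score_mode": "max",
      "inner_hits": {}
    }
  }
}
```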
@yuye-aws I found this change in knn, https://github.com/opensearch-project/k-NN/pull/1182. The essence of it is: in the case of nested documents, only the one that gave the max score is returned, and the others are dropped. This became the new default behavior, replacing the old one where all nested docs (meaning inner hits) were returned. The neural query inherits this from knn.
in the case of nested documents, only the one that gave the max score is returned, and the others are dropped. This became the new default behavior, replacing the old one where all nested docs (meaning inner hits) were returned.
This does not make sense, because score_mode can also be avg, in which case we expect to see all the scores.
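For example, with a request like the fragment below (same hypothetical `chunks` field as in the sketch above), avg is computed over all matching nested documents, so one would expect inner_hits to surface every chunk rather than only the top-scoring one:

```json
"nested": {
  "path": "chunks",
  "query": {
    "knn": {
      "chunks.embedding": {
        "vector": [5.0, 4.0, 3.8],
        "k": 12
      }
    }
  },
  "score_mode": "avg",
  "inner_hits": {}
}
```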
The neural query inherits this from knn.
Shall we make a PR to the knn repo? After all, a nested k-NN query also needs the avg score mode.
@yuye-aws Please add your use case, along with any suggestions you have regarding avg score mode support in knn, to https://github.com/opensearch-project/k-NN/issues/1743
Replied in https://github.com/opensearch-project/k-NN/issues/1743#issuecomment-2347925588. Also, resolving this issue can help resolve a user issue: https://github.com/opensearch-project/ml-commons/issues/2612. I was considering implementing a new search response processor to retrieve the most relevant chunks, but it is unfortunately blocked by the current issue: https://github.com/opensearch-project/ml-commons/issues/2612#issuecomment-2343152694
Would love this!
What is the bug?
I am using the text_chunking and text_embedding processors to ingest documents into an index. The text chunking search example works well, but inner_hits only returns a single element from the chunked string list. It does not matter whether I set score_mode to max or avg.
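For context, the ingest setup is roughly the following. This is a minimal sketch; the pipeline name, source/target field names, and model_id are placeholders, not the exact values from my cluster:

```json
PUT /_ingest/pipeline/chunking-embedding-pipeline
{
  "processors": [
    {
      "text_chunking": {
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 384
          }
        },
        "field_map": {
          "body": "body_chunk"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": {
          "body_chunk": "body_chunk_embedding"
        }
      }
    }
  ]
}
```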
How can one reproduce the bug?
What is the expected behavior?
inner_hits should return the matching score and offset of every retrieved nested document.
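Illustratively, the inner_hits section of the response should look something like the fragment below, with one entry (score and nested offset) per retrieved chunk. The field name and scores here are made up for illustration:

```json
"inner_hits": {
  "body_chunk_embedding": {
    "hits": {
      "hits": [
        { "_nested": { "field": "body_chunk_embedding", "offset": 0 }, "_score": 0.72 },
        { "_nested": { "field": "body_chunk_embedding", "offset": 1 }, "_score": 0.65 },
        { "_nested": { "field": "body_chunk_embedding", "offset": 2 }, "_score": 0.41 }
      ]
    }
  }
}
```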
What is your host/environment?
macOS