Open wli-chwy opened 1 year ago
@wli-chwy, are you working with Elasticsearch, or with OpenSearch 1.x or 2.x?
@wli-chwy as you can see from the Elasticsearch issue, there's interest in doing something like this, but it has never been implemented. We need to investigate the feasibility and hopefully involve folks from the community. I searched through the code to see how deep this goes and found where the exception is thrown: https://github.com/opensearch-project/OpenSearch/blob/0a2009250cd857d68cdfa87340f8cd1b906d8c6e/server/src/main/java/org/opensearch/search/SearchService.java#L1396
Are you able to share more about your use case including mapping and queries here?
I also see https://github.com/opensearch-project/OpenSearch/issues/6846 that may be related to your issue.
Tagging some folks who may be able to offer deeper insights or correct my thinking here: @nknize, @msfroh. Is there anyone else who could provide some guidance on this?
@wli-chwy I believe you did try collapsing after reranking, which is typically what users do. How did that work for you? Is there more you can say about that case here to help us figure out what our options could be?
There might be a couple of options we can think about with regard to collapse + rescore:
- [...] _routing), then it's possible to rescore + collapse on each shard while ensuring diversity of collapse keys.
- [...] SearchResponseProcessor as part of a search pipeline running on the coordinator node.
@macohen it was slow. We have to play a guessing game: guess how many pre-collapse items are needed to fill one page. If we guess too few, we need to make another call. If we guess too many, we waste data transfer.
All in all, it added about 40% more latency in the application layer, roughly 100 ms.
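The guessing game described above can be sketched roughly as follows. This is only an illustration of application-layer collapsing after rescoring, not code from the reporter's application: `fetch(offset, size)` is a hypothetical callable standing in for a search request that returns already-rescored hits, and `overfetch` and `max_calls` are made-up tuning knobs.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]


def collapsed_page(
    fetch: Callable[[int, int], List[Hit]],  # hypothetical: returns rescored hits
    key: str,
    page_size: int,
    overfetch: float = 3.0,  # the "guess": how many raw hits per collapsed hit
    max_calls: int = 5,
) -> List[Hit]:
    """Fill one page with hits that have distinct values for `key`."""
    seen = set()
    page: List[Hit] = []
    offset = 0
    batch = max(page_size, int(page_size * overfetch))
    for _ in range(max_calls):
        hits = fetch(offset, batch)
        for hit in hits:
            value = hit[key]
            if value in seen:
                continue  # duplicate of a key we already kept: wasted transfer
            seen.add(value)
            page.append(hit)
            if len(page) == page_size:
                return page
        if len(hits) < batch:
            break  # result set exhausted, return a short page
        offset += batch  # guessed too low: another round trip
    return page
```

A larger `overfetch` lowers the chance of a second call but transfers more duplicates, which is the trade-off behind the added latency.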
With https://github.com/opensearch-project/OpenSearch/pull/9405, I added a CollapseResponseProcessor that could help with this.
Functionally, it's not that different from the application layer collapsing that @macohen suggested above (and @wli-chwy replied adds latency), but it avoids the round-trip between the application and the cluster, keeping the work on the coordinator node.
I don't know if keeping it in the cluster cuts the latency enough, but it may be worth trying out.
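For anyone who wants to try it, here is a minimal sketch of what the pipeline definition and search request might look like, written as Python helpers that only build the request bodies. It assumes the processor is registered under the type name `collapse` with a `field` parameter and that the pipeline is selected with the `search_pipeline` query parameter, as I understand the search pipeline docs; please check against the version you run. The pipeline name, index name, and field name are hypothetical.

```python
import json

PIPELINE_NAME = "collapse_after_rescore"  # hypothetical pipeline name


def collapse_pipeline_body(field: str) -> dict:
    """Body for PUT /_search/pipeline/<name> with one collapse response processor."""
    return {
        "description": "Drop hits whose collapse field repeats an earlier hit",
        # Assumption: processor type "collapse" with a "field" parameter.
        "response_processors": [{"collapse": {"field": field}}],
    }


def search_request(body: dict, pipeline: str = PIPELINE_NAME, index: str = "products") -> tuple:
    """Return (method, path, json_body) for a search that runs through the pipeline."""
    path = f"/{index}/_search?search_pipeline={pipeline}"
    return ("POST", path, json.dumps(body))
```

Because the processor only filters the hits that come back from the shards, the request's `size` still has to over-fetch enough to fill a page after duplicates are dropped.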
Is there any specific integration needed to use the LTR results in the CollapseResponseProcessor?
cc:@msfroh
> Is there any specific integration needed to use the LTR results in the CollapseResponseProcessor?
Nope
@wli-chwy, do you think this CollapseResponseProcessor can help?
> @wli-chwy I believe you did try collapsing after reranking, which is typically what users do. How did that work for you? Is there more you can say about that case here to help us figure out what our options could be?
I think this strongly depends on the reason why documents are collapsed and how judgments are created. Business examples for retail:
For the case where you collapse t-shirts across different sizes, you might want to re-rank before collapsing, as the different size variants contain different content that is critical for the search.
For the case where you collapse smartphones across different sellers, you might want to re-rank after collapsing, as sellers might not be relevant for your search.
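To make the difference between the two orders concrete, here is a toy sketch. The hits, the `first_pass` and `model` scores, and the helper names are all invented for illustration; nothing here reflects how OpenSearch or the LTR plugin implements either step.

```python
def collapse(hits, key):
    """Keep only the first hit for each value of `key`, preserving order."""
    seen, kept = set(), []
    for hit in hits:
        if hit[key] not in seen:
            seen.add(hit[key])
            kept.append(hit)
    return kept


def rank_by(hits, score):
    """Return hits sorted by the given score field, best first."""
    return sorted(hits, key=lambda h: h[score], reverse=True)


hits = [
    {"id": "A-small", "parent": "A", "first_pass": 5.0, "model": 0.2},
    {"id": "A-large", "parent": "A", "first_pass": 4.0, "model": 0.9},
    {"id": "B-one", "parent": "B", "first_pass": 3.0, "model": 0.5},
]

# Re-rank first: the best-scoring variant represents each parent.
rerank_then_collapse = collapse(rank_by(hits, "model"), "parent")

# Collapse first: the first-pass winner represents each parent, then re-rank.
collapse_then_rerank = rank_by(collapse(rank_by(hits, "first_pass"), "parent"), "model")

print([h["id"] for h in rerank_then_collapse])  # ['A-large', 'B-one']
print([h["id"] for h in collapse_then_rerank])  # ['B-one', 'A-small']
```

In the first order, parent A is shown through its strongest variant and ranks first. In the second, A is represented by a variant the model scores poorly, so it drops below B.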
Interestingly, for ES the integration of collapsing and re-ranking was done in the opposite order to what is suggested here (first collapsing, then re-ranking): https://github.com/elastic/elasticsearch/issues/27243
@msfroh @arukris What I understand from your comments is that it is already possible to collapse documents after re-ranking. As this issue seems to be driven primarily by the LTR topic, it might be reasonable, as a first step, to enhance the LTR documentation with an example of this solution?
Is there a planned release date for this feature? We're planning to use it, are very interested in the 'collapse then rescore' functionality, and would appreciate any updates.
Is your feature request related to a problem? Please describe.
OpenSearch returns the error "cannot use collapse in conjunction with rescore" if I have both a collapse and a rescore clause in the query. In the ecommerce space, my existing query relies on collapse (collapsing on the same parent ID) to deduplicate variations of the same product. Because of this limitation, I cannot use the learning to rank plugin, which needs rescore to re-rank results and improve my search relevancy.
Describe the solution you'd like
No error when using collapse in conjunction with rescore. In addition, rescore should happen first to ensure the correct ranking of the face-out item.
Describe alternatives you've considered
N/A
Additional context
The same request in Elasticsearch: https://github.com/elastic/elasticsearch/issues/27243
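For reference, a request of roughly the following shape is what combines the two clauses. This is a sketch built as a Python dict, not the reporter's actual query: the `title` and `parent_id` fields, the model name, and the `keywords` parameter are hypothetical, and the `sltr` rescore layout follows my understanding of the learning to rank plugin's documented usage.

```python
def collapse_and_rescore_body(keywords, parent_field="parent_id",
                              model="my_ltr_model", window_size=100):
    """Build a search body that has both `collapse` and `rescore` clauses."""
    return {
        "query": {"match": {"title": keywords}},  # hypothetical first-pass query
        # Deduplicate product variations by their shared parent ID.
        "collapse": {"field": parent_field},
        # Re-rank the top window with a stored LTR model (hypothetical name).
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {"params": {"keywords": keywords}, "model": model}
                }
            },
        },
    }
```

Removing either the `collapse` or the `rescore` key from this body is what the current limitation forces.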