Open 0xDTE opened 1 year ago
The issue at hand is related to the error handling and reporting in the search system. Currently, if a request is tolerant to errors, the logic in SearchHandler sets partialResults to true upon detecting an error in ShardResponse. This means that a failure to query any shard would be reported back as partialResults=true, instead of a query failure. The goal is to improve this system by making sure that query failures are distinct from partial results.
The proposed solution is to evaluate whether all shards failed while fetching top results. If all shards fail, the request should fail by throwing a SolrException. If not all shards fail, partialResults should be set to true.
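The decision described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the actual SearchHandler change: ShardResult, Outcome, and evaluate are hypothetical names introduced here, and the QUERY_FAILURE outcome stands in for the point where the real handler would throw a SolrException. The non-tolerant branch reflects the behavior described for shards.tolerant=false, where any shard error fails the request.

```java
import java.util.List;

public class ShardFailureCheck {

    /** Hypothetical stand-in for a per-shard response; not Solr's ShardResponse. */
    static final class ShardResult {
        final String shard;
        final Exception error; // null when the shard answered successfully

        ShardResult(String shard, Exception error) {
            this.shard = shard;
            this.error = error;
        }

        boolean failed() {
            return error != null;
        }
    }

    /** Possible outcomes; QUERY_FAILURE marks where a SolrException would be thrown. */
    enum Outcome { COMPLETE, PARTIAL_RESULTS, QUERY_FAILURE }

    static Outcome evaluate(List<ShardResult> results, boolean tolerant) {
        long failures = results.stream().filter(ShardResult::failed).count();
        if (failures == 0) {
            return Outcome.COMPLETE;
        }
        // A non-tolerant request fails on any shard error.
        // A tolerant request fails only when no shard succeeded.
        if (!tolerant || failures == results.size()) {
            return Outcome.QUERY_FAILURE;
        }
        // Tolerant request with at least one healthy shard: partialResults=true.
        return Outcome.PARTIAL_RESULTS;
    }

    public static void main(String[] args) {
        ShardResult ok = new ShardResult("shard1", null);
        ShardResult bad1 = new ShardResult("shard1", new RuntimeException("down"));
        ShardResult bad2 = new ShardResult("shard2", new RuntimeException("down"));

        System.out.println(evaluate(List.of(bad1, bad2), true));  // QUERY_FAILURE
        System.out.println(evaluate(List.of(ok, bad2), true));    // PARTIAL_RESULTS
        System.out.println(evaluate(List.of(ok, bad2), false));   // QUERY_FAILURE
    }
}
```

Returning an enum rather than throwing keeps the sketch easy to exercise; in the real handler the failure case would surface as an exception to the caller.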
The relevant files to this issue are:
solr/core/src/test/org/apache/solr/TestTolerantSearch.java: This file contains the TestTolerantSearch class, which tests the tolerant search feature. The test testAllShardsFail() performs a distributed search across 2 shards, both of which have failures. The test confirms that in both cases (when shards.tolerant is set to false and when it is set to true), a SolrException is thrown.
solr/core/src/test/org/apache/solr/cloud/TestDownShardTolerantSearch.java: This file contains the TestDownShardTolerantSearch class, which tests the behavior of the search system when a shard is down. It asserts that shards.tolerant=true works even if one shard is down and that a meaningful exception is thrown when shards.tolerant=false.
The actions that should be taken are:
Implement the proposed solution: Evaluate if all shards failed while fetching top results. If yes, fail the request by throwing a SolrException; if not, set partialResults=true.
Update the tests in TestTolerantSearch and TestDownShardTolerantSearch to reflect the new behavior.
Run the tests to confirm that the new behavior is implemented correctly and that existing functionality is not broken.
Description: If the request is tolerant to errors, logic in SearchHandler simply sets partialResults to true upon detecting an error in ShardResponse. This means a failure to query any shard would be reported back as partialResults=true, instead of a query failure.
Why: Improve the error handling and reporting in the search system by making sure that query failures are distinct from partial results
Solution: Evaluate if all shards failed while fetching top results. If yes, fail the request by throwing a SolrException; if not, set partialResults=true.
Tests: Added a test in TestTolerantSearch, testAllShardsFail(), that performs a distributed search across 2 shards, both of which have failures. The test runs the distributed search first with shards.tolerant=false and then with shards.tolerant=true, and confirms that a SolrException is thrown in both cases. Also ensured existing tests in TestTolerant pass with the new code changes.