Status: Open. torvalds-dev opened this issue 1 year ago.
The issue concerns error handling and reporting in the search system. Currently, if a request is tolerant to errors (shards.tolerant=true), the logic in SearchHandler sets partialResults to true as soon as it detects an error in a ShardResponse. As a result, a failure to query any shard, including a request in which every shard fails, is reported back as partialResults=true instead of as a query failure. The goal is to make query failures distinct from partial results.
The proposed solution is to check whether all shards failed while fetching top results. If all shards failed, the request should fail by throwing a SolrException. If only some shards failed, partialResults should be set to true.
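The decision rule above can be sketched in plain Java. This is a minimal sketch, not the actual SearchHandler change: the names ShardFailurePolicy, MergeOutcome and ShardQueryException are hypothetical, and the per-shard errors are reduced to a list of booleans (one entry per shard, true if that shard's response carried an error). In Solr itself, the two non-success outcomes correspond to throwing a SolrException or setting partialResults=true.

```java
import java.util.List;

// Hypothetical result of the top-results phase; Solr signals these cases through
// SolrException and the partialResults flag, not through an enum like this.
enum MergeOutcome { COMPLETE, PARTIAL }

// Hypothetical stand-in for SolrException, used only to keep the sketch self-contained.
class ShardQueryException extends RuntimeException {
    ShardQueryException(String message) { super(message); }
}

final class ShardFailurePolicy {
    private ShardFailurePolicy() {}

    /**
     * Decides how a request ends after fetching top results from every shard.
     *
     * @param shardFailed one entry per queried shard; true if that shard's response had an error
     * @param tolerant    the value of shards.tolerant
     */
    static MergeOutcome evaluate(List<Boolean> shardFailed, boolean tolerant) {
        long failures = shardFailed.stream().filter(Boolean::booleanValue).count();
        if (failures == 0) {
            return MergeOutcome.COMPLETE; // every shard answered
        }
        if (!tolerant) {
            // shards.tolerant=false: any shard error fails the whole request
            throw new ShardQueryException(failures + " of " + shardFailed.size() + " shards failed");
        }
        if (failures == shardFailed.size()) {
            // The proposed change: nothing succeeded, so report a failure, not partial results
            throw new ShardQueryException("all " + failures + " shards failed");
        }
        return MergeOutcome.PARTIAL; // some shards answered: partialResults=true
    }
}
```

With two shards, evaluate(List.of(true, false), true) returns PARTIAL, while evaluate(List.of(true, true), true) throws, which is exactly the case the change targets.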
The relevant files for this issue are:
solr/core/src/test/org/apache/solr/TestTolerantSearch.java: contains the TestTolerantSearch class, which tests the tolerant search feature. The test testAllShardsFail() is particularly relevant: it performs a distributed search across 2 shards, both of which fail, and confirms that a SolrException is thrown both when shards.tolerant is set to false and when it is set to true.
solr/core/src/test/org/apache/solr/cloud/TestDownShardTolerantSearch.java: contains the TestDownShardTolerantSearch class, which tests how the search system behaves when one shard is down. It asserts that shards.tolerant=true still works when one shard is down and that a meaningful exception is thrown when shards.tolerant=false.
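The scenarios exercised by testAllShardsFail() and TestDownShardTolerantSearch can be mimicked with a small simulation. This sketch does not use Solr or its test framework: shards are modeled as hypothetical Supplier lambdas that either return document IDs or throw, they are queried sequentially rather than in parallel, and SimulatedDistributedSearch, SearchResult and DistributedSearchException are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical merged response: documents plus the partialResults flag.
record SearchResult(List<String> docs, boolean partialResults) {}

// Hypothetical stand-in for SolrException.
class DistributedSearchException extends RuntimeException {
    DistributedSearchException(String message, Throwable cause) { super(message, cause); }
}

final class SimulatedDistributedSearch {
    private SimulatedDistributedSearch() {}

    // Queries each simulated shard in turn and merges the returned document IDs.
    static SearchResult search(List<Supplier<List<String>>> shards, boolean tolerant) {
        List<String> merged = new ArrayList<>();
        RuntimeException firstError = null;
        int failures = 0;
        for (Supplier<List<String>> shard : shards) {
            try {
                merged.addAll(shard.get());
            } catch (RuntimeException e) {
                if (!tolerant) {
                    // Non-tolerant: the first shard error fails the request
                    throw new DistributedSearchException("shard request failed", e);
                }
                failures++;
                if (firstError == null) {
                    firstError = e;
                }
            }
        }
        if (failures > 0 && failures == shards.size()) {
            // Tolerant, but every shard failed: report a failure instead of partial results
            throw new DistributedSearchException("all shards failed", firstError);
        }
        return new SearchResult(merged, failures > 0);
    }
}
```

Searching one healthy shard and one failing shard with tolerant=true returns the healthy shard's documents with partialResults set; two failing shards throw regardless of the tolerant flag.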
The actions that should be taken are:
Implement the proposed solution by modifying the logic in SearchHandler to throw a SolrException when all shards fail while fetching top results.
Update the tests in TestTolerantSearch and TestDownShardTolerantSearch to reflect the new behavior of the search system, and ensure that they pass with the new code changes.
Review the changes and test the search system thoroughly to confirm that the new error handling and reporting works as expected.
Description: If the request is tolerant to errors, the logic in SearchHandler simply sets partialResults to true upon detecting an error in a ShardResponse. This means a failure to query any shard would be reported back as partialResults=true instead of as a query failure.
Why: Improve error handling and reporting in the search system by making sure that query failures are distinct from partial results.
Solution: Evaluate whether all shards failed while fetching top results. If so, fail the request by throwing a SolrException; if not, set partialResults=true.
Tests: Added a test in TestTolerantSearch, testAllShardsFail(), that performs a distributed search across 2 shards where both shards have failures. The test runs the distributed search first with shards.tolerant=false and then with shards.tolerant=true, and confirms that a SolrException is thrown in both cases. Also ensured that the existing tests in TestTolerant pass with the new code changes.