torvalds-dev / solr

Apache Solr open-source search software
https://solr.apache.org/
Apache License 2.0

If all shards fail, fail the request despite shards.tolerant #28

Open 0xDTE opened 1 year ago

0xDTE commented 1 year ago

Description: If the request is tolerant of errors, the logic in SearchHandler simply sets partialResults to true upon detecting an error in a ShardResponse. As a result, even when every shard fails, the response is reported back as partialResults=true instead of as a query failure.

Why: Improve error handling and reporting in the search system by making query failures distinct from partial results.

Solution: Evaluate whether all shards failed while fetching top results. If so, fail the request by throwing a SolrException; otherwise, set partialResults=true.
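The decision described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the actual SearchHandler change: `ShardOutcome`, `SearchFailedException`, and `checkShardOutcomes` are hypothetical names standing in for Solr's ShardResponse handling and SolrException.

```java
import java.util.List;

/** Hypothetical stand-in for one shard's outcome; this is not Solr's ShardResponse. */
record ShardOutcome(String shard, Throwable error) {
    boolean failed() {
        return error != null;
    }
}

/** Hypothetical stand-in for SolrException. */
class SearchFailedException extends RuntimeException {
    SearchFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class TolerantSearchSketch {

    /**
     * Returns true when the response should carry partialResults=true,
     * false when every shard answered, and throws when the request must fail.
     */
    static boolean checkShardOutcomes(List<ShardOutcome> outcomes, boolean tolerant) {
        long failures = outcomes.stream().filter(ShardOutcome::failed).count();
        if (failures == 0) {
            return false; // complete results, nothing to flag
        }
        ShardOutcome firstFailure =
            outcomes.stream().filter(ShardOutcome::failed).findFirst().orElseThrow();
        boolean allFailed = failures == outcomes.size();
        // A non-tolerant request fails on any shard error; a tolerant one
        // fails only when no shard produced results at all.
        if (!tolerant || allFailed) {
            throw new SearchFailedException(
                "Shard request failed: " + firstFailure.shard(), firstFailure.error());
        }
        return true; // some shards answered: report partial results
    }

    public static void main(String[] args) {
        List<ShardOutcome> oneDown = List.of(
            new ShardOutcome("shard1", null),
            new ShardOutcome("shard2", new RuntimeException("connection refused")));
        System.out.println("partialResults=" + checkShardOutcomes(oneDown, true));
    }
}
```

With one of two shards down and tolerant set to true, the sketch reports partial results; with both shards down it throws regardless of the tolerant flag.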

Tests: Added a test, testAllShardsFail(), to TestTolerantSearch. It performs a distributed search across 2 shards, both of which fail, first with shards.tolerant=false and then with shards.tolerant=true, and confirms that a SolrException is thrown in both cases. Also ensured that the existing tests in TestTolerantSearch pass with the new code changes.
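The two-pass shape of such a test might look roughly like the following. This is a framework-free sketch, not the code in TestTolerantSearch: the `DistributedSearch` interface, the `failsWith` helper, and the always-failing stub are all hypothetical, and the real test would run against Solr's test infrastructure and check for a SolrException specifically.

```java
import java.util.List;

final class AllShardsFailTestSketch {

    /** Hypothetical: runs a distributed query and throws if the request fails. */
    interface DistributedSearch {
        void query(List<String> shards, boolean shardsTolerant);
    }

    /** Returns true if the search threw for the given shards.tolerant value. */
    static boolean failsWith(DistributedSearch search, List<String> shards, boolean tolerant) {
        try {
            search.query(shards, tolerant);
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    /** Asserts that the request fails with shards.tolerant=false and then with true. */
    static void testAllShardsFail(DistributedSearch search, List<String> brokenShards) {
        for (boolean tolerant : new boolean[] {false, true}) {
            if (!failsWith(search, brokenShards, tolerant)) {
                throw new AssertionError("expected failure with shards.tolerant=" + tolerant);
            }
        }
    }

    public static void main(String[] args) {
        // Stub standing in for a cluster where every shard is broken.
        DistributedSearch everyShardDown = (shards, tolerant) -> {
            throw new IllegalStateException("all shards failed");
        };
        testAllShardsFail(everyShardDown, List.of("shard1", "shard2"));
        System.out.println("both passes failed as expected");
    }
}
```

A stub that mimics the old behavior (returning normally when tolerant is true) would trip the assertion on the second pass, which is the regression the new test is meant to catch.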

torvalds-dev commented 1 year ago

torvalds.dev is analyzing the ticket

torvalds-dev commented 1 year ago

This issue concerns error handling and reporting in the search system. Currently, if a request is tolerant of errors, the logic in SearchHandler sets partialResults to true upon detecting an error in a ShardResponse. As a result, even when every shard fails, the response is reported back as partialResults=true instead of as a query failure. The goal is to make query failures distinct from partial results.

The proposed solution is to evaluate if all shards failed while fetching top results. If all shards fail, the request should fail by throwing a SolrException. If not all shards fail, partialResults should be set to true.

The relevant files to this issue are:

  1. solr/core/src/test/org/apache/solr/TestTolerantSearch.java: This file contains the TestTolerantSearch class which tests the tolerant search feature. The test testAllShardsFail() performs a distributed search across 2 shards, both of which have failures. The test confirms that in both cases (when shards.tolerant is set to false and when it is set to true), a SolrException is thrown.

  2. solr/core/src/test/org/apache/solr/cloud/TestDownShardTolerantSearch.java: This file contains the TestDownShardTolerantSearch class which tests the behavior of the search system when a shard is down. It asserts that shards.tolerant=true works even if one shard is down and that a meaningful exception is thrown when shards.tolerant=false.

The actions that should be taken are:

  1. Implement the proposed solution: Evaluate whether all shards failed while fetching top results. If so, fail the request by throwing a SolrException; otherwise, set partialResults=true.

  2. Update the tests in TestTolerantSearch and TestDownShardTolerantSearch to reflect the new behavior.

  3. Run the tests to confirm that the new behavior is implemented correctly and that existing functionality is not broken.