linkedin / ambry

Distributed object store
https://github.com/linkedin/ambry/wiki
Apache License 2.0
1.75k stars 275 forks

Terminate JVM in disk failure handler when too many disks failed #2778

Closed: justinlin-linkedin closed this 6 months ago

justinlin-linkedin commented 6 months ago

Summary

When StorageManager starts, it checks whether too many disks are unavailable. If so, it throws an exception to fail the initialization. This PR brings the same check to DiskFailureHandler.
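The startup check described above can be sketched roughly as follows. This is a minimal illustration, not Ambry's actual code: the class `StartupDiskCheck`, the method `checkDisksOrFail`, the `failedDiskThreshold` parameter, and the choice of `IllegalStateException` are all hypothetical names and assumptions made for this example.

```java
import java.util.List;

/** Hypothetical sketch of a startup disk check; not the real StorageManager. */
final class StartupDiskCheck {
  private StartupDiskCheck() {}

  /**
   * Fails initialization when the number of unavailable disks reaches the threshold.
   *
   * @param diskAvailable one entry per disk, true if the disk is usable (assumed representation)
   * @param failedDiskThreshold number of unavailable disks considered "too many" (assumed parameter)
   */
  static void checkDisksOrFail(List<Boolean> diskAvailable, int failedDiskThreshold) {
    // Count the disks that are reported as unavailable.
    long unavailable = diskAvailable.stream().filter(ok -> !ok).count();
    if (unavailable >= failedDiskThreshold) {
      // Throwing here aborts initialization instead of starting in a degraded state.
      throw new IllegalStateException(
          "Too many unavailable disks: " + unavailable + " (threshold " + failedDiskThreshold + ")");
    }
  }
}
```

With a threshold of 2, a host with one bad disk out of three would start normally, while a host with two bad disks would fail initialization.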

Reasons for change

Disks can fail at any time. If all disks fail at runtime, after StorageManager has already started successfully, the startup check never throws an exception. Instead, when the disk failure handler runs, it moves all the replicas off the host. Why do we want to terminate the JVM when this happens? Killing the JVM makes the host show up as unavailable in our tools, which makes it much easier to use those tools to find unavailable hosts.

Changes

Use System.exit(1) to terminate the JVM in the disk failure handler when the number of failed disks reaches the threshold for termination.
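A minimal sketch of this runtime behavior is shown below, under stated assumptions. The class `DiskFailureTerminator`, the method `onDiskScan`, and the injectable `exitHook` are hypothetical and are not taken from the PR's diff. The hook defaults to `System::exit` and exists only so that the logic can be exercised without actually killing the JVM.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch of the termination check; not the real DiskFailureHandler. */
final class DiskFailureTerminator {
  private final int terminationThreshold;
  private final IntConsumer exitHook;

  /** Production-style constructor: terminates the JVM through System.exit. */
  DiskFailureTerminator(int terminationThreshold) {
    this(terminationThreshold, System::exit);
  }

  /** Constructor with an injectable exit hook (an assumption added for testability). */
  DiskFailureTerminator(int terminationThreshold, IntConsumer exitHook) {
    this.terminationThreshold = terminationThreshold;
    this.exitHook = exitHook;
  }

  /**
   * Called after each disk scan with the current number of failed disks.
   * Invokes the exit hook with status 1 once the count reaches the threshold.
   *
   * @return true if termination was requested, false otherwise
   */
  boolean onDiskScan(int failedDiskCount) {
    if (failedDiskCount >= terminationThreshold) {
      System.err.println("Failed disks: " + failedDiskCount
          + ", threshold: " + terminationThreshold + ". Terminating JVM.");
      exitHook.accept(1);
      return true;
    }
    return false;
  }
}
```

Routing the exit through a hook is one way to make a code path like this unit-testable: a test can pass a lambda that records the exit code instead of stopping the process.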

Test

No tests.

codecov-commenter commented 6 months ago

Codecov Report

Attention: Patch coverage is 81.25000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.27%. Comparing base (52ba813) to head (6fa44b7). Report is 9 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| ...in/java/com/github/ambry/store/StorageManager.java | 81.25% | 1 Missing and 2 partials :warning: |

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master    #2778      +/-   ##
============================================
+ Coverage     64.24%   70.27%   +6.02%
- Complexity    10398    11686    +1288
============================================
  Files           840      840
  Lines         71755    71886     +131
  Branches       8611     8638      +27
============================================
+ Hits          46099    50515    +4416
+ Misses        23004    18744    -4260
+ Partials       2652     2627      -25
```
