Closed justinlin-linkedin closed 7 months ago
Attention: Patch coverage is 84.78261% with 7 lines in your changes missing coverage. Please review.
Project coverage is 70.43%. Comparing base (2ec5676) to head (50a540d). Report is 4 commits behind head on master.
Files | Patch % | Lines |
---|---|---|
...in/java/com/github/ambry/store/StorageManager.java | 81.08% | 3 Missing and 4 partials :warning: |
Summary
When a disk fails, a disk failure handler sets the disk's availability to false and decreases the instance capacity. After the disk is fixed, we currently have to restore its availability and update the instance capacity in Helix manually. This can be done automatically in code, which is what this PR does.
There is a simple way to check whether a disk has recovered: check that the disk's mount path exists and is a directory. This is by no means a perfect condition, but it works most of the time.
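As a rough illustration of the check described above, here is a minimal sketch. The class and method names (`DiskRecoveryCheck`, `isMountPathUsable`, `findRecoveredDisks`) are hypothetical and are not taken from `StorageManager`; the sketch only assumes that each unavailable disk is identified by its mount path.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the actual StorageManager code.
final class DiskRecoveryCheck {
  private DiskRecoveryCheck() {
  }

  // A disk is treated as recovered if its mount path exists and is a directory.
  // This is a heuristic: it does not prove the disk is writable or healthy.
  static boolean isMountPathUsable(String mountPath) {
    File dir = new File(mountPath);
    return dir.exists() && dir.isDirectory();
  }

  // Returns the subset of currently unavailable mount paths that pass the check.
  static List<String> findRecoveredDisks(List<String> unavailableMountPaths) {
    List<String> recovered = new ArrayList<>();
    for (String mountPath : unavailableMountPaths) {
      if (isMountPathUsable(mountPath)) {
        recovered.add(mountPath);
      }
    }
    return recovered;
  }
}
```

A path that exists but is a regular file, or a path that does not exist at all, fails the check and stays marked unavailable.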
Once a disk passes this check, we update the property store to mark the unavailable disk as available again and increase the instance capacity. These two steps are not atomic, so the first step can succeed while the second fails. To recover from that, we always try to update the instance capacity at startup.
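The recovery idea can be sketched as follows. This is an in-memory model under stated assumptions, not the PR's implementation: the maps stand in for the property store and the Helix instance config, all names (`DiskRestoreSketch`, `markDiskAvailable`, `syncInstanceCapacity`, `onStartup`) are hypothetical, and the instance capacity is assumed to be the sum of the capacities of the available disks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model; the real code talks to the property store and Helix.
final class DiskRestoreSketch {
  // Stand-in for the property store: mount path -> availability.
  private final Map<String, Boolean> diskAvailability = new LinkedHashMap<>();
  // Assumed per-disk capacity, keyed by mount path.
  private final Map<String, Long> diskCapacityBytes = new LinkedHashMap<>();
  // Stand-in for the instance capacity stored in Helix.
  private long instanceCapacityBytes;

  void addDisk(String mountPath, long capacityBytes, boolean available) {
    diskAvailability.put(mountPath, available);
    diskCapacityBytes.put(mountPath, capacityBytes);
  }

  // Step 1: mark a recovered disk as available again.
  void markDiskAvailable(String mountPath) {
    if (!diskAvailability.containsKey(mountPath)) {
      throw new IllegalArgumentException("Unknown disk: " + mountPath);
    }
    diskAvailability.put(mountPath, true);
  }

  // Step 2: recompute capacity from the availability state.
  // Deriving the value (rather than incrementing it) makes this step idempotent,
  // so it is safe to repeat after a partial failure.
  void syncInstanceCapacity() {
    long total = 0;
    for (Map.Entry<String, Boolean> entry : diskAvailability.entrySet()) {
      if (entry.getValue()) {
        total += diskCapacityBytes.get(entry.getKey());
      }
    }
    instanceCapacityBytes = total;
  }

  // Always re-run step 2 at startup to repair a run where step 1 succeeded
  // but step 2 did not.
  void onStartup() {
    syncInstanceCapacity();
  }

  long getInstanceCapacityBytes() {
    return instanceCapacityBytes;
  }
}
```

In this model, if the process stops after `markDiskAvailable` but before `syncInstanceCapacity`, the next `onStartup` call brings the capacity back in line with the availability state.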
Test
Unit test