Restore disk's availability when starting up storage manager

linkedin / ambry

Distributed object store

Apache License 2.0


Summary

When a disk fails, a disk failure handler sets the disk's availability to false and decreases the instance capacity. After the disk is fixed, we currently have to restore the disk's availability and update the instance capacity in Helix manually. This can be done automatically in code, and this PR does that.

There is a simple way to check whether a disk has recovered: check that the disk's mount path exists and is a directory. This is by no means a perfect condition, but it works most of the time.
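The check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class `DiskRecoveryCheck` and its method names are hypothetical, and only standard `java.io.File` calls are used.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of the Ambry code base.
class DiskRecoveryCheck {
  /**
   * Treats a disk as recovered when its mount path exists and is a directory.
   * This heuristic cannot detect a disk that is mounted but still faulty.
   */
  static boolean isMountPathHealthy(String mountPath) {
    File dir = new File(mountPath);
    return dir.exists() && dir.isDirectory();
  }

  /** Returns the subset of previously unavailable mount paths that pass the check. */
  static List<String> findRecoveredDisks(List<String> unavailableMountPaths) {
    List<String> recovered = new ArrayList<>();
    for (String mountPath : unavailableMountPaths) {
      if (isMountPathHealthy(mountPath)) {
        recovered.add(mountPath);
      }
    }
    return recovered;
  }
}
```

A regular file at the mount path, or a missing path, both count as "not recovered" under this check.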

After that, we update the property store to mark those unavailable disks as available again and increase the instance capacity. These are two separate steps and they are not atomic, so it is possible to complete the first step and fail on the second. To recover from that, we always try to update the instance capacity when starting up.
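The reasoning behind always updating the capacity at startup can be shown with a small sketch. Everything here is an assumption made for illustration: the class `DiskAvailabilityRestorer`, its in-memory fields standing in for the property store and the instance config, and the idea that instance capacity is the sum of the capacities of available disks. The actual change lives in `StorageManager` and may differ.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model; the real state lives in Helix, not in these fields.
class DiskAvailabilityRestorer {
  final Map<String, Long> diskCapacityBytes = new HashMap<>(); // mount path -> capacity
  final Set<String> unavailableDisks = new HashSet<>();        // stand-in for the property store
  long instanceCapacityBytes;                                  // stand-in for the instance config

  /** Step 1: mark recovered disks as available again. */
  void markAvailable(Collection<String> recoveredDisks) {
    unavailableDisks.removeAll(recoveredDisks);
  }

  /** Step 2: recompute capacity from the current availability. Safe to repeat. */
  void reconcileInstanceCapacity() {
    long total = 0;
    for (Map.Entry<String, Long> disk : diskCapacityBytes.entrySet()) {
      if (!unavailableDisks.contains(disk.getKey())) {
        total += disk.getValue();
      }
    }
    instanceCapacityBytes = total;
  }

  /** Startup path: step 2 runs even when no disk was restored in this run. */
  void onStartup(Collection<String> recoveredDisks) {
    if (!recoveredDisks.isEmpty()) {
      markAvailable(recoveredDisks);
    }
    reconcileInstanceCapacity();
  }
}
```

Because step 2 derives the capacity from the availability state instead of adding a delta, running it on every startup repairs a previous run that completed step 1 and then failed.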

Codecov Report

Attention: Patch coverage is 84.78261% with 7 lines in your changes missing coverage. Please review.

Project coverage is 70.43%. Comparing base (2ec5676) to head (50a540d). Report is 4 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| ...in/java/com/github/ambry/store/StorageManager.java | 81.08% | 3 Missing and 4 partials :warning: |

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master    #2750      +/-   ##
============================================
- Coverage     70.66%   70.43%   -0.24%
+ Complexity    11593    11585       -8
============================================
  Files           834      837       +3
  Lines         71049    71234     +185
  Branches       8536     8550      +14
============================================
- Hits          50209    50171      -38
- Misses        18208    18439     +231
+ Partials       2632     2624       -8
```

