Currently, if one of the monitoring services that mlab-ns relies on fails in a way that makes it report a down status (0), mlab-ns could mark the entire fleet as down. Beyond a certain threshold, mlab-ns should stop marking SliverTools as down. That threshold should be below the threshold of the alert for too many NDT servers being down, so that we still receive alerts that a problem exists.
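One possible shape for such a guard is sketched below. It is not mlab-ns code: the function name, the dictionary layout, and both threshold values are hypothetical, and it takes one reading of the proposal, namely that an update reporting too many tools as down is rejected as a whole and the last known statuses are kept.

```python
STATUS_OFFLINE = 0
STATUS_ONLINE = 1

# Hypothetical values. The only relationship taken from the proposal is
# that the guard threshold sits below the alert threshold.
ALERT_DOWN_FRACTION = 0.5
MAX_DOWN_FRACTION = 0.4


def apply_status_update(current, reported,
                        max_down_fraction=MAX_DOWN_FRACTION):
    """Return the statuses to store after a monitoring update.

    current:  dict of tool id -> last known status (0 or 1)
    reported: dict of tool id -> status just returned by monitoring

    If more than max_down_fraction of the reported tools are down, the
    update is treated as a probable monitoring failure and the last
    known statuses are kept unchanged.
    """
    if not reported:
        return dict(current)

    down = sum(1 for status in reported.values()
               if status == STATUS_OFFLINE)

    # Compare counts rather than dividing, to avoid integer division
    # on older Python versions.
    if down > max_down_fraction * len(reported):
        return dict(current)

    updated = dict(current)
    updated.update(reported)
    return updated
```

With five tools and a 0.4 guard, an update marking two tools down is applied, while an update marking three or more down is ignored. Whether to reject the whole update or to cap the number of tools marked down is a design choice this sketch does not settle.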