ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0

Check why the HC API did not report that Kesus restarts frequently #1225

Open StekPerepolnen opened 8 months ago

StekPerepolnen commented 8 months ago

Check that tablets restarting in the lbs-sas cluster domain are reported by the HC API. HC over lbk* clusters will not detect faulty tablets in /Root. I reproduced it as follows: I prohibited tablets from launching on storage nodes and dropped one DS. HC reports nothing for the database, not even in database monitoring.

StekPerepolnen commented 7 months ago

We do not monitor restarts of root system tablets. For Hive, SS, and BSC, the tablets would show up in the check if they became unresponsive. But the root Kesus, coordinators, mediators, and tx-allocators can fail in any way, and we do not monitor them directly.
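One way to cover this gap would be to count restarts per tablet over a sliding time window and flag tablets that restart too often. The sketch below is a minimal illustration, assuming restart events (tablet ID plus timestamp in seconds) are available to the checker. `RestartTracker`, `window_sec`, and `max_restarts` are hypothetical names and values, not part of YDB or its Health Check API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RestartTracker:
    """Hypothetical checker: flags a tablet that restarts too often within a window."""

    window_sec: float = 600.0  # hypothetical window length (seconds)
    max_restarts: int = 3      # hypothetical threshold; more restarts than this is "flapping"
    _events: dict = field(default_factory=dict, init=False, repr=False)

    def _evict(self, q: deque, now: float) -> None:
        # Drop restart timestamps that fall outside the sliding window.
        while q and now - q[0] > self.window_sec:
            q.popleft()

    def record_restart(self, tablet_id: int, ts: float) -> None:
        # Remember one restart event for this tablet.
        q = self._events.setdefault(tablet_id, deque())
        q.append(ts)
        self._evict(q, ts)

    def is_flapping(self, tablet_id: int, now: float) -> bool:
        # True if the tablet restarted more than max_restarts times in the window.
        q = self._events.get(tablet_id)
        if not q:
            return False
        self._evict(q, now)
        return len(q) > self.max_restarts


if __name__ == "__main__":
    tracker = RestartTracker()
    for ts in (0, 100, 200, 300):      # four restarts within ten minutes
        tracker.record_restart(1, ts)
    print(tracker.is_flapping(1, 300))   # four restarts in the window
    print(tracker.is_flapping(1, 1000))  # all restarts have aged out
```

A window-based count, rather than a total restart counter, lets the check recover automatically once the tablet stabilizes.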