We had incidents where Observer accidentally deleted all cluster properties, but no ingraph metric showed that clients were failing to get clusters; we could only find this out by digging through logs.
For dual read, we have only been monitoring cases where the data is mismatched, not cases where data is received on one side (like ZK) but not on the other (like xDS).
Changes
Added service-not-found and cluster-not-found counters to the simple load balancer JMX. Each is incremented when fetching the corresponding resource times out.
Added an entry out-of-sync count to dual read monitoring. It is incremented when the data is mismatched or is received on one side but not on the other (until a new version is received on the first side), and is decremented when the data matches.
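The counting rules above can be illustrated with a minimal sketch. This is not the code in this change: the class names (NotFoundCounters, OutOfSyncTracker), method names, and the use of plain strings for resource data are all assumptions made for illustration. The sketch also assumes each entry is counted at most once while it is out of sync, which the description does not state explicitly.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters bumped when fetching a resource times out.
class NotFoundCounters {
    private final AtomicLong serviceNotFoundCount = new AtomicLong();
    private final AtomicLong clusterNotFoundCount = new AtomicLong();

    void onServiceFetchTimeout() { serviceNotFoundCount.incrementAndGet(); }
    void onClusterFetchTimeout() { clusterNotFoundCount.incrementAndGet(); }

    long getServiceNotFoundCount() { return serviceNotFoundCount.get(); }
    long getClusterNotFoundCount() { return clusterNotFoundCount.get(); }
}

// Hypothetical tracker for entries whose two sources (e.g. ZK and xDS) disagree.
class OutOfSyncTracker {
    private final Map<String, String> zkData = new HashMap<>();
    private final Map<String, String> xdsData = new HashMap<>();
    // Keys currently counted as out of sync (assumption: one count per key).
    private final Set<String> outOfSyncKeys = new HashSet<>();
    private final AtomicInteger outOfSyncCount = new AtomicInteger();

    synchronized void onZkData(String key, String value) {
        zkData.put(key, value);
        reconcile(key);
    }

    synchronized void onXdsData(String key, String value) {
        xdsData.put(key, value);
        reconcile(key);
    }

    int getOutOfSyncCount() { return outOfSyncCount.get(); }

    // An entry is in sync only if both sides have it and the values are equal.
    private void reconcile(String key) {
        String zk = zkData.get(key);
        String xds = xdsData.get(key);
        boolean matches = zk != null && zk.equals(xds);
        if (!matches && outOfSyncKeys.add(key)) {
            // Newly mismatched, or received on one side only.
            outOfSyncCount.incrementAndGet();
        } else if (matches && outOfSyncKeys.remove(key)) {
            // Previously out of sync, now matching.
            outOfSyncCount.decrementAndGet();
        }
    }
}
```

With this sketch, receiving an entry only from ZK raises the count to 1, and the count returns to 0 once xDS delivers matching data. A count that stays above zero would then point at data missing or stale on one side, which a mismatch-only metric would not show.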