linkedin / rest.li

Rest.li is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs.

Add service/cluster-not-found count to simple load balancer JMX, and add entry-out-of-sync count to dual read monitoring. #936

Closed bohhyang closed 1 year ago

bohhyang commented 1 year ago

Background

We had issues where the Observer accidentally deleted all cluster properties, but no ingraph metric showed that clients were failing to get clusters; we could only find out by digging into logs. For dual read, we have only been monitoring cases where the data is mismatched, not cases where data is received on one side (like ZK) but not on the other (like xDS).

Changes

  1. Added service-not-found and cluster-not-found counters to the simple load balancer JMX. Each counter is incremented when fetching the corresponding resource times out.
  2. Added an entry out-of-sync count to dual read monitoring. It is incremented when the data is either mismatched or received on one side but not on the other (until a new version is received on the first side), and decremented when the data matches.
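
To illustrate the out-of-sync counting in change 2, here is a minimal sketch of one plausible reading of that behavior. It is not the code in this PR: the class `OutOfSyncTracker`, the `Source` enum, and the method names are hypothetical, and values are simplified to strings. An entry counts as out of sync while the two sides differ or while only one side has reported it, and stops counting once both sides match.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch, not the actual rest.li dual read monitor.
final class OutOfSyncTracker {
  // The two sides being compared in dual read (names are assumptions).
  enum Source { ZK, XDS }

  // Latest value seen from each side for one entry, plus its current state.
  private static final class Entry {
    String zkValue;
    String xdsValue;
    boolean outOfSync;
  }

  private final Map<String, Entry> entries = new HashMap<>();
  private int outOfSyncCount = 0;

  // Record data received from one side and update the out-of-sync count.
  synchronized void onData(Source source, String name, String value) {
    Entry e = entries.computeIfAbsent(name, k -> new Entry());
    if (source == Source.ZK) {
      e.zkValue = value;
    } else {
      e.xdsValue = value;
    }
    // Out of sync if the values differ, including when one side is still missing (null).
    boolean nowOutOfSync = !Objects.equals(e.zkValue, e.xdsValue);
    if (nowOutOfSync && !e.outOfSync) {
      outOfSyncCount++; // entry just became out of sync
    } else if (!nowOutOfSync && e.outOfSync) {
      outOfSyncCount--; // entry just came back in sync
    }
    e.outOfSync = nowOutOfSync;
  }

  // In a real monitor this value would be exposed as a JMX attribute.
  synchronized int getOutOfSyncCount() {
    return outOfSyncCount;
  }
}
```

In this sketch each entry contributes at most one to the count, so the value reads as "number of entries currently out of sync" rather than a running total of mismatch events.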