linkedin / rest.li

Rest.li is a REST+JSON framework for building robust, scalable service architectures using dynamic discovery and simple asynchronous APIs.

fix publishing uri and cluster properties for symlink clusters #956

Closed bohhyang closed 11 months ago

bohhyang commented 11 months ago

Summary

gcn-39955: After switching to observer-only mode, job-postings-mt was not able to make downstream calls to d2 services under a symlink cluster. We found that although the uri nodes were fetched correctly for the symlink cluster, they were merged under the real cluster name instead of the symlink cluster name (e.g. FooCluster-prod-ltx1 instead of $FooClusterMaster). The cluster name in the merged uri properties is used (by UriLoadBalancerSubscriber) to find the d2 services associated with it, and that lookup fails because the services are managed under the symlink cluster name.
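The lookup failure described above can be illustrated with a minimal sketch. It is not the actual D2 code: `MergedUris`, `ServiceRegistry`, `SymlinkLookupDemo`, the service name `fooService`, and the host URI are all hypothetical stand-ins, and the only assumption taken from the description is that services are registered under the symlink cluster name while the lookup key is the cluster name carried in the merged uri properties.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for merged uri properties (not the real D2 class).
final class MergedUris {
  final String clusterName;
  final Set<String> hosts;

  MergedUris(String clusterName, Set<String> hosts) {
    this.clusterName = clusterName;
    this.hosts = hosts;
  }
}

// Hypothetical registry mapping a cluster name to the services managed under it.
final class ServiceRegistry {
  private final Map<String, Set<String>> servicesByCluster = new HashMap<>();

  void register(String clusterName, String serviceName) {
    servicesByCluster.computeIfAbsent(clusterName, k -> new HashSet<>()).add(serviceName);
  }

  // Services are found via the cluster name carried in the merged uris.
  Set<String> servicesFor(MergedUris uris) {
    return servicesByCluster.getOrDefault(uris.clusterName, Collections.emptySet());
  }
}

final class SymlinkLookupDemo {
  static final String SYMLINK = "$FooClusterMaster";
  static final String REAL = "FooCluster-prod-ltx1";

  // The service is managed under the symlink cluster name only.
  static ServiceRegistry registry() {
    ServiceRegistry r = new ServiceRegistry();
    r.register(SYMLINK, "fooService");
    return r;
  }

  public static void main(String[] args) {
    Set<String> hosts = Collections.singleton("http://host1:1234/foo");
    ServiceRegistry r = registry();
    // Merged under the real cluster name: no service is found.
    System.out.println(r.servicesFor(new MergedUris(REAL, hosts)));
    // Merged under the symlink cluster name: the service is found.
    System.out.println(r.servicesFor(new MergedUris(SYMLINK, hosts)));
  }
}
```

In this toy model the host data is identical in both cases; only the name it is keyed under decides whether any service receives the update.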

This change:

  1. merges the uri properties under the symlink cluster name.
  2. reports the uri properties to dual read monitoring under the symlink cluster name.
  3. removes the additional publish under the real cluster name when publishing for a symlink cluster.
  4. adds jmx metrics that count the number of symlink clusters and, for each d2 service, the total number of hosts (tracker clients) in the default partition and in all partitions.
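As a rough illustration of what the new metrics in item 4 count, here is a small sketch. It is not the PR's implementation: `D2MetricsSketch` and its methods are hypothetical, the `$` prefix check is an assumption based on the `$FooClusterMaster` example, the default partition id of 0 is an assumption, and counting distinct hosts across partitions (rather than summing per-partition counts) is also an assumption.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; names and counting rules are assumptions for illustration.
final class D2MetricsSketch {
  // Assumption: the default partition has id 0.
  static final int DEFAULT_PARTITION = 0;

  // Assumption: symlink cluster names start with '$', as in $FooClusterMaster.
  static long countSymlinkClusters(Collection<String> clusterNames) {
    return clusterNames.stream().filter(name -> name.startsWith("$")).count();
  }

  // hostsByPartition maps a partition id to the hosts (tracker clients) of one service.
  static int hostsInDefaultPartition(Map<Integer, Set<String>> hostsByPartition) {
    return hostsByPartition.getOrDefault(DEFAULT_PARTITION, Collections.emptySet()).size();
  }

  // Assumption: a host serving several partitions is counted once.
  static int hostsInAllPartitions(Map<Integer, Set<String>> hostsByPartition) {
    Set<String> all = new HashSet<>();
    hostsByPartition.values().forEach(all::addAll);
    return all.size();
  }
}
```

Such values would then be exposed through a JMX MBean; how the PR wires that up is not shown here.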

Test Done

Unit tests.

Manual testing with QEI d2-proxy with observer-only config ():

    curli -v "d2://cart" --d2-proxy-url "http://localhost:21360/d2/" -X OPTIONS --force-insecure-d2

    < Content-Type: application/json
    < x-linkedin-processing-colo: ei-ltx1
    < x-linkedin-processing-machine: ltx1-app4245.stg.linkedin.com
    {
      "models": {
        "com.linkedin.payments.ContactInfo": {
          "namespace": "com.linkedin.payments",
          "name": "ContactInfo",
          "doc": "Contact Information.",
          "fields": [
            { "type": "string", "name": "firstName", "doc": "First name." },
            ………
          ],
          "type": "record"
        }
      },
      "resources": {
        "com.linkedin.payments.client.cart": {
          "schema": "com.linkedin.payments.Cart",
          "path": "/cart",
          "namespace": "com.linkedin.payments.client",
          "name": "cart",
          "doc": "Cart restli resource. Handles all the operations to the cart.\n\ngenerated from: com.linkedin.oms.bps.rest.impl.CartResource",
          "collection": {
            ..............
            "supports": [ "create", "update" ],
            "methods": [
              { "method": "create", "javaMethodName": "create" },
              { "method": "update", "javaMethodName": "update" }
            ]
          },
          "resourceClass": "com.linkedin.oms.bps.rest.impl.CartResource"
        }
      }
    }

bohhyang commented 11 months ago

Just to double-check: according to the earlier comment, there is a corner case:

Also, rarely and possibly, the original cluster could have subscribers too when calls are made directly to the original cluster, so we publish for it too.

So when a service uses a symlink cluster, the original logic is:

  • publish symlinkName
  • publish raw clusterName

But now we only publish symlinkName, right?

Yes, that's right. It's actually impossible to make calls directly to the original/real cluster. Also, the two publishes (even with different cluster names) were publishing the same data, which could cause unknown, risky behavior when updating the load balancer state (and its listeners).