openconfig / gnmic

gNMIc is a gNMI CLI client and collector
https://gnmic.openconfig.net
Apache License 2.0

gNMIc target redistribution doesn't work after a pod fails #506

Closed vseregin63 closed 4 weeks ago

vseregin63 commented 4 weeks ago

Hi, Team!

I am testing a topology with 3 gNMIc nodes and Redis for telemetry subscriptions. Targets are obtained from a remote file via the HTTP loader, with custom tags:

{
  "leaf1.dc": {
    "address": "leaf1.dc",
    "tls-server-name": "leaf.dc",
    "tls-ca": "/var/run/secrets/huawei_ca",
    "subscriptions": [ "huawei" ],
    "tags": [ "cluster-name=default-cluster", "instance-name=pod1" ]
  },
  "leaf2.dc": {
    "address": "leaf2.dc",
    "tls-server-name": "leaf.dc",
    "tls-ca": "/var/run/secrets/huawei_ca",
    "subscriptions": [ "huawei" ],
    "tags": [ "cluster-name=default-cluster", "instance-name=pod2" ]
  },
  ...
  ...

These tags match the tags of the collectors:

pod1 - "tags": ["cluster-name=default-cluster","instance-name=pod1"]
pod2 - "tags": ["cluster-name=default-cluster","instance-name=pod2"]
...

gnmic.yaml is the same on all pods:

insecure: false
encoding: json

clustering:
  targets-watch-timer: 30s
  locker:
    type: redis
    servers:

loader:
  type: http
  interval: 60s
  timeout: 50s
  start-delay: 5s
  auth-scheme: Token
  debug: true

subscriptions:
  huawei:
    paths:

outputs:
  prom:
    type: prometheus
    listen: :9804
    path: /metrics
    expiration: 60s
    export-timestamps: true
    timeout: 30s

We found that the leafs are distributed between the pods correctly, according to the tag values. But after a pod is disabled and enabled again, its targets reconnect to other pods and do not return once the pod becomes active in the cluster again. It seems the collectors do not try to reassign targets, although I would expect them to do so after some interval (is that right?). Please help me find the reason for this behaviour.
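The behaviour described above can be reproduced with a toy model. This is only a sketch, not gNMIc's actual code: the function names `pick_instance` and `assign_unowned` are hypothetical, and it assumes that the leader gives a target to the live instance sharing the most tags with it, and that it only touches targets whose current owner is missing.

```python
def pick_instance(target_tags, instances):
    """Return the live instance that shares the most tags with the target."""
    best, best_score = None, -1
    for name in sorted(instances):  # sorted for a deterministic tie-break
        score = len(set(target_tags) & set(instances[name]))
        if score > best_score:
            best, best_score = name, score
    return best


def assign_unowned(targets, instances, assignments):
    """Assign only the targets whose owner is missing or no longer alive."""
    result = dict(assignments)
    for target, tags in targets.items():
        if result.get(target) in instances:
            continue  # still owned by a live instance: left untouched
        result[target] = pick_instance(tags, instances)
    return result


cluster = "cluster-name=default-cluster"
targets = {
    "leaf1.dc": [cluster, "instance-name=pod1"],
    "leaf2.dc": [cluster, "instance-name=pod2"],
}
pods = {
    "pod1": [cluster, "instance-name=pod1"],
    "pod2": [cluster, "instance-name=pod2"],
}

# 1. Both pods are up: each leaf lands on the pod with matching tags.
initial = assign_unowned(targets, pods, {})

# 2. pod1 fails: its target moves to the best remaining match.
alive = {name: tags for name, tags in pods.items() if name != "pod1"}
after_failure = assign_unowned(targets, alive, initial)

# 3. pod1 comes back: no target is unowned, so nothing moves back.
after_recovery = assign_unowned(targets, pods, after_failure)

print(initial)
print(after_failure)
print(after_recovery)
```

Under these assumptions, step 3 leaves `leaf1.dc` on `pod2`, which matches what the issue reports: a returning pod does not reclaim its former targets.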

Thx

karimra commented 4 weeks ago

Yes, that's how it works today. There is no automatic redistribution of targets when pods are created (it only happens when pods fail). There is no real advantage in moving targets automatically: it would just disconnect and reconnect to recreate the subscriptions. You can delete the target manually using the REST API to trigger the leader to assign it to a newly created pod.
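A minimal sketch of the manual deletion suggested above, using only the Python standard library. The endpoint path `/api/v1/config/targets/{name}`, the host name `gnmic-leader` and port `7890` are assumptions that should be checked against the gNMIc REST API documentation for the version in use; the helper `build_delete_target_request` is hypothetical. It also assumes the API server is enabled, which the configuration shown in this issue does not include.

```python
import urllib.request
from urllib.parse import quote


def build_delete_target_request(api_base, target_name):
    """Build (but do not send) a DELETE request for one configured target.

    The URL layout is an assumption, not taken from this thread.
    """
    base = api_base.rstrip("/")
    name = quote(target_name, safe="")  # escape characters unsafe in a path
    url = f"{base}/api/v1/config/targets/{name}"
    return urllib.request.Request(url, method="DELETE")


# Hypothetical API address; replace with the real one.
req = build_delete_target_request("http://gnmic-leader:7890", "leaf1.dc")
print(req.get_method(), req.full_url)

# To actually send the request, uncomment the following lines:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```

The request is only built here, not sent, so the URL can be inspected before anything is removed from a running cluster.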

vseregin63 commented 4 weeks ago

@karimra, thanks for the response!