tarantool / metrics

Metric collection library for Tarantool
MIT License

False triggering of the tnt_replication_status metric for self upstream peer #427

Closed. proccpu closed this issue 1 year ago

proccpu commented 1 year ago

Hi,

The replication status of the local instance gets stuck in the down state.

During instance startup, box.info returns box.info.replication[N].upstream.status with the value connected for every replica listed in box.cfg.replication, including the entry for the instance itself (N equal to box.info.id). If metrics are exported at that moment, the replication status is 0 (down) for all upstreams. After startup, the upstream section is no longer present in box.info.replication for the instance itself, so the value exported for it is apparently never refreshed, and we keep observing the down status for the upstream with id N.
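To make the sequence easier to follow, here is a minimal plain-Lua sketch that reproduces it with mocked box.info snapshots. The gauges table and the export_status function are hypothetical stand-ins, not the library's actual code, and the rule that only the follow status is reported as 1 is my assumption.

```lua
-- Hypothetical stand-in for a gauge registry: a value stays until it is overwritten.
local gauges = {}

-- Assumed handler shape: only entries that carry an upstream section are updated.
local function export_status(info)
    for k, v in pairs(info.replication) do
        if v.upstream ~= nil then
            -- Assumption: only 'follow' is reported as up (1), anything else as down (0).
            gauges[k] = (v.upstream.status == 'follow') and 1 or 0
        end
    end
end

-- Mocked snapshot taken during startup: every entry, including self (id 1), has an upstream.
local during_startup = { id = 1, replication = {
    [1] = { upstream = { status = 'connected' } },
    [2] = { upstream = { status = 'connected' } },
} }

-- Mocked snapshot taken after startup: the self entry has no upstream section.
local after_startup = { id = 1, replication = {
    [1] = {},
    [2] = { upstream = { status = 'follow' } },
} }

export_status(during_startup)
export_status(after_startup)

-- gauges[1] keeps the 0 written during startup; gauges[2] is refreshed to 1.
print(gauges[1], gauges[2])
```

The self entry is written once during startup and never visited again, which matches the stuck down value described above.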

It might be worth adding the condition k ~= info.id to the upstream section handler, so that the entry for the instance itself is skipped: https://github.com/tarantool/metrics/blob/master/metrics/tarantool/info.lua#L34
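A rough sketch of what that guard could look like, written as standalone Lua against a mocked box.info table. The function name upstream_statuses and the returned table are hypothetical and only illustrate the condition; the real handler in info.lua is structured differently, and treating only follow as up is again an assumption.

```lua
-- Hypothetical helper illustrating the proposed guard; not the library's handler.
local function upstream_statuses(info)
    local result = {}
    for k, v in pairs(info.replication) do
        -- Proposed condition: skip the entry that describes the instance itself.
        if k ~= info.id and v.upstream ~= nil then
            -- Assumption: only 'follow' is reported as up (1).
            result[k] = (v.upstream.status == 'follow') and 1 or 0
        end
    end
    return result
end

-- Mocked startup snapshot in which the self entry (id 1) still has an upstream section.
local during_startup = { id = 1, replication = {
    [1] = { upstream = { status = 'connected' } },
    [2] = { upstream = { status = 'connected' } },
} }

local statuses = upstream_statuses(during_startup)
-- statuses[1] is nil (self is skipped); statuses[2] is 0 until the peer reaches 'follow'.
```

With the guard in place no value is ever recorded for the instance's own id, so there is nothing left to go stale once the upstream section disappears.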

Please check the following attachments for more information about the issue: box_info_after_startup.txt box_info_during_startup.txt tnt_replication_status_after_startup.txt

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days