waku-org / nwaku

Waku node and protocol.

nim_waku_p2p_max_connections limit of 150 makes the canary think that the node is offline #3021

Closed: siddarthkay closed this issue 2 months ago

siddarthkay commented 2 months ago

Background

ref -> https://canary.infra.status.im/service/174/

[Screenshot taken 2024-09-05 at 5:19:51 PM]

Details

I checked node-01.do-ams3.waku.sandbox in response to these alerts.

The canary thinks the node is offline because of its connection limit:

# P2P Connections
nim_waku_p2p_max_connections: 150

cc @jakubgs

Acceptance criteria

The canary needs a way to get a node's status even when the node is busy; otherwise we will keep running into false positives when investigating alerts.
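One possible way to meet this criterion is sketched below. It assumes (this is not an existing canary or nwaku feature) that the canary can read the node's current connection count through a separate channel, such as a metrics endpoint, that is not subject to the libp2p connection limit. `NodeStatus`, `ProbeResult` and `classify` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NodeStatus(Enum):
    ONLINE = "online"
    BUSY = "busy"        # dial refused, but the node is at its connection limit
    OFFLINE = "offline"  # dial failed with no evidence the node is saturated


@dataclass
class ProbeResult:
    # Hypothetical probe record; field names are illustrative assumptions.
    connected: bool             # did the canary's dial to the node succeed?
    peer_count: Optional[int]   # current connections, read out-of-band (assumed); None if unavailable
    max_connections: int        # configured limit, e.g. the value of nim_waku_p2p_max_connections


def classify(probe: ProbeResult) -> NodeStatus:
    """Map a probe result to a status, treating a saturated node as busy."""
    if probe.connected:
        return NodeStatus.ONLINE
    # The dial failed. If the out-of-band count shows the node at or above
    # its limit, report it as busy instead of raising an offline alert.
    if probe.peer_count is not None and probe.peer_count >= probe.max_connections:
        return NodeStatus.BUSY
    return NodeStatus.OFFLINE


if __name__ == "__main__":
    print(classify(ProbeResult(connected=False, peer_count=300, max_connections=300)))
    print(classify(ProbeResult(connected=False, peer_count=None, max_connections=300)))
```

The sketch still reports `OFFLINE` whenever there is no evidence of saturation, so a genuine outage (including one where the out-of-band channel is also unreachable) continues to alert.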

jakubgs commented 2 months ago

The max on the fleet is 300:

# Limits
nim_waku_p2p_max_connections: 300

https://github.com/status-im/infra-waku/blob/6e6849b1bd6b05897a73d1f3706c503ebb80951f/ansible/group_vars/node.yml#L34-L35

And indeed at times the nodes do hit 300:

[Image: node connection counts reaching 300]