threefoldtech / rmb-rs

RMB implementation in rust
Apache License 2.0

Some Mainnet nodes are not responding over RMB #184

Closed sameh-farouk closed 4 months ago

sameh-farouk commented 6 months ago

A routine RMB reachability check for Mainnet nodes:

Gridproxy reported 1694 nodes as up.

Only 1588 responded over RMB.


```
expected_responses: 1694
received_success: 1588
received_errors: 2
no response errors (client give up): 104
twins not responding (twin IDs): {2562, 5666, 2086, 2092, 6701, 2101, 7224, 6212, 8261, 1093, 2128, 5212, 3680, 615, 4204, 4094, 2171, 5243, 2173, 2176, 4736, 2194, 3733, 3231, 2723, 2212, 2213, 2216, 2219, 6828, 5293, 2224, 5812, 2231, 2232, 2242, 4294, 4296, 2249, 2260, 725, 726, 2261, 733, 2272, 2281, 10475, 8942, 2309, 4359, 2312, 2329, 3867, 2332, 798, 6433, 2339, 2340, 7975, 2347, 2348, 2356, 7477, 3384, 2368, 2369, 3912, 8544, 5987, 8043, 6006, 7031, 4474, 5499, 5500, 2437, 6023, 7560, 6025, 406, 7065, 2459, 2462, 2464, 7073, 2477, 2478, 2479, 2480, 4536, 5569, 2505, 4041, 971, 2508, 8137, 3025, 3028, 5079, 3032, 7639, 3041, 2540, 2541, 5625, 4606}
```
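
For anyone reproducing the tallies, here is a minimal Python sketch of how the four counters relate to each other. It is not the script used for this check: `summarize`, `ReachabilitySummary`, and the sequential twin IDs in the example are all hypothetical, chosen only so the totals match the numbers reported here (1694 = 1588 + 2 + 104).

```python
from dataclasses import dataclass


@dataclass
class ReachabilitySummary:
    expected: int   # twins we sent a request to
    succeeded: int  # twins that answered successfully
    errored: int    # twins that answered with an error
    missing: set    # twin IDs that never answered (client gave up)

    @property
    def no_response(self) -> int:
        return len(self.missing)


def summarize(expected_twins, success_twins, error_twins) -> ReachabilitySummary:
    """Classify every expected twin into exactly one bucket (hypothetical helper)."""
    expected = set(expected_twins)
    succeeded = expected & set(success_twins)
    # A twin that both succeeded and errored is counted as a success.
    errored = (expected & set(error_twins)) - succeeded
    missing = expected - succeeded - errored
    return ReachabilitySummary(len(expected), len(succeeded), len(errored), missing)


# Placeholder IDs 1..1694, NOT real twin IDs; sized to match the counts above.
summary = summarize(range(1, 1695), range(1, 1589), [1589, 1590])
print(summary.expected, summary.succeeded, summary.errored, summary.no_response)
```

Because each twin lands in exactly one bucket, the three result counts always add up to `expected_responses`, which is a quick sanity check on any run.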

Note: Because Gridproxy is slow to report nodes as down, some of the non-responding nodes may in fact be down.

muhamadazmy commented 6 months ago

Some nodes on this list are actually responding over RMB.

Other nodes really are in a bad state (their logs show many filesystem errors), so they shouldn't be used for deployments.

There is a fix (not yet on Mainnet) that will stop such nodes from reporting uptime at all.

muhamadazmy commented 6 months ago

Some nodes in that list are also in standby mode, which means they are down but can be brought up when needed.
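
The triage described in these comments could be sketched roughly as below. This is only an illustration: `triage` is a hypothetical helper, and the standby and unhealthy sets are assumed to come from wherever that information is available (it is not an RMB or Gridproxy API). The IDs in the example are placeholders, not twins from the list.

```python
def triage(non_responding, standby, unhealthy):
    """Split non-responding twin IDs into explained and unexplained groups."""
    non_responding = set(non_responding)
    # Standby nodes are expected to be silent, so account for them first.
    in_standby = non_responding & set(standby)
    # Nodes known to be in a bad state (e.g. filesystem errors in their logs).
    bad_state = (non_responding & set(unhealthy)) - in_standby
    # Whatever is left still needs investigation.
    unexplained = non_responding - in_standby - bad_state
    return {
        "standby": in_standby,
        "bad_state": bad_state,
        "unexplained": unexplained,
    }


# Placeholder twin IDs for illustration only.
groups = triage(non_responding=[1, 2, 3, 4, 5], standby=[2, 9], unhealthy=[4, 5])
for name, twins in groups.items():
    print(name, sorted(twins))
```

Only the `unexplained` group would point to a real RMB problem; the other two are nodes that are silent for a known reason.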

xmonader commented 4 months ago

This can be safely closed now.