Closed: sgerbino closed this issue 1 year ago
One guess as to what may be happening:
Of course, sending big blocks over a slow connection takes a long time. Possibly enough to trip the timeout.
A couple things we could do:
Personally I think (1) isn't the right approach (it raises slowloris issues, and implementing it may require getting deep into the plumbing of libp2p). (2) is more viable but doesn't fit the current architecture (we would need to stream the response instead of creating it in one shot). By process of elimination, that leaves (3).
I suggest a size threshold of 3x max block size, a typical bandwidth of 400 KB/sec, and a safety factor of 3. So this leaves us with the following implementation sketch:
- `GetBlocksResponse` returns fewer blocks than requested. (I don't think we actually need to change protobuf struct definitions, we just need to look at the client-side code and make sure it's prepared to accept that `Blocks[]` may be shorter than the requested `NumBlocks`.)
- As `GetBlocks()` marshals blocks to `response.Blocks`, it keeps track of the running total size of all blocks in the response.
- Once the running total reaches the threshold, `GetBlocks()` truncates `response.Blocks` and returns fewer blocks than requested.
- This caps the `GetBlocks()` response at 1.5 MB. (I assumed a max block size of 0.5 MB, but I'm not sure if this is actually the case. If max block size is larger than 0.5 MB, we should scale all the above numbers proportionally.)
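A minimal Go sketch of the truncation logic above, under the assumed 3x / 0.5 MB numbers. The names `GetBlocksResponse`, `Blocks`, and the `getBlocks` signature follow the discussion loosely; the real protobuf types and block-store lookup are stubbed out:

```go
package main

import "fmt"

// maxResponseSize is the assumed threshold: 3x an assumed 0.5 MB max block size.
const maxResponseSize = 3 * 512 * 1024 // bytes

// Block stands in for a marshaled block.
type Block []byte

// GetBlocksResponse mirrors the protobuf response; Blocks may end up
// shorter than the requested NumBlocks.
type GetBlocksResponse struct {
	Blocks []Block
}

// getBlocks appends blocks to the response while tracking the running total
// size, truncating once the next block would push the response past the
// threshold. The first block is always included so progress is guaranteed
// even if a single block exceeds the threshold.
func getBlocks(available []Block, numBlocks int) GetBlocksResponse {
	resp := GetBlocksResponse{}
	total := 0
	for i := 0; i < numBlocks && i < len(available); i++ {
		b := available[i]
		if total+len(b) > maxResponseSize && len(resp.Blocks) > 0 {
			break // truncate: return fewer blocks than requested
		}
		resp.Blocks = append(resp.Blocks, b)
		total += len(b)
	}
	return resp
}

func main() {
	// Three 700 KB blocks requested; only two fit under the ~1.5 MB threshold.
	big := make(Block, 700*1024)
	resp := getBlocks([]Block{big, big, big}, 3)
	fmt.Println(len(resp.Blocks)) // prints 2
}
```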
If this is implemented, we should also create a separate issue on the block store to implement a size threshold there. (This is an optimization so that e.g. if we get into some large blocks, we can avoid a situation where the block store reads 1000 large blocks and sends them to the p2p, only for the p2p to consume like 5 of them, decide that its size threshold is reached and it doesn't want any more, then throw away the remaining 995.)
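For the block-store side, one possible shape is to pass the size budget down so the store stops reading early. This is a hypothetical API sketch (`BlockMeta` and `readBlocksUpTo` do not exist in the codebase; the point is only that the budget check happens before the expensive reads):

```go
package main

import "fmt"

// BlockMeta pairs a block ID with its stored size (hypothetical shape).
type BlockMeta struct {
	ID   int
	Size int
}

// readBlocksUpTo walks the index in order and stops as soon as the size
// budget would be exceeded, instead of loading every requested block and
// letting p2p discard most of them.
func readBlocksUpTo(index []BlockMeta, numBlocks, sizeBudget int) []BlockMeta {
	var out []BlockMeta
	total := 0
	for _, m := range index {
		if len(out) == numBlocks {
			break
		}
		if total+m.Size > sizeBudget && len(out) > 0 {
			break // budget reached: skip the remaining reads entirely
		}
		out = append(out, m)
		total += m.Size
	}
	return out
}

func main() {
	// 1000 large blocks on disk, but only a handful fit in the budget.
	index := make([]BlockMeta, 1000)
	for i := range index {
		index[i] = BlockMeta{ID: i, Size: 300 * 1024} // 300 KB each
	}
	got := readBlocksUpTo(index, 1000, 3*512*1024) // same ~1.5 MB budget
	fmt.Println(len(got)) // prints 5
}
```

With 300 KB blocks and a 1.5 MB budget, the store reads 5 blocks and never touches the other 995, which is exactly the waste the optimization above is meant to avoid.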
Related to #245.
Possibly closed by #250
We need to spot-check this once the network has updated to a recent version of p2p.
Possibly resolved by #245.
Is there an existing issue for this?
Current behavior
While syncing, a large portion of peers show remote RPC timeouts, causing error scores to rise quickly.
Expected behavior
Remote RPC timeouts should be rare, occurring only when a peer is actually unresponsive.
Steps to reproduce
Environment
Anything else?