tangle-network / dkg-substrate

Multi-party threshold ECDSA (GG20) Substrate node
https://tangle.webb.tools/
GNU General Public License v3.0

[BUG] High memory utilisation of testnet nodes #668

Closed: 1xstj closed this issue 1 year ago

1xstj commented 1 year ago

Describe the bug

Testnet nodes crash after a while with this error:

2023-07-07 10:49:22.614 
2023-07-07 05:13:42.787 ERROR tokio-runtime-worker dkg_gadget: failed to send log message to file: SendError { .. }
2023-07-07 10:49:22.614 
2023-07-07 05:13:42.770  WARN tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: Unable to shutdown meta handler since it is already Terminated, ignoring...
2023-07-07 10:49:22.614 
2023-07-07 05:13:42.770 ERROR tokio-runtime-worker dkg_gadget: failed to send log message to file: SendError { .. }
2023-07-07 10:49:22.614 
2023-07-07 05:13:42.770  INFO tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: MetaAsyncProtocol is ending: Terminated, History: [Beginning, OfflineAndVoting, Terminated]
2023-07-07 10:49:22.614 
2023-07-07 05:13:42.770 ERROR tokio-runtime-worker dkg_gadget: failed to send log message to file: SendError { .. }
2023-07-07 10:49:22.614 
2023-07-07 05:13:42.752 DEBUG tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: AsyncProtocolParameters(11)'s handler is going to be dropped
2023-07-07 10:49:22.614 
2023-07-07 05:13:42.752 ERROR tokio-runtime-worker dkg_gadget: failed to send log message to file: SendError { .. }
2023-07-07 10:49:22.614 
2023-07-07 05:13:42.708 DEBUG tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: AsyncProtocolParameters(11)'s handler is going to be dropped
2023-07-07 10:49:22.614 
2023-07-07 05:13:42.641 ERROR tokio-runtime-worker dkg_gadget: failed to send log message to file: SendError { .. }
2023-07-07 10:49:22.613 
2023-07-07 05:13:36.434  INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: 🎁 Prepared block for proposing at 39067 (9211 ms) [hash: 0x9ff5250390a66f7c42427858c27bf11320137d2bbb59a10b890055e2c593a0a0; parent_hash: 0x9cd5…6dde; extrinsics (1): [0x7ac2…96f2]]    
2023-07-07 10:49:22.613 
2023-07-07 05:13:34.845  INFO tokio-runtime-worker substrate: 💤 Idle (4 peers), best: #39066 (0x9cd5…6dde), finalized #39064 (0xec04…5b61), ⬇ 0 ⬆ 0
2023-07-07 10:49:22.613 
2023-07-07 05:13:32.370  INFO tokio-runtime-worker aura: ⌛️ Discarding proposal for slot 281451132; block production took too long
2023-07-07 10:49:22.613 
2023-07-07 05:13:28.597  INFO tokio-runtime-worker substrate: 💤 Idle (4 peers), best: #39066 (0x9cd5…6dde), finalized #39064 (0xec04…5b61), ⬇ 0 ⬆ 0
2023-07-07 10:49:22.613 
2023-07-07 05:13:24.572  INFO tokio-runtime-worker dkg_gadget::signing: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: About to get unsigned proposals ...
2023-07-07 10:49:22.613 
2023-07-07 05:13:24.300  INFO tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: 🕸️  SIGNING PARTY 3 | SESSION 472 | IN THE SET OF BEST AUTHORITIES
2023-07-07 10:49:22.613 
2023-07-07 05:13:22.405  INFO tokio-runtime-worker substrate: 💤 Idle (4 peers), best: #39066 (0x9cd5…6dde), finalized #39064 (0xec04…5b61), ⬇ 0 ⬆ 0
2023-07-07 10:49:22.613 
2023-07-07 05:13:18.856 DEBUG tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: *** Should execute new keygen? AnticipatedKeygenExecutionStatus { execute: true, force_execute: false }
2023-07-07 10:49:22.613 
2023-07-07 05:13:18.603 DEBUG tokio-runtime-worker dkg_gadget: [12D3KooWRdvZ3PRteq8DC78Z3z5ZiehipKrKhHDRpgvCjc8XSeQx]: *** KeygenManager on_block_finalized: session=472,block=39064, state=Failed { session_id: 473 }, current_protocol=None | total executed: 473
2023-07-07 10:49:22.613 
2023-07-07 05:13:13.968  INFO tokio-runtime-worker substrate: 💤 Idle (4 peers), best: #39066 (0x9cd5…6dde), finalized #39064 (0xec04…5b61), ⬇ 0 ⬆ 0
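
One possible reading of the repeated `failed to send log message to file: SendError { .. }` lines: in Rust, a channel send fails this way once the receiving end no longer exists. The sketch below shows that behaviour with `std::sync::mpsc`. It is an assumption that the gadget's file logger is fed through a comparable channel, since the logs do not show which channel type is used. The names `try_log` and `send_after_receiver_dropped` are hypothetical.

```rust
use std::sync::mpsc;

/// Tries to hand one log line to a writer over a channel.
/// Returns false if the receiving side no longer exists.
/// (Hypothetical helper, not taken from dkg_gadget.)
fn try_log(tx: &mpsc::Sender<String>, line: &str) -> bool {
    tx.send(line.to_string()).is_ok()
}

/// Drops the receiver (standing in for a file-writer task that has exited)
/// and returns the Debug rendering of the resulting send error.
fn send_after_receiver_dropped() -> String {
    let (tx, rx) = mpsc::channel::<String>();
    drop(rx); // the consumer is gone; nothing can ever read from the channel
    match tx.send("log line".to_string()) {
        Ok(()) => "sent".to_string(),
        Err(e) => format!("{:?}", e),
    }
}
```

If this reading applies, the `SendError` lines would be a symptom of the log-writer side having stopped rather than the cause of the crash, so they say little about memory use by themselves.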
Maybe we have too many active threads. The other nodes show similar logs: there is no panic or explicit error, the node just shuts down and the binary is killed.
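
To check the thread-count hypothesis, the thread count and resident memory of a running node could be sampled over time. The following is a minimal sketch that assumes a Linux host, where `/proc/<pid>/status` exposes `Threads` and `VmRSS` fields. The names `ProcStats`, `parse_status` and `read_stats` are hypothetical and not part of the node.

```rust
use std::fs;

/// Thread count and resident memory of one process.
#[derive(Debug, PartialEq)]
struct ProcStats {
    threads: Option<u64>,
    vm_rss_kb: Option<u64>,
}

/// Parses the text of a Linux `/proc/<pid>/status` file.
/// Fields that are missing or malformed are left as `None`.
fn parse_status(text: &str) -> ProcStats {
    let mut stats = ProcStats { threads: None, vm_rss_kb: None };
    for line in text.lines() {
        if let Some((key, value)) = line.split_once(':') {
            // The first whitespace-separated token is the number;
            // for VmRSS it is followed by the unit "kB".
            let number = value.split_whitespace().next().and_then(|v| v.parse().ok());
            match key.trim() {
                "Threads" => stats.threads = number,
                "VmRSS" => stats.vm_rss_kb = number,
                _ => {}
            }
        }
    }
    stats
}

/// Reads the stats for a running process. Returns an I/O error if the pid
/// does not exist or the host has no `/proc` filesystem.
fn read_stats(pid: u32) -> std::io::Result<ProcStats> {
    let text = fs::read_to_string(format!("/proc/{}/status", pid))?;
    Ok(parse_status(&text))
}
```

Calling `read_stats` with the node's pid at intervals and logging the result would show whether `threads` or `vm_rss_kb` grows steadily before a shutdown. A node that exits without a panic while resident memory keeps climbing would be consistent with the process being killed from outside, for example by the kernel's out-of-memory handling, though the logs here do not confirm that.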