nimiq / core-rs-albatross

Rust implementation of the Albatross protocol
https://nimiq.com

Full node cannot achieve consensus #2954

Open joeleol opened 1 month ago

joeleol commented 1 month ago

I have set up a new full node on bare metal (12x5.1GHz, 32GB RAM and 1TB NVMe), but this node does not seem able to achieve consensus.

You will see in the logs that I restarted it several times over the course of 9 hours, at one point even deleting the db.

This is not the first time I have experienced this issue.

Log: https://www.transfernow.net/dl/20241002PGuWWsd9 (available for 7 days)

jsdanielh commented 1 month ago

Extracting interesting data from the log:

  1. State sync never completes:
    2024-10-02T13:24:52.277485003Z INFO  state_queue          | Received state sync chunk, ~99.99% complete start_key=fff8d7
  2. The client was then restarted; state sync reached 14.74%, preceded by rate limit errors:
    2024-10-02T14:55:05.823080201Z DEBUG diff_request_compon… | couldn't fetch diff: Inbound error: Request exceeds the maximum rate limit peer_id=12D3KooWPwV3T3fwkavenKnWxT9e6wCvoQchok8X2YSRTB5WiLng block=#5708370:MA:53611d5dee num_tries=1 max_tries=14 error=InboundRequest(ExceedsRateLimit)
    ...
    2024-10-02T14:55:24.105679088Z INFO  state_queue          | Received state sync chunk, ~14.74% complete start_key=25bb3f
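
For anyone who wants to pull the same data out of their own log, here is a minimal sketch. It assumes the log lines look exactly like the ones quoted above; the function name `summarize` and the shape of the returned dict are made up for illustration and are not part of the node or its tooling.

```python
import re

# Patterns modeled only on the log lines quoted above (an assumption);
# other log formats or message wordings are not handled.
PROGRESS_RE = re.compile(
    r"(\S+)\s+INFO\s+state_queue\s+\|\s+Received state sync chunk, "
    r"~([\d.]+)% complete"
)
RATE_LIMIT_RE = re.compile(r"error=InboundRequest\(ExceedsRateLimit\)")


def summarize(lines):
    """Collect state sync progress samples and count rate limit errors."""
    progress = []      # (timestamp, percent) in log order
    rate_limited = 0   # number of ExceedsRateLimit errors seen
    for line in lines:
        match = PROGRESS_RE.search(line.strip())
        if match:
            progress.append((match.group(1), float(match.group(2))))
        elif RATE_LIMIT_RE.search(line):
            rate_limited += 1
    # A drop in percentage between consecutive samples suggests that
    # state sync started over, for example after a restart.
    resets = sum(
        1 for (_, prev), (_, cur) in zip(progress, progress[1:]) if cur < prev
    )
    return {
        "progress": progress,
        "rate_limit_errors": rate_limited,
        "resets": resets,
    }
```

Calling `summarize(open("node.log"))` (the file name is a placeholder) gives the progress samples in order, so a sync that approaches 100% and then drops back is easy to spot.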

joeleol commented 1 month ago

The plot goes a little deeper. Please find attached the same log with a few extra hours of logging.

The final state seems to be 'couldn't fetch diff: no peers'.

Maybe two separate issues?

Log: https://www.transfernow.net/dl/20241002DE3om12U (available for 7 days)