keep-network / tbtc

Trustlessly tokenized Bitcoin on Ethereum ;)
https://tbtc.network
MIT License

Relay maintainer: Customizable headers batch size and dependencies upgrade #838

Closed lukasz-zimnoch closed 2 years ago

lukasz-zimnoch commented 2 years ago

On BTC mainnet, the difficulty changes every 2016 blocks. That means there are long runs of 2016 blocks with the same difficulty, and nothing unexpected happens there. BTC testnet, however, has the "20-minute rule", which says: "if a block has not been mined within 20 minutes, drop the difficulty to 1". This makes testnet difficulty much more volatile and can lead to a situation where each block in a series has a different difficulty.
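The two rules above can be sketched in Go as follows. This is a minimal illustration, not code from the relay maintainer: the function and constant names are hypothetical, and the 20-minute check assumes the rule is evaluated on block header timestamps.

```go
package main

// retargetInterval is the number of blocks between mainnet difficulty changes.
const retargetInterval uint64 = 2016

// maxBlockGapSeconds is the 20-minute window used by the testnet rule.
const maxBlockGapSeconds int64 = 20 * 60

// isRetargetBlock reports whether the difficulty may change at this height
// under the regular rule (block number modulo 2016).
func isRetargetBlock(height uint64) bool {
	return height%retargetInterval == 0
}

// allowsMinDifficulty reports whether a testnet block may be mined at
// difficulty 1 because more than 20 minutes passed since the previous block.
// Both arguments are Unix timestamps in seconds (assumed to come from headers).
func allowsMinDifficulty(prevTimestamp, timestamp int64) bool {
	return timestamp-prevTimestamp > maxBlockGapSeconds
}
```

The first predicate depends only on the block height, so a maintainer can anticipate it. The second depends on block timing, so it can fire at any height.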

The relay maintainer observes each block and relays it to the Ethereum contract, but updates currentEpochDiff only every 5 blocks (a gas optimization). TBTC uses the currentEpochDiff field to validate proofs, and this is the code that throws the "not at current or previous difficulty" errors we often observe in our e2e tests. The relay maintainer handles mainnet difficulty changes because the moment is predictable (block number modulo 2016) and there is corner-case handling for it, so updating the difficulty every 5 blocks is not an issue there. But if the difficulty changes unpredictably, as on testnet, the relay maintainer can fall out of sync for a longer period of time. That usually happens when there are multiple difficulty changes in a short period. The relay maintainer eventually recovers, but the out-of-sync periods are the pain point and cause the problems we observe.
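To make the out-of-sync effect concrete, here is a toy Go model. It is a deliberately simplified sketch, not the real contract or maintainer logic. It assumes the relay stores a current and a previous difficulty, refreshes them only at every updateEvery-th block, and accepts a proof only if the block's difficulty matches one of the two. All names are hypothetical.

```go
package main

// relayState is a toy stand-in for the difficulty values stored on-chain.
type relayState struct {
	current  uint64
	previous uint64
}

// accepts mimics the "at current or previous difficulty" check.
func (s relayState) accepts(difficulty uint64) bool {
	return difficulty == s.current || difficulty == s.previous
}

// countRejected walks a series of per-block difficulties and counts blocks
// whose proofs would be rejected when the stored difficulty is refreshed
// only at every updateEvery-th block.
func countRejected(difficulties []uint64, updateEvery int) int {
	if updateEvery < 1 || len(difficulties) == 0 {
		return 0
	}
	state := relayState{current: difficulties[0], previous: difficulties[0]}
	rejected := 0
	for i, d := range difficulties {
		// Refresh the stored difficulty only on the configured cadence.
		if i%updateEvery == 0 && d != state.current {
			state.previous = state.current
			state.current = d
		}
		if !state.accepts(d) {
			rejected++
		}
	}
	return rejected
}
```

In this model, a testnet-like series such as 100, 1, 100, 1, 50, 100 gives three rejected blocks when refreshed every 5 blocks and none when refreshed every block. A constant mainnet-like series is never rejected at either cadence.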

This PR makes the currentEpochDiff update threshold configurable. We can keep the 5-block threshold for mainnet, but on testnet we need to update on every block. This will increase testnet ETH consumption, but that is the price we must pay.
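A configurable threshold of this kind might look roughly like the Go sketch below. The Config type, its field name, and the fallback behavior are assumptions made for illustration. They are not taken from this changeset.

```go
package main

// defaultDifficultyUpdateThreshold keeps the existing 5-block behavior when
// no value is configured (assumed default, matching the mainnet setting).
const defaultDifficultyUpdateThreshold uint64 = 5

// Config is a hypothetical relay maintainer configuration.
type Config struct {
	// DifficultyUpdateThreshold is the number of relayed blocks after which
	// currentEpochDiff is updated. Zero means "use the default".
	DifficultyUpdateThreshold uint64
}

// threshold returns the effective threshold, falling back to the default.
func (c Config) threshold() uint64 {
	if c.DifficultyUpdateThreshold == 0 {
		return defaultDifficultyUpdateThreshold
	}
	return c.DifficultyUpdateThreshold
}

// shouldUpdateDifficulty reports whether enough blocks have been relayed
// since the last update to justify paying for another one.
func shouldUpdateDifficulty(c Config, blocksSinceUpdate uint64) bool {
	return blocksSinceUpdate >= c.threshold()
}
```

With this shape, a mainnet deployment could leave the field unset and a testnet deployment could set it to 1, trading more ETH spent for a difficulty value that is refreshed on every block.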

This changeset also updates the go-ethereum and keep-common dependencies to work properly with EIP-1559.

michalinacienciala commented 2 years ago

I'm closing the PR. The changes were tested on the Ropsten environment (with the relay maintainer pod updated and rotated) and did not improve the ratio of successful to unsuccessful E2E tests / Testnet runs. Both before and after the changes, around 60% of jobs failed, either with "Error: execution reverted: not at current or previous difficulty" alone or with the same error followed by "UnhandledPromiseRejectionWarning: Unhandled promise rejection", after which execution hung and the workflow was aborted after 6 hours due to a timeout.