moonbeam-foundation / moonbeam

An Ethereum-compatible smart contract parachain on Polkadot
https://moonbeam.network
GNU General Public License v3.0
918 stars, 335 forks

(Critical functionality breaking!) Moonbeam emitted blocks lag imported blocks by 2-3 blocks #3040

Open Ethics3606 opened 3 days ago

Ethics3606 commented 3 days ago

There is a bug in 0.41 that causes the blocks emitted over JSON-RPC to lag the actually imported blocks by 2-3 blocks, so the JSON-RPC interface returns outdated data. This also seems to distort the trace results of debug trace call.

For example, in the picture below, the block was received from the network at 12:29:48, but it was not emitted by the newHeads subscription until 12:30:06.

image

How to reproduce: Subscribe to newHeads, e.g.: {"jsonrpc":"2.0","id":1,"method":"eth_subscribe","params":["newHeads"]}

Observe the time each block is received via the subscription and compare it to the node's log output (journalctl -f -u moonbeam).
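The comparison above can be sketched in a few lines of Python. This is only a rough illustration: it assumes the node's log lines contain the text "Imported #<number>" (the exact log format is an assumption, so only that fragment is matched), and it relies on the standard eth_subscription notification shape for newHeads. The helper names are hypothetical.

```python
import json
import re
from datetime import datetime
from typing import Optional

# Assumption: import log lines contain "Imported #<block number>".
IMPORTED_RE = re.compile(r"Imported #(\d+)")


def imported_block_number(log_line: str) -> Optional[int]:
    """Return the block number from an import log line, or None if absent."""
    match = IMPORTED_RE.search(log_line)
    return int(match.group(1)) if match else None


def new_head_block_number(message: str) -> Optional[int]:
    """Return the block number from a newHeads notification, else None."""
    msg = json.loads(message)
    if msg.get("method") != "eth_subscription":
        return None  # e.g. the reply that confirms the subscription id
    return int(msg["params"]["result"]["number"], 16)


def delay_seconds(imported_at: str, emitted_at: str) -> float:
    """Seconds between two HH:MM:SS timestamps taken on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(emitted_at, fmt) - datetime.strptime(imported_at, fmt)
    return delta.total_seconds()
```

With the timestamps from the example above, delay_seconds("12:29:48", "12:30:06") gives 18.0 seconds.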

You can also do transaction testing to prove this. If block X is emitted and you send a transaction instantly, it will not land in X+1 as it did previously, but as late as X+3. This is causing me trading losses right now.

I need to fix this ASAP. Are there any code changes I can do locally?

gonzamontiel commented 3 days ago

Hey @Ethics3606, some time ago we got a similar report from another user, and that's why we implemented moon_getEthSyncBlockRange (https://github.com/moonbeam-foundation/moonbeam/pull/2922). Can you check whether this method consistently returns what you expect?

curl -H "Content-Type: application/json" -d '{"id":1,"jsonrpc":"2.0","method":"moon_getEthSyncBlockRange","params":[]}' https://trace.api.moonbeam.network

Ethics3606 commented 3 days ago

Hi,

I am not sure what to make of that, but the range of hashes of my local node matches the https://trace.api.moonbeam.network endpoint.

Even so, the information in debug_trace_call is certainly outdated. I know this because I do a lot of tracing for trading, on many blockchains, and the top-of-next-block execution result is not as expected/simulated. This seems to be because it's simulating over an old block (and there have been other trades in newer blocks taking the profit).

I am trying to dig into the code myself. What do you think is causing the long delay (24s?) from "imported block" to emitted newHead event? Are there any delay constants that can be tuned? Is it waiting for a certain finalized state before emitting blocks?

RomarQ commented 3 days ago

Yes, we wait for the newest best_block, which is the latest block validated by the relay chain, before emitting the event. We will have a look at this again and see if it can be improved.

Ethics3606 commented 3 days ago

Great!

I have just tried recompiling frontier with SyncStrategy::Parachain instead of Normal, and the lag did not improve.

Ethics3606 commented 1 day ago

Hi,

Looking at the Astar node, they are not having this problem. Imported blocks are emitted near instantly to the JSON-RPC interface, whereas Moonbeam is now lagging up to 4 blocks!! Is there anything I can do on my end to fix this ASAP?

Ethics3606 commented 1 day ago

The block number returned by eth_blockNumber is also lagging the imported block by 3-4 blocks, so it's not just pubsub.

To test: curl -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' http://127.0.0.1:9944
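To put a number on the lag, one option is to compare eth_blockNumber with the best block reported by the standard Substrate RPC chain_getHeader on the same node. The sketch below assumes the local endpoint from the curl command above serves both namespaces. The function names are hypothetical.

```python
import json
import urllib.request

NODE_URL = "http://127.0.0.1:9944"  # local endpoint from the curl command above


def rpc(method, params=None, url=NODE_URL):
    """Send one JSON-RPC request over HTTP and return its "result" field."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]


def lag_blocks(substrate_number_hex, eth_number_hex):
    """Blocks by which the Ethereum-side number trails the Substrate-side one."""
    return int(substrate_number_hex, 16) - int(eth_number_hex, 16)


def measure_lag(url=NODE_URL):
    """Compare the node's best header with eth_blockNumber (needs a running node)."""
    best = rpc("chain_getHeader", url=url)["number"]  # no params: best block header
    eth = rpc("eth_blockNumber", url=url)
    return lag_blocks(best, eth)
```

Calling measure_lag() against a running node returns the difference in blocks. If it stays at 0 while the logs show higher imported block numbers, that would suggest the delay sits between import and best-block selection rather than in the Ethereum RPC layer. That reading is an interpretation, not something confirmed here.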

Ethics3606 commented 18 hours ago

The issue seems to be caused by a delay in flagging the chain tip as best block. Is it an SSD issue - delayed processing of the block?