Open yyforyongyu opened 1 year ago
Yeah, I've noticed this as well a while ago. I even started on some WIP code (https://github.com/guggero/lnd/tree/sync-subsystem-block-height) to see how many subsystems would be affected but then didn't have time to continue on it. But I like your name, `blockbeat`!
Also love the name!!
Wondering if perhaps it may be better to go for a more async approach where each subsystem handles blocks when it is ready, and then make things more robust between subsystems. I.e., if subsystem A wants something from subsystem B, it first queries which block subsystem B is on and polls or waits until they are on the same page before continuing. Reasons for this approach would be: 1) subsystems aren't bottlenecked by others. 2) startup handling is potentially easier since we don't have to worry about block beat being stopped in a weird state where one subsystem has moved on while the other is still busy with a block. 3) it makes things easier in a future where some subsystems are potentially pulled out into their own binary. 4) it potentially makes testing easier as there is one less system dependency.
Only hopping in on this train now so I apologise if there is context I am missing & if these questions have already been answered.
But basically in my mind each subsystem should have the equivalent of a "block cursor": it should know which block it is on and should "introduce" itself with this height/hash when querying other subsystems. I just worry about the need to ACK things in general, as I think that is then at its core not a producer-consumer model?
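To make the "block cursor" idea concrete, here is a minimal Go sketch. Everything in it (`BlockCursor`, `Advance`, `WaitForHeight`) is a hypothetical name invented for illustration and not an existing lnd API. It only shows how one subsystem could wait for another to reach the same height before querying it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// BlockCursor tracks the highest block a subsystem has fully processed.
// Hypothetical type for illustration only; it is not part of lnd.
type BlockCursor struct {
	mu      sync.Mutex
	height  int32
	changed chan struct{} // closed and replaced every time height advances
}

// NewBlockCursor returns a cursor positioned at the given height.
func NewBlockCursor(height int32) *BlockCursor {
	return &BlockCursor{height: height, changed: make(chan struct{})}
}

// Height returns the height the subsystem has finished processing.
func (c *BlockCursor) Height() int32 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.height
}

// Advance records that the subsystem has finished the given height and
// wakes all waiters. Heights that do not move the cursor forward are ignored.
func (c *BlockCursor) Advance(height int32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if height <= c.height {
		return
	}
	c.height = height
	close(c.changed)
	c.changed = make(chan struct{})
}

// WaitForHeight blocks until the cursor is at or past the given height.
// It returns false if the timeout expires first.
func (c *BlockCursor) WaitForHeight(height int32, timeout time.Duration) bool {
	deadline := time.After(timeout)
	for {
		c.mu.Lock()
		cur, changed := c.height, c.changed
		c.mu.Unlock()
		if cur >= height {
			return true
		}
		select {
		case <-changed: // cursor moved, re-check the height
		case <-deadline:
			return false
		}
	}
}

func main() {
	// Subsystem B is still on block 100 while subsystem A is on block 101.
	b := NewBlockCursor(100)

	// B finishes block 101 a little later.
	go func() {
		time.Sleep(10 * time.Millisecond)
		b.Advance(101)
	}()

	// A "introduces" itself with its own height and waits for B to catch up.
	if !b.WaitForHeight(101, time.Second) {
		fmt.Println("subsystem B did not catch up in time")
		return
	}
	fmt.Println("subsystem B caught up, now at height", b.Height())
}
```

In this sketch no subsystem has to ACK anything to a central service. The cost is that every cross-subsystem query needs to carry a height and decide what to do on timeout.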
I'll continue with review though as I assume most of these questions will be answered as I read the code :)
As of today, subsystems in `lnd` subscribe to new blocks via `RegisterBlockEpochNtfn` and process them independently. Since processing speed varies across subsystems, they may end up with different views of the current best block, causing undefined behavior. This happens more in itests than in reality, as block production is generally slow.
To make sure every subsystem shares the same view, we could introduce a new service, `blockbeat`, that handles producing new blocks, while the other subsystems consume each block and signal the consumption. Specifically, `blockbeat` will be the only service that calls `RegisterBlockEpochNtfn`. Upon receiving a new block, the service will send it to all its subscribers. `blockbeat` will refuse to move ahead without receiving the done signals from all its subscribers, hence making sure all subsystems share the same best block. `blockbeat` can further handle reorg events, thus making all other subsystems handle them. `blockbeat` will behave as the heartbeat of `lnd`, monitoring the subsystems to make sure they handle blocks in a time-sensitive manner.
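The dispatch-and-wait behavior described above can be sketched roughly as follows. This is only an illustration under assumptions: `Beat`, `Dispatcher`, `Subscribe`, `Dispatch`, and `Close` are hypothetical names, and the block source is a plain loop over heights rather than a real `RegisterBlockEpochNtfn` subscription.

```go
package main

import (
	"fmt"
	"sync"
)

// Beat is a hypothetical notification for one block. Each consumer must
// call Done exactly once after it has finished processing the block.
type Beat struct {
	Height int32
	done   func()
}

// Done signals that the consumer has finished processing this block.
func (b Beat) Done() { b.done() }

// Dispatcher fans each block out to all consumers and waits for all of
// them before accepting the next block. Hypothetical, not lnd's code.
type Dispatcher struct {
	consumers []chan Beat
}

// Subscribe registers a consumer. In this sketch it must be called before
// the first Dispatch; it is not safe for concurrent use.
func (d *Dispatcher) Subscribe() <-chan Beat {
	// Buffer of one: Dispatch never sends a second beat to a consumer
	// before the previous one was acknowledged, so sends never block.
	ch := make(chan Beat, 1)
	d.consumers = append(d.consumers, ch)
	return ch
}

// Dispatch sends the block to every consumer and blocks until each one
// has called Done, so no consumer can run ahead of the others.
func (d *Dispatcher) Dispatch(height int32) {
	var wg sync.WaitGroup
	wg.Add(len(d.consumers))
	for _, ch := range d.consumers {
		ch <- Beat{Height: height, done: wg.Done}
	}
	wg.Wait()
}

// Close stops all consumers by closing their channels.
func (d *Dispatcher) Close() {
	for _, ch := range d.consumers {
		close(ch)
	}
}

func main() {
	d := &Dispatcher{}

	var workers sync.WaitGroup
	for _, name := range []string{"subsystemA", "subsystemB"} {
		beats := d.Subscribe()
		workers.Add(1)
		go func(name string, beats <-chan Beat) {
			defer workers.Done()
			for beat := range beats {
				fmt.Println(name, "processed block", beat.Height)
				beat.Done()
			}
		}(name, beats)
	}

	// Stand-in for blocks arriving from the chain backend.
	for h := int32(101); h <= 103; h++ {
		d.Dispatch(h)
		fmt.Println("all consumers finished block", h)
	}

	d.Close()
	workers.Wait()
}
```

A consumer that never calls `Done` stalls every other consumer in this sketch, so a real service would presumably need a timeout or some monitoring around the wait.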