Thoughts on architecture for real-time Scanner integration

The scanner currently processes the blockchain from the tip backward until it reaches the block that contains inscription 0. Since the scanner works directly with the raw blk and rev files, Bitcoin Core needs to be shut down while the scanner runs to prevent a race condition. Of course, this is not sufficient for operating a real-time explorer.

TODOs:

The processing of a single block needs to be refactored into its own function. This should be fairly easy to do; as a starting point, take a look at https://github.com/ordilabs/bitcoin-scanner/pull/12, where I extracted that code into the worker.
The processing of raw files from disk should be kept and simply call that function, so that it can still be used for spinning up new servers, services, etc.
To continuously follow the latest blocks, I would recommend using the ZMQ interface. Once the scanner is integrated with that interface, it is notified of new incoming blocks instantly and can process them right away. The only reasonable alternative would be long-polling the RPC interface, but that would still mean blocks are processed more slowly than on other explorers, which doesn't sound great.
I am a bit unclear whether an RPC integration is still needed as well. For example: the raw-file sync from disk finishes and the scanner stops. The Bitcoin Core node starts again, and once it is up, the scanner connects to its ZMQ interface. I am unsure whether this setup alone can be made robust enough to guarantee that no block is missed, e.g. the node may already have processed some blocks before the scanner's ZMQ connection was established. There doesn't seem to be a way to prevent Core from downloading blocks before the ZMQ connection is established, so the scanner probably also needs the ability to request missed blocks via the RPC interface.
To accomplish the previous item, but also the next one in the list, the scanner should keep track of which blocks it has already scanned. The inscriptions table has a genesis_height column, but that could be misleading, since there can also be blocks without any inscriptions.
The scanner should also be made reorg-robust. The simplest way to accomplish this seems to be: when a reorg happens, just delete all inscriptions that were in the reorged-out block (which should be easy via genesis_height) and process the newly included block. Anything more complex seems unnecessary, since reorgs are rare and it would save maybe 1-2 seconds at best.
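The single-block function, the record of scanned blocks, and the delete-and-reprocess reorg handling described above can be sketched together as follows. This is a minimal illustration, not the scanner's actual code: SQLite stands in for whatever database the scanner uses, and the `blocks` table, the `Block` type, and the `open_db`, `stored_hash` and `process_block` helpers are all hypothetical; only `genesis_height` comes from the existing inscriptions table. One deliberate extension: on a reorg it deletes everything at or above the affected height, so a reorg deeper than one block is handled as well.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema: only genesis_height mirrors the existing inscriptions table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS blocks (
    height INTEGER PRIMARY KEY,
    hash   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS inscriptions (
    id             TEXT PRIMARY KEY,
    genesis_height INTEGER NOT NULL
);
"""


@dataclass
class Block:
    """Hypothetical, already-decoded block. Real code would parse the raw block
    and extract the inscriptions from its transactions."""
    height: int
    hash: str
    inscription_ids: list = field(default_factory=list)


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def stored_hash(conn: sqlite3.Connection, height: int) -> Optional[str]:
    """Hash of the block indexed at `height`, or None if that height was never scanned."""
    row = conn.execute("SELECT hash FROM blocks WHERE height = ?", (height,)).fetchone()
    return row[0] if row else None


def process_block(conn: sqlite3.Connection, block: Block) -> None:
    """Index one block; meant to be called by both the disk scan and the live path."""
    with conn:  # one transaction per block: commit on success, roll back on error
        existing = stored_hash(conn, block.height)
        if existing == block.hash:
            return  # already scanned, so re-processing is a no-op
        if existing is not None:
            # A different block was indexed at this height, so it was reorged out.
            # Drop it and everything above it, located via genesis_height.
            conn.execute("DELETE FROM inscriptions WHERE genesis_height >= ?", (block.height,))
            conn.execute("DELETE FROM blocks WHERE height >= ?", (block.height,))
        conn.executemany(
            "INSERT INTO inscriptions (id, genesis_height) VALUES (?, ?)",
            [(ins_id, block.height) for ins_id in block.inscription_ids],
        )
        # Recorded even when the block contains no inscriptions.
        conn.execute("INSERT INTO blocks (height, hash) VALUES (?, ?)", (block.height, block.hash))
```

Because every scanned block gets a row in `blocks`, empty blocks are tracked too, and re-delivering a block that was already indexed (for example once from disk and once over ZMQ) does nothing.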
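One possible answer to the missed-block concern, sketched under assumptions: treat each ZMQ notification only as a wake-up signal, and decide what to process by comparing the scanner's state with the node over RPC. The `Node` interface and its `block_count`/`block_hash` methods are hypothetical wrappers around Bitcoin Core's `getblockcount` and `getblockhash` RPCs, and `plan_catch_up` is a hypothetical helper that assumes the scanned heights are contiguous from `start_height` (for example, the height of the block containing inscription 0).

```python
from typing import Dict, List, Protocol


class Node(Protocol):
    """Hypothetical thin wrapper around two Bitcoin Core RPCs."""

    def block_count(self) -> int: ...  # getblockcount
    def block_hash(self, height: int) -> str: ...  # getblockhash <height>


def plan_catch_up(node: Node, scanned: Dict[int, str], start_height: int) -> List[int]:
    """Heights the scanner still has to (re)process, in ascending order.

    `scanned` maps height -> hash of every block indexed so far and is assumed
    to be contiguous from `start_height`. Any scanned block that is no longer
    on the node's active chain (a reorg) is walked back over, so its height is
    included again.
    """
    tip = node.block_count()
    height = max(scanned, default=start_height - 1)
    # Step back until the scanner's block at `height` matches the node's chain.
    while height >= start_height and (
        height > tip or scanned.get(height) != node.block_hash(height)
    ):
        height -= 1
    # Everything above the last agreeing block is missing or stale.
    return list(range(height + 1, tip + 1))
```

Running this once at startup and again on every ZMQ block notification means a block that arrived before the subscription was established is still picked up on the next call. Each returned height would then be fetched over RPC (e.g. `getblockhash` followed by `getblock`) and passed to the single-block function.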

ordilabs/live#98