Add API endpoint to replay events emitted

CharlieC3 commented 3 years ago

Is your feature request related to a problem? Please describe.

Currently the stacks-node is only able to emit events live as they happen. This poses a problem in the scenario where the stacks-blockchain-api needs to be upgraded and its database cannot be migrated to a new schema.

The only way to perform this upgrade is to wipe the stacks-blockchain-api's database. However, the stacks-blockchain-api is not able to request or query for the old events from previous blocks in this scenario, so the stacks-node also has to be restarted from an empty state in order to emit events from past blocks as it syncs with the network.

This poses a challenge when hosting the API and followers as private or public infrastructure because depending on the stacks-blockchain-api upgrade being performed, special out-of-band procedures may be required to wipe any followers emitting events at the same time the API is being upgraded. This makes the job of hosting these services more difficult to manage and error-prone.

Describe the solution you'd like

Ideally the stacks-node would have an endpoint that can be utilized to "replay" past events. With this solution, the stacks-blockchain-api would be able to use a fully-synced follower when starting with an empty database, and instruct the follower to emit past events to all configured stacks-blockchain-api services in its config.toml.

jcnelson commented 3 years ago

The Stacks node deliberately does not store events, because the act of doing so leads to abuse by smart contract developers [1]. Storing events is very much not the Stacks node's responsibility.

It sounds like the Stacks API node needs a separate way to log and store the events given to it, so it can then replay them as part of a database migration / re-import. Why not just create a "log-and-forward" API proxy that puts all the raw events into a single time-series table, and use that table to execute reprocessing? Barring that, you can simply spin up a new Stacks node and have the new Stacks node replay the events as part of reprocessing the blockchain.
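A minimal sketch of the "single time-series table" idea, assuming SQLite as the store. The table name `raw_events`, the helper names, and the example paths are hypothetical and are not taken from the Stacks node or the API codebase; a real proxy would also forward each payload downstream after logging it.

```python
import json
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) an append-only log of raw events.

    The schema is an assumption for illustration: one row per event
    request, ordered by an auto-incrementing id.
    """
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS raw_events ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " received_at REAL NOT NULL,"
        " path TEXT NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    return db


def log_event(db, path, payload, received_at):
    """Store the raw event exactly as received, before any processing."""
    db.execute(
        "INSERT INTO raw_events (received_at, path, payload) VALUES (?, ?, ?)",
        (received_at, path, json.dumps(payload)),
    )
    db.commit()


def replay(db, handler):
    """Feed every logged event to `handler` in the original arrival order."""
    count = 0
    rows = db.execute("SELECT path, payload FROM raw_events ORDER BY id")
    for path, payload in rows:
        handler(path, json.loads(payload))
        count += 1
    return count
```

After a schema change, the API database could be wiped and rebuilt by calling `replay(db, handler)` with the new event handler, without involving the Stacks node at all.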

[1] https://medium.com/coinmonks/store-data-in-ethereum-by-logging-to-reduce-gas-cost-b70a13884485

CharlieC3 commented 3 years ago

@jcnelson Is there any way the stacks-node would be able to re-create past events using the data it already stores? I'm not suggesting the stacks-node stores any new event data it doesn't already, but I was hoping there'd be a way for the stacks-node to flip through the data it already has persisted to re-create the past events. If that's not possible without exposing it to the problem you linked, then we can look into alternatives.

jcnelson commented 3 years ago

The act of re-creating past events is the act of re-processing the blockchain.

CharlieC3 commented 3 years ago

Gotcha, I think that's what would need to be done then. Assuming the endpoint for executing that function is protected behind something like a config property which enables/disables it, can you foresee any issue in doing that? Happy to bring this up in next week's blockchain meeting as well.

I would think it could reprocess the data it already has rather quickly. I know there are other solutions which would likely be quicker (like the one you mentioned), but if reprocessing the blockchain is fast enough I feel this could be a good compromise of speed and simplicity. I'm trying to keep complexity to a minimum if possible so others can more easily host these services on their own.

CharlieC3 commented 3 years ago

@zone117x Feel free to chime in here as well.

zone117x commented 3 years ago

> The act of re-creating past events is the act of re-processing the blockchain.

That's what I think would be needed for this feature request. It would be pretty handy for API development and testing as well, assuming it would be much quicker than a regular network sync.

jcnelson commented 3 years ago

A couple of things:

- Block processing is slower than the network sync. Downloading blocks is pretty fast.
- Trying to get the node to re-process blocks without also either corrupting the chainstate or opening up a resource exhaustion DDoS vulnerability is a very tall order. The code is written with the assumption that a block will be processed at most once.

I really think you should explore updating the event observer so it simply remembers each block's events, and gives you the ability to replay them. It'll be much faster, far less invasive, and easier to test.

jcnelson commented 3 years ago

If I were to do this, I'd create an "event pipe" that simply consumed, logged, and forwarded events to the API side-car, as follows:

---------------                --------------                ------------
| Stacks node | ---(events)--> | Event pipe | ---(events)--> | API node |
---------------                -------------- <--(replay)--- ------------
      |                              |                            |
.------------.                 .------------.                .-----------.
| chainstate |                 | event log  |                |  API DB   |
*------------*                 *------------*                *-----------*

The "event pipe" would be a separate process altogether that simply stored all events it received to a replay log, and forwarded them on to downstream consumers (i.e. the API node). The event pipe would also provide a GET interface for the events from a block, which the API node would query via a "replay" path in order to re-process them.
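The pipe described above could be sketched roughly as follows. This is an in-process illustration only: the class and method names are hypothetical, the HTTP wiring (the POST endpoint the node would call and the GET endpoint the API node would query) is omitted, and keying the log by block height is a simplifying assumption that ignores forks.

```python
from collections import defaultdict


class EventPipe:
    """Log every event it receives, then forward it to downstream consumers."""

    def __init__(self):
        # Replay log: block height -> events in arrival order.
        self._log = defaultdict(list)
        # Downstream consumers, e.g. a function that POSTs to the API node.
        self._consumers = []

    def subscribe(self, consumer):
        """Register a callable taking (block_height, event)."""
        self._consumers.append(consumer)

    def ingest(self, block_height, event):
        """Record the event first, then forward it downstream."""
        self._log[block_height].append(event)
        for consumer in self._consumers:
            consumer(block_height, event)

    def get_block_events(self, block_height):
        """Stand-in for the GET 'replay' interface: events for one block."""
        return list(self._log.get(block_height, []))
```

An API node starting from an empty database would call `get_block_events` for each height in order and run the results through its normal event handlers. Logging before forwarding means an event is still replayable even if a downstream consumer fails while handling it.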

CharlieC3 commented 3 years ago

Thanks for the response Jude. It doesn't sound like this is a feasible approach. I'll sync up with Matt offline to talk about different ways we can solve this.

zone117x commented 3 years ago

Update: this issue was making microblock testing against mainnet/testnet painful, and in some ways impossible. One example is the development and testing cycle around how the stacks-node and API behave in a fully synced state: if any event-emitter handling or SQL schema/data changes were needed, it would require wiping to a clean state and then waiting over a day for a full re-sync.

I ended up implementing something similar to the "event pipe" @jcnelson described; see https://github.com/blockstack/stacks-blockchain-api/pull/611

stacks-network / stacks-core

Add API endpoint to replay events emitted #2604