hyperledger / firefly

Hyperledger FireFly is the first open source Supernode: a complete stack for enterprises to build and scale secure Web3 applications. The FireFly API for digital assets, data flows, and blockchain transactions makes it radically faster to build production-ready apps on popular chains and protocols.
https://hyperledger.github.io/firefly
Apache License 2.0
485 stars 202 forks source link

Migration to v1.3.0 can cause missed events when contract listeners are listening from latest #1531

Open EnriqueL8 opened 2 weeks ago

EnriqueL8 commented 2 weeks ago

As part of the v1.3.0 release, we rearchitected the way event streams from the transaction managers are configured against the namespaces in FireFly. FireFly core will now create one event stream per namespace and as such we wrote code to migrate the existing contract listeners on the previous event stream to the corresponding namespace event stream.

Contract listeners have an option of where to listen from where the choices are accepted:

As such when the listeners get recreated as part of the migration, the ones set to latest will now listen from a new latest and in the time where the chain has advance and FireFly is starting up some event could get lost! This can get worse with another bug where if FireFly cannot communicate with the Transaction Manager to create the listener it will give up and never listen to those events!

A way to solve this would be:

This was we do not lose events.

Looking at the code, this change is tricker than it sounds because the original change simply at startup creates an event stream and apply the listeners created for the namespace. There is no notion of "migration" so this code will need to introduce some code to get the existing listeners on the "global" event stream and hopefully be able to using some identifier correlate them to the new ones that need to be created.

peterbroadhurst commented 1 week ago

Concrete proposal - which has other benefits:

peterbroadhurst commented 1 week ago

The listener endpoint on FFTM already has the checkpoint information embedded, so it doesn't look like changes would be needed to the FFTM or EVMConnect codebases for Core to be able to query the checkpoint information:

https://github.com/hyperledger/firefly-transaction-manager/blob/1ac00b5eb37664d73abebd8822ae8439cef3f4bc/pkg/fftm/stream_management.go#L376-L385

peterbroadhurst commented 1 week ago

This code confirms existence of custom listeners. Per the comment it's not specific to the migration scenario: https://github.com/hyperledger/firefly/blob/16aec0b55471c2950e2d95ff90972ca0455c369d/internal/contracts/manager.go#L156-L161

This calls down to ethereum specific code, which gets the subscription in the context of the e.streamID that is designated to the namespace: https://github.com/hyperledger/firefly/blob/16aec0b55471c2950e2d95ff90972ca0455c369d/internal/blockchain/ethereum/ethereum.go#L909-L913

So from this I believe that the code for migration is rather leaning on simply the recreation code that was introduced (during the 1.2->1.3 development) for a different reason - the recreation of listeners in EVMConnect if they had been deleted.

Need to dig into what happens to the old listeners that were at a global level 👁️

peterbroadhurst commented 1 week ago

Reading through https://github.com/hyperledger/firefly/pull/1388 and the code, it seems like the original event streams are just left and ignored. However, they are not referred to in any way currently. So implementing a read of their checkpoint would be complicated.

So I'm going to investigate locating/decoding/passing the last-event information.

peterbroadhurst commented 1 week ago

For MultiParty networks (where used) the FireFly batch pin contract follows the same pattern, and there's even less code change as the namespace was included in the listener name already in 1.2: https://github.com/hyperledger/firefly/blob/16aec0b55471c2950e2d95ff90972ca0455c369d/internal/blockchain/ethereum/ethereum.go#L285-L289

peterbroadhurst commented 1 week ago

The thing that distinguishes multi-party BatchPin events from all other events in the blockchainevents table, is the lack of a listener_id.

peterbroadhurst commented 1 week ago

For Tokens, there is a strong leaning to using either 0 or a specific block number in the code. When a user doesn't specify a block number, it goes to 0 in the token connector before being passed to EVMConnect. There are many cases we see specific block numbers being used to designate the contract installation block for a token. However, currently I don't see any likelihood that token connectors would have been coerced with a blockNumber: "latest".

So I am not going to work on addressing that, as it would mean a change to the /activatepool flow (which I believe gets now called on every re-connect of a namespace to a token connector, against every pool) to look up the last event and provide it as extra information to handle only the latest special case.

I will look to discuss this a little further with @awrichar

peterbroadhurst commented 1 week ago

@EnriqueL8 @awrichar - I have submitted PR https://github.com/hyperledger/firefly/pull/1534

Open conversations I'd like your view on this week:

EnriqueL8 commented 1 week ago

Thanks for all the thinking!

peterbroadhurst commented 1 week ago

So just thinking about the implications, it will simply be that the listener in EVMConnect/fabconnect will show a fromBlock that isn't the only you specified on the listener itself.

I think that's justifiable for the benefit you get - should roll up to the release notes for 1.3.1 for sure, and maybe we need a tweak to the reference section that talks about the event bus, to cover this.

Happy to take both those items on.

peterbroadhurst commented 1 week ago

I've updated the PR to handle the 0 and explicit block cases for Fabric and Ethereum

EnriqueL8 commented 1 week ago

Just added the documentation tag to make sure we document this new behaviour and add it to the release notes

EnriqueL8 commented 1 week ago

So just thinking about the implications, it will simply be that the listener in EVMConnect/fabconnect will show a fromBlock that isn't the only you specified on the listener itself.

Agreed, we should just document that as part of upgrade your listeners will get recreated and as part of this it will take the new "latest" block so it does not loose events. We query the checkpoint information from the connector when retrieving a listener so users will see it straight away