Project to fix the graph sync

Current way it works

Events from filter

When the relay starts, it reads the list of addresses from the file addresses.json and calls _start_listen_network(address) for each address: https://github.com/trustlines-protocol/relay/blob/3c7ac8e68aff8c65543e85975c4afa293a8a515d/src/relay/relay.py#L753

This starts a listener for each event type (trustline updates, transfers, balance updates, etc.).

The listeners are greenlets that fetch new entries from a filter every second: https://github.com/trustlines-protocol/relay/blob/3c7ac8e68aff8c65543e85975c4afa293a8a515d/src/relay/blockchain/proxy.py#L142

https://github.com/trustlines-protocol/relay/blob/3c7ac8e68aff8c65543e85975c4afa293a8a515d/src/relay/blockchain/proxy.py#L26-37

The filter is a regular web3 filter that gets notified by the blockchain node (parity) when an event for the selected address and type occurs.
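The polling pattern described above can be sketched roughly as follows. This is a simplified illustration, not the relay's code: poll_filter, FakeFilter and max_polls are hypothetical names, and the only thing assumed from web3.py is that a filter object exposes get_new_entries(). The relay runs such loops in greenlets; plain time.sleep is used here only to keep the sketch self-contained.

```python
import time
from typing import Callable, Iterable, List


class FakeFilter:
    """Stand-in for a web3 filter; it only mimics get_new_entries()."""

    def __init__(self, batches: Iterable[List[dict]]):
        self._batches = iter(batches)

    def get_new_entries(self) -> List[dict]:
        # Return the next batch of log entries, or nothing once exhausted.
        return next(self._batches, [])


def poll_filter(event_filter, on_event: Callable[[dict], None],
                interval: float = 1.0, max_polls: int = 3) -> None:
    """Poll the filter a bounded number of times and dispatch each entry."""
    for _ in range(max_polls):
        for entry in event_filter.get_new_entries():
            on_event(entry)
        time.sleep(interval)


seen = []
fake = FakeFilter([[{"event": "Transfer"}], [], [{"event": "TrustlineUpdate"}]])
poll_filter(fake, seen.append, interval=0.0)
```

Each poll only sees what the node reported since the previous poll, which is why a removed event never shows up here.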

When events are seen, they trigger changes in the graph and send push notifications to the user: https://github.com/trustlines-protocol/relay/blob/3c7ac8e68aff8c65543e85975c4afa293a8a515d/src/relay/relay.py#L807-L820

State from querying node

The problem is that filters do not handle forks: the node does not notify a filter in any way when an event is no longer part of the chain, for example because of a reorg.

The way we handle that is by starting a sync process at the same time we start listening on events: https://github.com/trustlines-protocol/relay/blob/3c7ac8e68aff8c65543e85975c4afa293a8a515d/src/relay/relay.py#L757

This function starts a periodic process (by default every 5 min) that regenerates the graph by querying the node directly for the state of the blockchain. This does not use events. https://github.com/trustlines-protocol/relay/blob/3c7ac8e68aff8c65543e85975c4afa293a8a515d/src/relay/blockchain/currency_network_proxy.py#L68

This should allow us to be "eventually" correct on the graph.

Problems

1) While we are syncing the graph by querying the node, events can arrive via filters and update the graph. The graph regenerated from the state then replaces the previous graph, erasing the update made by the event.

2) When getting events from the filter, there is no guarantee as far as I know that events are delivered in chronological blockchain order (blocknumber, logindex). Since we collect events every second, it could also occur that we get the later event (blockchain-wise) in the earlier second (relay-time-wise) and the earlier event (blockchain-wise) in the later second (relay-time-wise), so that the stale event is applied last and produces a wrong result.

3) We have two sources of truth in the relay: the events from the node, and the ethindex. These might disagree with each other and produce ambiguous behaviour.

4) Regenerating the graph every 5 min is probably not viable if the graph gets too big.

Potentially Easy Solutions

For problem 1), instead of recreating the whole graph and applying it all at once, we could apply it trustline by trustline, considerably reducing the odds that an event modifies a trustline while it is being updated. However, during the update process, the graph is a mix of different sources of information and might give odd results, for example when someone asks for a path.
For problem 2), we can order the events we get from the filter. That does not solve the problem that events might arrive out of order across two consecutive queries of the filter.
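Ordering a single batch of filter entries could look like the sketch below. It assumes each entry is a dict carrying the blockNumber and logIndex keys found on web3 log entries; order_events and the sample batch are made up for illustration. As noted, this orders events within one batch only and does nothing for events split across two polls.

```python
from typing import List


def order_events(events: List[dict]) -> List[dict]:
    """Sort log entries by their position in the chain."""
    return sorted(events, key=lambda e: (e["blockNumber"], e["logIndex"]))


# Made-up batch, deliberately out of order.
batch = [
    {"blockNumber": 12, "logIndex": 0, "event": "BalanceUpdate"},
    {"blockNumber": 11, "logIndex": 3, "event": "TrustlineUpdate"},
    {"blockNumber": 11, "logIndex": 1, "event": "Transfer"},
]
ordered = order_events(batch)
```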

Hard Solutions

Idea 1

Detect which events are missing or added after reorgs (either on the relay or in py-eth-index) and react to that on the graph.

Either by storing several states for each edge of the graph, corresponding to the states from the unfinalised blocks plus one finalised state.
Or by fetching the latest state of the impacted edge (-> we would still need to handle some concurrency then).

Idea 2

When applying an event, store the prior/post state together with the event. When a reorg occurs and events are missing, undo the missing events backwards and apply new events forwards. Nothing is stored in the graph, only alongside the events in the data structure used to check for new / missing events.
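A minimal sketch of this idea, under simplifying assumptions: the graph is a plain dict from edge to an integer state, events are identified by a hypothetical (block number, log index) tuple, and a reorg is handled by undoing everything at or after the first missing event before applying the events of the new fork. EventJournal, AppliedEvent and undo_from are illustrative names, not existing relay code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]
EventId = Tuple[int, int]  # hypothetical id: (block number, log index)


@dataclass(frozen=True)
class AppliedEvent:
    event_id: EventId
    edge: Edge
    prior: Optional[int]  # edge state before the event (None = edge absent)
    post: int             # edge state after the event


class EventJournal:
    """Keeps prior/post states next to the events, not in the graph."""

    def __init__(self, graph: Dict[Edge, int]):
        self.graph = graph
        self.applied: List[AppliedEvent] = []

    def apply(self, event_id: EventId, edge: Edge, new_state: int) -> None:
        record = AppliedEvent(event_id, edge, self.graph.get(edge), new_state)
        self.graph[edge] = new_state
        self.applied.append(record)

    def undo_from(self, event_id: EventId) -> None:
        """Undo, newest first, every applied event at or after event_id."""
        while self.applied and self.applied[-1].event_id >= event_id:
            record = self.applied.pop()
            if record.prior is None:
                self.graph.pop(record.edge, None)
            else:
                self.graph[record.edge] = record.prior


graph: Dict[Edge, int] = {}
journal = EventJournal(graph)
journal.apply((10, 0), ("A", "B"), 5)
journal.apply((11, 0), ("A", "B"), 8)  # this event later disappears in a reorg
journal.undo_from((11, 0))             # back to the state after (10, 0)
state_after_undo = dict(graph)
journal.apply((11, 0), ("A", "B"), 7)  # event from the new fork
```

Undoing strictly in reverse order of application is what makes the stored prior states valid; undoing a single event in the middle would not be safe in general.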

-> potentially more encapsulated than idea1

Idea 3

Stop using filters to get events for the graph, and get them from the indexer instead. Since an event always describes the final state of a trustline, it is fine to reapply events multiple times as long as the last event applied is the last event chronologically in the chain.

Make a loop that queries the indexer every second for all events in the non-finalised blocks, in order, and applies them one by one.
Improve that by making the indexer push events to the relay instead of having the relay pull them in a loop.
Improve again: the indexer could detect which events are new/missing. For new events, push them to the relay. For missing events, push the previous latest event describing the trustline to the relay so that it is applied, which should revert the removed event.
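The reason reapplying is safe can be illustrated with a small sketch. The event shape used here (from, to, state, blockNumber, logIndex) is a hypothetical simplification, not the indexer's actual schema, and apply_events is an illustrative name. Because every event overwrites the trustline with its final state, applying the same ordered batch twice leaves the graph unchanged.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def apply_events(graph: Dict[Edge, dict], events: Iterable[dict]) -> None:
    """Apply events in chain order; each one overwrites the trustline state."""
    for event in sorted(events, key=lambda e: (e["blockNumber"], e["logIndex"])):
        graph[(event["from"], event["to"])] = event["state"]


# Made-up events from the non-finalised blocks.
unfinalised = [
    {"blockNumber": 21, "logIndex": 2, "from": "A", "to": "B", "state": {"balance": -1}},
    {"blockNumber": 20, "logIndex": 0, "from": "A", "to": "B", "state": {"balance": 3}},
]

graph: Dict[Edge, dict] = {}
apply_events(graph, unfinalised)
after_first_pull = dict(graph)
apply_events(graph, unfinalised)  # the next pull returns the same events
```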

How to check for missing / new events:

Pull and store all latest events every 5 seconds. We have two sets, latest_events and latest_applied_events. We pull all events since block reorg_safe_block_nr + 1 and store them in latest_events. We check for missing events by checking, for every event in latest_applied_events, whether it is in latest_events. We check for added events by looking for events that are in latest_events but not in latest_applied_events.

We handle the added/missing events.

We update latest_applied_events by removing/adding events. Set reorg_safe_block_nr to latest_block - reorg_safety. Delete all events in latest_applied_events that we consider finalised since we will not pull them in the next iteration.
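The steps above can be sketched with plain set operations. The names latest_events, latest_applied_events, reorg_safe_block_nr and reorg_safety come from the description; sync_step and the event identifier (block number, block hash, log index) are assumptions made for this sketch. The block hash is included so that an event in a replaced block does not compare equal to its counterpart on the old fork. Handling of the added/missing events is left out.

```python
from typing import Set, Tuple

# Hypothetical event identifier: (block number, block hash, log index).
EventId = Tuple[int, str, int]


def sync_step(latest_applied_events: Set[EventId], latest_events: Set[EventId],
              latest_block: int, reorg_safety: int):
    """One iteration: diff the two sets, then drop finalised events."""
    missing = latest_applied_events - latest_events  # removed by a reorg
    added = latest_events - latest_applied_events    # not applied yet

    # (handling of missing/added events on the graph would happen here)

    updated = (latest_applied_events - missing) | added
    reorg_safe_block_nr = latest_block - reorg_safety
    # Events at or below the safe block will not be pulled again, so forget them.
    updated = {e for e in updated if e[0] > reorg_safe_block_nr}
    return updated, reorg_safe_block_nr, missing, added


applied = {(100, "0xaa", 0), (101, "0xbb", 0)}
pulled = {(100, "0xaa", 0), (101, "0xcc", 0), (102, "0xdd", 1)}
new_applied, safe_block, missing, added = sync_step(
    applied, pulled, latest_block=102, reorg_safety=2
)
```

In this made-up run, block 101 was replaced: the old event is reported as missing and the events from the new blocks 101 and 102 as added.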

-> This mechanism could replace the current way we pull / receive events.
-> This could be adapted to check for added events more often than for missing events.

trustlines-protocol / relay

Notes on fixing graph sync through forks #522