Block Triggers hash write-up

evaporei commented 2 years ago

One of The Graph's coolest features is that queries are deterministic: given a subgraph's Qm hash, indexing it should always produce the same result. This is possible because we inherit the blockchain's determinism. However, there's a big loophole that can break this property: the chain provider.

Currently, the main (or only) type of connection we offer indexers in The Graph Network is JSON-RPC. To use it, they can either run a node themselves or use a third-party service like Alchemy. Either way, the provider can be faulty and return incorrect results for a number of reasons.

To be a little more specific, let's say there are two indexers/nodes, A and B, both indexing subgraph Z. Indexer A uses Alchemy and B uses Infura.

Given block 14_722_714 with a specific hash, both providers will very likely return the same values for those two fields (block number and hash); however, other fields such as gas_used or total_difficulty could be incorrect. And yes, ideally they would always be correct, since serving chain data is a provider's main job, but what I'm describing is exactly the issue we faced when testing Ethereum mainnet indexing with the Firehose.

These field value differences between providers are fed directly into subgraph mappings, whose outputs are the current input of the POI algorithm and the basis of The Graph's determinism property. Not taking the possible faultiness of chain providers into account can break determinism altogether.

The biggest problem today is that, to spot these POI differences, we have to index subgraphs that use those values in their mappings. If none of the subgraphs in the Firehose shootout we ran in the integration cluster had used these values, we wouldn't have spotted any POI differences, which is a very severe issue.

For reference, the POI differences found in the Firehose shootout: https://gist.github.com/evaporei/660e57d95e6140ca877f338426cea200

So, in summary, the problems described above are:

We currently treat the chain provider as a source of truth, which can only be questioned when there are re-orgs;
We don't have a good way to compare provider input (which could reveal POI differences) without the indirection of a subgraph mapping.

That3Percent commented 2 years ago

Thanks for the write-up, this is a clear explanation of a source of indexing-time determinism issues in graph-node.

Initial thoughts: yes, we can detect changes by hashing raw data from Ethereum nodes into the PoI. However, that method can only detect discrepancies when multiple Indexers indexing the same subgraph use different providers. It cannot prevent the discrepancies from happening or identify which branch is correct.
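A minimal sketch of what hashing raw provider data into a PoI-like digest could look like. This is not graph-node's actual PoI scheme; the assumptions are that each block is a plain dict of the fields the provider returned, that it is canonicalized as sorted-key JSON, and that blocks are chained with SHA-256. The names `block_digest` and `fold_blocks` are hypothetical.

```python
import hashlib
import json
from typing import Iterable, Mapping


def block_digest(block: Mapping[str, object]) -> bytes:
    """Hash one block's provider-supplied fields in a canonical form.

    Sorting keys and fixing separators means two providers that return the
    same values in a different key order still produce the same bytes.
    """
    canonical = json.dumps(dict(block), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def fold_blocks(blocks: Iterable[Mapping[str, object]], seed: bytes = b"") -> str:
    """Chain per-block digests into one running digest (hex-encoded).

    Each step hashes the previous state together with the next block's
    digest, so a difference in any field of any block changes the result,
    whether or not a subgraph mapping ever reads that field.
    """
    state = hashlib.sha256(seed).digest()
    for block in blocks:
        state = hashlib.sha256(state + block_digest(block)).digest()
    return state.hex()


if __name__ == "__main__":
    # Hypothetical placeholder responses for the same block from two providers:
    # they agree on number and hash but disagree on gas_used.
    from_a = {"number": 14_722_714, "hash": "0x01", "gas_used": 100}
    from_b = {"number": 14_722_714, "hash": "0x01", "gas_used": 101}
    print(fold_blocks([from_a]) == fold_blocks([from_b]))  # False
```

As noted above, a mismatch only tells two Indexers that they disagree; it says nothing about which provider returned the correct value.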

To the maximum extent possible, we should consider using Ethereum verifiable queries. The Ethereum RPC API has a separate call that provides Merkle proofs. The last time I looked into it, it was hard to find information about how to use that API to verify the RPC calls, but there were some libraries to help. There are tradeoffs to consider, especially performance, but if we throw a non-deterministic error for invalid proofs, verifiable queries could allow us to prevent determinism bugs instead of just detecting them.

Looking at the RFC now, but interested in hearing your thoughts on the above @evaporei.

lutter commented 2 years ago

I just commented in more detail on the RFC; the upshot of my comments is: if this is meant mostly to identify discrepancies between providers, we should think about an indexing mode where graph-node consults multiple providers and directly compares their responses. It seems to me that the information gathered there is mostly useful for testing graph-node and/or for qualifying providers; in production, I don't think that discrepancies between providers that do not lead to PoI differences are all that important.
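A minimal sketch of the indexing mode described here, where graph-node would consult several providers for the same block and compare their responses directly. Providers are modeled as hypothetical callables from a block number to a dict of fields; real JSON-RPC calls, retries and timeouts are left out, and `compare_providers` is a hypothetical name.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Mapping

# Hypothetical provider interface: block number -> fields returned for that block.
Provider = Callable[[int], Mapping[str, object]]


def compare_providers(
    providers: Mapping[str, Provider], block_number: int
) -> Dict[str, Dict[str, List[str]]]:
    """Fetch one block from every provider and report disagreeing fields.

    Returns {field: {repr(value): [provider names]}} for each field where the
    providers did not all return the same value. A field missing from a
    response counts as a disagreement and is reported as "<missing>".
    """
    responses = {name: fetch(block_number) for name, fetch in providers.items()}
    fields = set()
    for response in responses.values():
        fields.update(response.keys())

    disagreements: Dict[str, Dict[str, List[str]]] = {}
    for field in sorted(fields):
        by_value: Dict[str, List[str]] = defaultdict(list)
        for name, response in responses.items():
            value = repr(response[field]) if field in response else "<missing>"
            by_value[value].append(name)
        if len(by_value) > 1:
            disagreements[field] = dict(by_value)
    return disagreements


if __name__ == "__main__":
    # Hypothetical placeholder providers, not real mainnet data.
    def provider_a(n: int) -> Mapping[str, object]:
        return {"number": n, "hash": "0x01", "gas_used": 100}

    def provider_b(n: int) -> Mapping[str, object]:
        return {"number": n, "hash": "0x01", "gas_used": 101}

    print(compare_providers({"a": provider_a, "b": provider_b}, 14_722_714))
    # {'gas_used': {'100': ['a'], '101': ['b']}}
```

Reporting which providers returned which value (rather than a single yes/no) is what makes this useful for qualifying providers, as opposed to only flagging a mismatch.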

That3Percent commented 2 years ago

@lutter

I don't think that discrepancies between providers that do not lead to PoI differences are all that important

This is an interesting point.

Taking this idea to its logical conclusion, we could separate the PoI from something like a "determinism trace": a kitchen sink of militant determinism tripwires. The trace in this RFC could be the first of several things to include in that hash over time. The trace could be used during integration testing to prevent determinism bugs from being deployed, while also minimizing the production impact of potential determinism issues that are not surfaced by actual subgraphs.
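A minimal sketch of this split, assuming two independent digests: a PoI-like digest over what the mapping writes (modeled as a list of entity changes) and a "determinism trace" over the raw block data the provider returned. Everything here is hypothetical: SHA-256, sorted-key JSON, the `IndexingDigests` class and the toy `count_blocks` mapping. It illustrates the point raised above, that a provider discrepancy in a field no mapping reads shows up in the trace but leaves the PoI untouched.

```python
import hashlib
import json
from typing import Callable, List, Mapping


def _chain(state: bytes, payload: object) -> bytes:
    """Fold a JSON-serializable payload into a running SHA-256 state."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(state + encoded).digest()


class IndexingDigests:
    """Keeps the PoI and the determinism trace as two separate digests."""

    def __init__(self) -> None:
        self.poi = hashlib.sha256(b"poi").digest()
        self.trace = hashlib.sha256(b"trace").digest()

    def process_block(
        self,
        block: Mapping[str, object],
        mapping: Callable[[Mapping[str, object]], List[dict]],
    ) -> None:
        # The trace sees every field the provider returned for the block.
        self.trace = _chain(self.trace, dict(block))
        # The PoI only sees the entity changes the mapping produced.
        self.poi = _chain(self.poi, mapping(block))


def count_blocks(block: Mapping[str, object]) -> List[dict]:
    """Toy mapping that only reads the block number (never gas_used)."""
    return [{"entity": "Block", "id": str(block["number"])}]


if __name__ == "__main__":
    a, b = IndexingDigests(), IndexingDigests()
    # Hypothetical placeholder data: the providers disagree only on gas_used.
    a.process_block({"number": 1, "hash": "0x01", "gas_used": 100}, count_blocks)
    b.process_block({"number": 1, "hash": "0x01", "gas_used": 101}, count_blocks)
    print(a.poi == b.poi)      # True: the mapping never read gas_used
    print(a.trace == b.trace)  # False: the trace caught the discrepancy
```

Keeping the digests separate means a trace mismatch can fail an integration test without affecting the PoI that Indexers are held to in production.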

The seeming tradeoff is that if the data isn't rolled into the PoI, we are less likely to actually catch determinism bugs unless Indexers proactively communicate off-chain. This is probably okay, since gossip networks are being developed independently.

evaporei commented 2 years ago

As mentioned here, I'll pause this for now and come back later to incorporate your comments into the RFC 🙂

github-actions[bot] commented 1 year ago

Looks like this issue has been open for 6 months with no activity. Is it still relevant? If not, please remember to close it.

graphprotocol/graph-node #3554