graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL
https://thegraph.com
Apache License 2.0

Subgraph using Aggregations failing randomly #5530

Closed itsjerryokolo closed 2 weeks ago

itsjerryokolo commented 4 months ago

Bug report

This subgraph fails randomly with the error below.

"Failed to transact block operations: store error: duplicate key value violates unique constraint \"transfer_stats_hour_id_key\"",

Block 6284464, on which deployment QmQZadt6wd8wo8U6opsVnjpAjx2Ysh3iLiPXuydJ65eqT3 failed, is processed successfully by a new deployment (QmYktZYrbaENqm9nS8kvBxnzsRUyh4ueuQ679mYwMdNLpz).

Relevant log output

No response

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

OS information

None

george-openformat commented 3 months ago

Running into the same issue: subgraph | playground

The first deployment on arbitrum-sepolia failed on block 64040656. After deploying a new version, it failed on block 64603832.

silvercondor commented 3 months ago

Getting the same error on multiple subgraphs.

Note that it can also happen for daily aggregations.

tinypell3ts commented 3 months ago

I'm also getting the same error in my subgraph.

ERRO Subgraph failed with non-deterministic error: Failed to transact block operations: store error: duplicate key value violates unique constraint "transaction_stat_hour_id_key", retry_delay_s: 3732, attempt: 40, sgd: 5389, subgraph_id: QmZFm5y76cnjiF5TBfCCufVyR9sQAaWV6Ygm9Fg3cTUSed, component: SubgraphInstanceManager

tsudmi commented 1 month ago

Running into the same issue. Any updates on this?

gperezalba commented 1 month ago

Same issue here. Also would be nice to include a working example of the mapping in https://github.com/graphprotocol/graph-node/blob/master/docs/aggregations.md

lutter commented 1 month ago

Digging into one such example on Avalanche, this seems to be caused by blocks having the same timestamp. For example, Avalanche blocks 51482738 and 51482739 both have the timestamp 0x67040580, which corresponds to 2024-10-07 16:00:00+00. That would trigger a rollup for the hour starting at 15:00.

I have to look through the code to see if that truly is the issue, but block times being not monotonic is a bit of a bummer
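For readers who want to check the arithmetic in the comment above, here is a small sketch that decodes the hex timestamp and finds the hourly bucket that closes at that instant. It assumes hourly buckets are aligned to whole UTC hours; `hour_bucket_start` is a hypothetical helper for illustration, not a graph-node function.

```python
from datetime import datetime, timezone

HOUR = 3600  # seconds per hourly bucket (assumed to be aligned to UTC hours)


def hour_bucket_start(ts: int) -> int:
    """Return the Unix time at which the hourly bucket containing ts begins."""
    return ts - ts % HOUR


# Timestamp shared by Avalanche blocks 51482738 and 51482739 (from the comment above)
ts = 0x67040580

block_time = datetime.fromtimestamp(ts, tz=timezone.utc)

# True when the block lands exactly on an hour boundary
on_boundary = ts % HOUR == 0

# The bucket that has just ended when a block arrives exactly on the hour
closed_bucket = datetime.fromtimestamp(hour_bucket_start(ts) - HOUR, tz=timezone.utc)

print(block_time.isoformat())
print(on_boundary)
print(closed_bucket.isoformat())
```

Both blocks sit exactly on the 16:00 boundary, so the bucket that closes is the one starting at 15:00, matching the comment.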

lutter commented 1 month ago

The timestamps were a red herring. What's happening is this: when a subgraph starts, we assume that the last time we did a rollup is the block time of the block where we last had an actual write/entity change by looking at the PoI table. But for subgraphs that have big gaps between actual writes, that time doesn't change even though we do rollups as new blocks come in. When a subgraph in that state is restarted, we redo those rollups which causes the unique constraint violation.
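The failure mode described above can be reproduced in a toy model. This is only a sketch under stated assumptions, not graph-node's code: it uses an in-memory SQLite table in place of Postgres, and `rollup`, `catch_up`, and the integer bucket ids are hypothetical names chosen for illustration. The key point it shows is that re-deriving the "last rollup" time from the last real write, instead of from the rollups already performed, re-inserts rows that exist.

```python
import sqlite3

HOUR = 3600  # seconds per hourly bucket


def rollup(db: sqlite3.Connection, bucket_start: int) -> None:
    """Write the aggregate row for one hourly bucket; the id must be unique."""
    db.execute("INSERT INTO transfer_stats_hour (id) VALUES (?)", (bucket_start,))


def catch_up(db: sqlite3.Connection, last_rollup: int, block_time: int) -> None:
    """Roll up every completed hour between last_rollup and block_time."""
    bucket = last_rollup - last_rollup % HOUR
    while bucket + HOUR <= block_time:
        rollup(db, bucket)
        bucket += HOUR


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transfer_stats_hour (id INTEGER UNIQUE)")

# Block time of the last real entity change; no writes happen after this.
last_write = 0

# Blocks keep arriving without entity changes: hours 0, 1 and 2 get rolled up.
catch_up(db, last_write, 3 * HOUR)

# Simulated restart: the last rollup time is taken from the last write again,
# so the same buckets are rolled up a second time.
try:
    catch_up(db, last_write, 4 * HOUR)
    error = None
except sqlite3.IntegrityError as exc:
    error = str(exc)

rows = db.execute("SELECT COUNT(*) FROM transfer_stats_hour").fetchone()[0]
print(error)
print(rows)
```

In this model the second `catch_up` fails on the very first bucket, which is the analogue of the duplicate key error reported in this issue.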

Until we have a fix, it might help to rewind the subgraph to before the last actual write. Unfortunately, that data is not available in the API anywhere, and you need to run a query like `select lower(block_range) from sgdNNN.poi2$ order by vid desc limit 1` to find that block. You'll then want to rewind to a block before that one. But if the subgraph doesn't have any real writes after that block, the constraint violation will happen again when the subgraph is restarted.
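As a convenience, the query above can be filled in for a specific deployment. The sketch below only builds the SQL text; it does not connect to a database. `last_write_query` and `rewind_target` are hypothetical helpers, and choosing the block immediately before the last write is an assumption (the comment only says "a block before that one"). The namespace number 5389 is the `sgd` value from the log line earlier in this thread.

```python
def last_write_query(sgd: int) -> str:
    """Build the query from the comment above for the namespace sgd<sgd>."""
    if sgd < 0:
        raise ValueError("sgd number must be non-negative")
    return (
        f"select lower(block_range) from sgd{sgd}.poi2$ "
        "order by vid desc limit 1"
    )


def rewind_target(last_write_block: int) -> int:
    """Pick a block before the last write (assumption: the one right before it)."""
    if last_write_block <= 0:
        raise ValueError("no earlier block to rewind to")
    return last_write_block - 1


query = last_write_query(5389)
print(query)
```

Run the printed query against the graph-node database, then pass the resulting block number to `rewind_target` to get a candidate block for the rewind.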