graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL
https://thegraph.com
Apache License 2.0

Encountering a consistent indexing error on Matic "Found no transaction for event" #2552

Closed: TravelingTechGuy closed this issue 2 years ago

TravelingTechGuy commented 3 years ago
  1. This is a bug report
  2. I'm trying to deploy my subgraph. It uses the ABI of the USDT contract, and is supposed to index an entity whenever the `to` of the Transfer event is the address of our contract (i.e. whenever someone stakes USDT in the contract).

This works well on Ethereum. On Matic (Polygon), we get the following fatal error:

"failed to process trigger: block #15167232 (0x7b2a…0d6b), transaction 4641fb604c8b6b9d515f57f364e8039fb600ed0471a7dcb7f16f2bd1314501f1: Found no transaction for event"

This is consistent and reproducible. I tried adding a guard that returns from the handler early if the block number is 15167232, if the block number is between 15167200 and 15167240 (I thought a range would help), or if event.transaction is empty - to no avail, the error still happens (see the sketch below for roughly what I tried). My guess is that the failure occurs before my handler code runs, and is therefore not under my control.
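A minimal sketch of that guard, for illustration only - the event class, import path, and handler name here are placeholders rather than our actual code:

```typescript
// Simplified AssemblyScript mapping sketch. `Transfer` stands in for the event
// class generated from the USDT ABI; the real entity logic is omitted.
import { BigInt } from "@graphprotocol/graph-ts";
import { Transfer } from "../generated/USDT/USDT"; // placeholder generated path

export function handleTransfer(event: Transfer): void {
  // Guard 1: skip the single block known to fail.
  if (event.block.number.equals(BigInt.fromI32(15167232))) {
    return;
  }
  // Guard 2: skip a range around the failing block.
  if (
    event.block.number.ge(BigInt.fromI32(15167200)) &&
    event.block.number.le(BigInt.fromI32(15167240))
  ) {
    return;
  }
  // (event.transaction is typed as non-nullable in graph-ts, so there is no
  // clean "is it empty" check to add here.)

  // None of this helps: "Found no transaction for event" is raised by
  // graph-node while assembling the trigger, before this function is called.

  // ...normal entity handling would go here...
}
```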

My workaround is to set a higher startBlock in the yaml, but then we lose tons of events.

Expected behavior: either a fix that lets me index the subgraph, or a way to bypass whatever triggers the fatal error.

More details:

  1. The subgraph can be found at: https://thegraph.com/explorer/subgraph/travelingtechguy/cvipolygon
  2. I can provide a link to the code, if needed. It has other entities that index well.
TravelingTechGuy commented 3 years ago

Update: I advanced the startBlock for the USDT contract past block 15167232, but it just blew up later with the same error:

failed to process trigger: block #15326592 (0x9641…dfa9), transaction 626d7374ca1862fe801fad8b57418fddd19f826b2d9797a014c5589da343985d: Found no transaction for event
azf20 commented 3 years ago

Thanks for reporting @TravelingTechGuy - can you please share a link to the code so we can replicate? This looks like the same issue as #2353, but on a different network (Matic, not Celo) cc @tilacog and @leoyvens

TravelingTechGuy commented 3 years ago

Hi @azf20 - thanks for looking into this!

If you look at this branch, USDT2.json is the USDT ABI I took from the Matic explorer. Change the startBlock in the yaml to a number below 15167232 and run the deployment. You can completely disregard the other CVI entities.

In fact, we've now separated them from the USDT contract, since they index well while USDT blows up.

In short, the quickest reproduction steps:

  1. Create a new subgraph
  2. Grab the USDT ABI from the explorer
  3. Index the Transfer event
  4. Set startBlock close to one of the failing blocks (see the manifest sketch below)
  5. Deploy
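For reference, a data source along these lines should reproduce it - this is only a sketch with a placeholder address and file names, not our actual manifest:

```yaml
# Hypothetical subgraph.yaml fragment for the reproduction; replace the address
# with the USDT contract address from polygonscan.
dataSources:
  - kind: ethereum/contract
    name: USDT
    network: matic
    source:
      address: "0x0000000000000000000000000000000000000000" # placeholder
      abi: USDT
      startBlock: 15167200 # just below one of the failing blocks
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.4
      language: wasm/assemblyscript
      abis:
        - name: USDT
          file: ./abis/USDT2.json
      entities:
        - Transfer
      eventHandlers:
        - event: Transfer(indexed address,indexed address,uint256)
          handler: handleTransfer
      file: ./src/mapping.ts
```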
azf20 commented 3 years ago

We have replicated this; it is possibly related to the fact that the from and to addresses of the transaction are both 0x0000000000000000000000000000000000000000. Looking into it. Thanks for reporting!

e.g. here https://polygonscan.com/tx/0x626d7374ca1862fe801fad8b57418fddd19f826b2d9797a014c5589da343985d

TravelingTechGuy commented 3 years ago

Thanks for being on top of it @azf20! I assume there's nothing I can do in my code to get around this, and that the error occurs prior to the handler being called?

I also wonder about that 0x-to-0x transfer: it's not a burn per se - is it an indexing error, or some bot doing something?

azf20 commented 3 years ago

It is a special case on Polygon; I think it relates to their bridge - the tokens aren't transferred from 0x to 0x, but the transaction itself is a self-transaction from 0x to 0x. We have a couple more instances of this, so hopefully we'll be able to isolate the issue.

TravelingTechGuy commented 3 years ago

Thanks for the update @azf20 - I assumed it was something to do with Polygon, since we haven't run into it on mainnet. In the meantime, is there anything I can do to guard against this in my code, or is the error thrown before my handler is even executed (i.e. out of my control)?

azf20 commented 3 years ago

Yes I think this is throwing before the handler is initiated - @tilacog is investigating!

azf20 commented 3 years ago

We have identified the issue: at some point our Matic RPC did not include all transactions in its eth_getBlockByHash responses, and those incomplete blocks were cached in the database, causing the ongoing failures. This may relate to the new Bor requirement to specifically return Bor logs. We are planning to clear the cache, which should resolve the issue, as Matic RPCs are now returning the correct number of transactions for these blocks. cc @leoyvens @tilacog
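For anyone who wants to check a suspect block against their own RPC, a query along these lines (standard eth_getBlockByHash; the endpoint and full block hash are placeholders, since the error message only shows a truncated hash) shows how many transactions the node currently returns for that block:

```sh
# Placeholder endpoint and block hash; the second param "false" returns only
# transaction hashes, which is enough to count them.
curl -s -X POST https://polygon-rpc.example \
  -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByHash","params":["<full block hash>", false]}' \
  | jq '.result.transactions | length'
```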

azf20 commented 3 years ago

We cleared the cache, which has resolved this issue for the subgraph we were testing with - @TravelingTechGuy you should be able to restart this subgraph from before the problematic block (15167232), and it should index successfully. I will close this issue, but please do re-open it if you see a recurrence!

azf20 commented 3 years ago

This recurred on the Decentraland subgraph:

Subgraph instance failed to run: failed to process trigger: block #16574080 (0x502d…2b95), transaction 384650eb47fd17640598f72eb3eeff356ab16a02ec3c5c33d3de818763a29081: Found no transaction for event, code: SubgraphSyncingFailure
tilacog commented 3 years ago

Since this error is solved by re-fetching the block, would it make sense to implement some retry mechanism whenever this error surfaces?

miguelmota commented 3 years ago

Started encountering this error as well and I'm not sure how to get around it. Any help is greatly appreciated

$ curl -s --location --request POST https://api.thegraph.com/index-node/graphql --data-raw '{"query":"{ indexingStatusForPendingVersion(subgraphName: \"hop-protocol/hop-polygon\") { subgraph fatalError { message } nonFatalErrors {message } } }"}'
{"data":{"indexingStatusForPendingVersion":{"fatalError":{"message":"failed to process trigger: block #16573056 (0x00d5…7cad), transaction 059189fea36971157989bf3c13b17743c9313df88f317b3fa68d59793c117c03: Found no transaction for event"},"nonFatalErrors":[],"subgraph":"QmWEF3CEusZDBLwkM2ijUoFH8yuFbaFZcE2EAUpr8ykxbk"}}}
leoyvens commented 3 years ago

@tilacog The question is how we know when we need to retry. Maybe we could double-check the array of transactions received against eth_getTransactionCount?
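As a rough illustration of that cross-check (using eth_getBlockTransactionCountByHash, the block-level counterpart - eth_getTransactionCount itself returns an account nonce), one could compare the two values before accepting a block into the cache:

```sh
# Hypothetical sanity check with placeholder endpoint and block hash:
# compare the transactions array from eth_getBlockByHash against the
# count reported by eth_getBlockTransactionCountByHash.
RPC=https://polygon-rpc.example
HASH="0x<full block hash>"

ARRAY_LEN=$(curl -s -X POST "$RPC" -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByHash","params":["'"$HASH"'", false]}' \
  | jq '.result.transactions | length')

COUNT_HEX=$(curl -s -X POST "$RPC" -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","id":2,"method":"eth_getBlockTransactionCountByHash","params":["'"$HASH"'"]}' \
  | jq -r '.result')

# A mismatch here would be the signal to re-fetch the block instead of caching it.
echo "transactions array: $ARRAY_LEN, reported count: $((COUNT_HEX))"
```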

tilacog commented 3 years ago

@miguelmota We've re-fetched the faulty Matic block on the hosted service. Can you please confirm whether the error is gone now?

miguelmota commented 3 years ago

@tilacog A subgraph redeployment was made, but the issue occurred again towards the end of the sync

https://thegraph.com/legacy-explorer/subgraph/hop-protocol/hop-polygon?version=pending

tilacog commented 3 years ago

@miguelmota Thanks for the extra info. I've re-fetched three faulty blocks from the hosted service, and this subgraph resumed syncing.

Please let us know if this incident happens again, and we'll try to fix the problem.

miguelmota commented 3 years ago

@tilacog looks to be fixed! Thanks!

miguelmota commented 3 years ago

@tilacog There's another failure here https://thegraph.com/legacy-explorer/subgraph/hop-protocol/hop-polygon?version=pending

tilacog commented 3 years ago

@miguelmota the subgraph has resumed syncing.

Thanks for the heads up!

reddyismav commented 3 years ago

@leoyvens did you happen to try what you suggested, i.e. calling eth_getTransactionCount and comparing it with the array returned from eth_getBlockByHash?

reddyismav commented 3 years ago

In the meanwhile I'm trying to figure out why the sync issue on the Bor nodes is happening, and why it is always for state-sync transactions. I'll keep you guys updated here.

miguelmota commented 3 years ago

@tilacog thanks again!

nachomazzara commented 3 years ago

Could it be related to reorgs?

leoyvens commented 3 years ago

@nachomazzara we get this also for reasonably old blocks, so I don't think so.

nachomazzara commented 3 years ago

> @nachomazzara we get this also for reasonably old blocks, so I don't think so.

Could the graph-node cache hold the info of a re-orged block, and therefore not find the transaction because it was re-orged? In short, the tx is not there anymore.

miguelmota commented 3 years ago

The polygon indexer is showing another sync failure

https://thegraph.com/legacy-explorer/subgraph/hop-protocol/hop-polygon?version=pending

Apologies for being an annoyance with the same issue; let me know if there's a way I can trigger a block resync or handle the error. Thanks!

azf20 commented 3 years ago

@nachomazzara the issue seems pretty specifically tied to 0x000 -> 0x000 transactions, rather than a more general re-org issue (if it was as you described, I think we would see it for other transactions, unless I have misunderstood your suggestion?)

Thanks for flagging @miguelmota, @tilacog can you take a look?

miguelmota commented 3 years ago

@tilacog any update on the sync failure on subgraph?

tilacog commented 3 years ago

Hey @miguelmota, I'm following that subgraph closely and re-fetching faulty blocks from our storage. As of now, it has resumed syncing, but I'll keep an eye on it.

Unfortunately, we'll have to do this manually for as long as this issue persists.

azf20 commented 3 years ago

Our current hypothesis is that this is caused by a race condition relating to the use of block number instead of block hash for Bor transactions, which should be fixed in this PR: https://github.com/maticnetwork/bor/pull/160

miguelmota commented 3 years ago

@tilacog another block sync failure on subgraph. Thanks!

tilacog commented 3 years ago

@miguelmota subgraph resumed syncing. Thanks for letting me know ;)

miguelmota commented 3 years ago

@tilacog there's a sync error again. Sync errors seem to occur shortly after syncing is resumed

tilacog commented 3 years ago

It seems it is syncing now, but please let me know when it fails, and I'll look into it =)

miguelmota commented 3 years ago

@tilacog it was making good progress but just got stuck again recently, if you don't mind taking a look. Thanks!

miguelmota commented 3 years ago

It's been in a failed status for a few days now

$ curl -s --location --request POST https://api.thegraph.com/index-node/graphql --data-raw '{"query":"{ indexingStatusForPendingVersion(subgraphName: \"hop-protocol/hop-polygon\") { subgraph fatalError { message } nonFatalErrors {message } } }"}'
{"data":{"indexingStatusForPendingVersion":{"fatalError":{"message":"failed to process trigger: block #16697472 (0x1d17…026a), transaction 950858eabc541c9988f03dd8ceb7da15bea3a85dfea4c50be3fdc973cf82deb5: Found no transaction for event"},"nonFatalErrors":[],"subgraph":"QmfDg1oHAPb7TPuy882BfBFZrJKMPNEuib98Em7rFds2nb"}}}
tilacog commented 3 years ago

@miguelmota I've just recreated the faulty block and the subgraph should be syncing now. Thanks for the heads up!

miguelmota commented 3 years ago

@tilacog 🙏 looks like it synced ~800 blocks but got stuck again on block #16698304 😁

{"data":{"indexingStatusForPendingVersion":{"fatalError":{"message":"failed to process trigger: block #16698304 (0x9135…eab9), transaction ee3110749ed21ab4102d1460ce65417a4cd75ac4aaf2b85de3477f5cc9376f76: Found no transaction for event"},"nonFatalErrors":[],"subgraph":"QmfDg1oHAPb7TPuy882BfBFZrJKMPNEuib98Em7rFds2nb"}}}
tilacog commented 3 years ago

@miguelmota thanks for letting me know! It is back to sync now.

miguelmota commented 3 years ago

@tilacog It seems to hit this failure every few thousand blocks, and the subgraph still has 1-2M blocks to index to catch up to head.

{"data":{"indexingStatusForPendingVersion":{"fatalError":{"message":"failed to process trigger: block #16710656 (0xc5c3…6a47), transaction 78e1ceac8041c2046f732c9ed6680fc6e5ed96c6d351f873945e2af8d2c7d67a: Found no transaction for event"},"nonFatalErrors":[],"subgraph":"QmfDg1oHAPb7TPuy882BfBFZrJKMPNEuib98Em7rFds2nb"}}}

Any way to automate the re-sync of the block?

We've been trying to fully sync this subgraph for some time now 😅

TravelingTechGuy commented 3 years ago

Hi! Original Bug Opener™ here 😀 Encountering the same bug with this subgraph: https://thegraph.com/legacy-explorer/subgraph/vladi-coti/cvi-polygon-usdc.

Will re-indexing help, or is this a systemic issue?

Thanks! @azf20

azf20 commented 3 years ago

Hey! I think @tilacog should be able to help here

miguelmota commented 3 years ago

Sync error - block #16735872

{"data":{"indexingStatusForPendingVersion":{"fatalError":{"message":"failed to process trigger: block #16735872 (0xbd03…dbf3), transaction e37d065810059bbba193706ee2f09f7751949f260b13ee63079efbafe09cdf01: Found no transaction for event"},"nonFatalErrors":[],"subgraph":"QmfDg1oHAPb7TPuy882BfBFZrJKMPNEuib98Em7rFds2nb"}}}
shanefontaine commented 3 years ago

Hey @tilacog looks like we are still seeing the sync failure on Block 16,761,151.

Can we expect this to work consistently in the future or is there something that will fundamentally block this sync going forward?

tilacog commented 3 years ago

@shanefontaine this will definitely work consistently in the future.

It is still unclear why blocks appear without those transactions at ingestion time, but that error is being actively investigated by our team and by our providers' teams.

I've re-fetched that block for your subgraph, so it should get back to syncing soon.

shanefontaine commented 3 years ago

Great to know! Thank you @tilacog

miguelmota commented 3 years ago

Sync error - block #16766208

{"data":{"indexingStatusForPendingVersion":{"fatalError":{"message":"failed to process trigger: block #16766208 (0xcf8f…3600), transaction 448b929a92f44df4f7e28de7144be8229485f27fa7f89c0ef165885d5f78d133: Found no transaction for event"},"nonFatalErrors":[],"subgraph":"QmfDg1oHAPb7TPuy882BfBFZrJKMPNEuib98Em7rFds2nb"}}}

@tilacog

miguelmota commented 3 years ago

Sync error - block #16776192

{"data":{"indexingStatusForPendingVersion":{"fatalError":{"message":"failed to process trigger: block #16776192 (0x072c…c74c), transaction e2897b360222e76480388b4c55a020250717910c87c175a4e00c7230758ab4ce: Found no transaction for event"},"nonFatalErrors":[],"subgraph":"QmfDg1oHAPb7TPuy882BfBFZrJKMPNEuib98Em7rFds2nb"}}}

@tilacog

miguelmota commented 3 years ago

Sync error - block #16785216

{"data":{"indexingStatusForPendingVersion":{"fatalError":{"message":"failed to process trigger: block #16785216 (0xdc6c…85c5), transaction 8d63b25cbc8e3fa4a62b4513d70151f8485c1fe1d5da044a83789a47219a6b55: Found no transaction for event"},"nonFatalErrors":[],"subgraph":"QmfDg1oHAPb7TPuy882BfBFZrJKMPNEuib98Em7rFds2nb"}}}

@tilacog