graphprotocol / graph-node

Graph Node indexes data from blockchains such as Ethereum and serves it over GraphQL

[Bug] Subgraph panicked with message - MemoryAccessError #5116

Open · eonwarped opened 9 months ago

eonwarped commented 9 months ago

Bug report

Subgraph failed with non-deterministic error: failed to process trigger: block #51796283 (0xc9f9…e77e), transaction 5dc807c6ea6588c53ba17d1bbd3956ab0dd89fd9d6f08740a40734d40b17c2bc: Subgraph panicked with message: called `Result::unwrap()` on an `Err` value: MemoryAccessError { _private: () }, retry_delay_s: 4236, attempt: 41

Initially we thought it had to do with not updating and reindexing after 'adding items', but that does not seem to be the case: we resynced after adding them and ran into issues in the exact same place.

From the logs below, it seems that one particularly suspicious handler run, which takes a long time and saves a lot of entities, might be related to this. Are we hitting some kind of limit on per-block processing?

Relevant log output

https://thegraph.com/hosted-service/subgraph/otterclam/otto?selected=logs

Subgraph failed with non-deterministic error: failed to process trigger: block #51796283 (0xc9f9…e77e), transaction 5dc807c6ea6588c53ba17d1bbd3956ab0dd89fd9d6f08740a40734d40b17c2bc: Subgraph panicked with message: called `Result::unwrap()` on an `Err` value: MemoryAccessError { _private: () }, retry_delay_s: 4236, attempt: 41

Prior to that, the previous log entry shows:

Done processing trigger, gas_used: 0, data_source: Otto, handler: handleTraitsChanged, total_ms: 114900, transaction: 0x5dc8…c2bc, address: 0x6e8a…8eb7, signature: TraitsChanged(indexed uint256,uint16[16])
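For context on what a handler like that can be doing for 114 seconds: a mapping that, besides its per-event update, recomputes and re-saves every known entity would look roughly like the sketch below. This is illustrative only, not the actual otterclam/otto code; the Registry entity and the allOttoIds helper are hypothetical stand-ins for however the subgraph tracks its full id set.

```typescript
import { TraitsChanged } from "../generated/Otto/Otto"; // generated from the contract ABI
import { Otto, Registry } from "../generated/schema";   // generated from schema.graphql

// Hypothetical: a singleton registry entity holding every Otto id.
function allOttoIds(): string[] {
  let registry = Registry.load("all");
  if (registry == null) {
    return new Array<string>(0);
  }
  return registry.ids;
}

export function handleTraitsChanged(event: TraitsChanged): void {
  // ...normal per-event update of the affected Otto elided...

  // The expensive part: recompute and re-save every known entity on
  // every TraitsChanged event. With tens of thousands of entities this
  // dominates the handler's runtime (cf. total_ms: 114900 in the log).
  let ids = allOttoIds();
  for (let i = 0; i < ids.length; i++) {
    let otto = Otto.load(ids[i]);
    if (otto == null) continue;
    // ...recompute derived fields on `otto` here...
    otto.save();
  }
}
```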

IPFS hash

No response

Subgraph name or link to explorer

otterclam/otto

Some information to help us out

OS information

None

eonwarped commented 9 months ago

I've been debugging by adding some logging and grafting to the failing subgraph. Some things of note:

With a code change bypassing a giant loop that recomputes and saves all entities, I can get past this error.

I made two log changes to try to figure out which entity it fails at, and in two separate runs it failed at a different entity, which is interesting; so seemingly it is a nondeterministic issue (although for a given graph version it stops at the same point when retrying, which is also interesting).
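A minimal sketch of what such a bypass can look like, assuming the same hypothetical names as in the earlier sketch (the tokenId parameter name is also an assumption; the event signature only tells us the first parameter is an indexed uint256):

```typescript
import { TraitsChanged } from "../generated/Otto/Otto";
import { Otto } from "../generated/schema";

export function handleTraitsChanged(event: TraitsChanged): void {
  // Only touch the entity the event is actually about; the
  // recompute-everything loop is skipped entirely.
  let otto = Otto.load(event.params.tokenId.toString());
  if (otto == null) return;
  // ...update this Otto's traits from event.params...
  otto.save();
}
```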

paymog commented 5 months ago

I'm also seeing this issue. Any ideas here @leoyvens @lutter? I'm running v0.34.0

eonwarped commented 5 months ago

In our case it's application-specific. I got rid of this error by reducing the number of persists within a maintenance call that got triggered: it was looping through all entities and re-saving them, and I think the subgraph was not able to handle that.
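In other words, bound how much work a single call does. One way to sketch that is to persist a fixed-size slice per invocation and carry a cursor between calls; the Cursor entity and batch size below are assumptions for illustration, not graph-node features:

```typescript
import { Otto, Registry, Cursor } from "../generated/schema";

const BATCH_SIZE = 1000; // assumed bound; tune to what the runtime tolerates

export function runMaintenanceSlice(): void {
  let registry = Registry.load("all");
  if (registry == null) return;

  let cursor = Cursor.load("maintenance");
  if (cursor == null) {
    cursor = new Cursor("maintenance");
    cursor.position = 0;
  }

  let ids = registry.ids;
  let start = cursor.position;
  let end = start + BATCH_SIZE < ids.length ? start + BATCH_SIZE : ids.length;

  // Re-save only a bounded slice of entities per call instead of all of them.
  for (let i = start; i < end; i++) {
    let otto = Otto.load(ids[i]);
    if (otto == null) continue;
    otto.save();
  }

  cursor.position = end < ids.length ? end : 0; // wrap around when finished
  cursor.save();
}
```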

lutter commented 5 months ago

It looks like a MemoryAccessError comes from deep within wasmtime. Since this seems to be related to loading a lot of entities, I wonder if it simply means that the mappings are running out of memory. Do you have a rough idea of how many entities the mappings were loading when this happened?
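One way to get that rough number is to count loads and saves in the mapping and log the total with graph-ts. A sketch, wrapping the kind of loop described above (names are hypothetical):

```typescript
import { log } from "@graphprotocol/graph-ts";
import { Otto } from "../generated/schema";

// Re-save the given entities and log how many one trigger touched,
// to answer the "how many entities" question above.
function resaveAndCount(ids: string[]): void {
  let saved = 0;
  for (let i = 0; i < ids.length; i++) {
    let otto = Otto.load(ids[i]);
    if (otto == null) continue;
    otto.save();
    saved++;
  }
  log.info("maintenance pass saved {} entities in one trigger", [saved.toString()]);
}
```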

paymog commented 5 months ago

I believe around 30k entities were being saved in a loop in the subgraph that ran into this issue on our end.