hackbg / undexer

🎉 The Undexer 🎉 Namada network indexer powering https://shielded.live/. See also: https://github.com/hackbg/undexer-node

Indexer critical error #5

Closed opsecx closed 3 months ago

opsecx commented 3 months ago

Hey, it's me again.

So the indexer halted and got this error:

NamadaChainQuerying block height ✔ Epoch=> Fetching epoch for block 40730 ⋯ https://(rpc url)/ ABCI query: /shell/epoch_at_height/40730
file:///home/(user)/undexer/fadroma/packages/namada/pkg/fadroma_namada.js:682
    const ret = new Error(getStringFromWasm0(arg0, arg1));
                ^

Error: response error

Caused by:
    Internal error: could not find results for height #41001 (code: -32603)

Location:
    /home/(user)/.cargo/registry/src/index.crates.io-6f17d22bba15001f/flex-error-0.4.4/src/tracer_impl/eyre.rs:10:9
    at file:///home/(user)/undexer/fadroma/packages/namada/pkg/fadroma_namada.js:682:21
    at logError (file:///home/(user)/undexer/fadroma/packages/namada/pkg/fadroma_namada.js:121:18)
    at imports.wbg.__wbg_new_28c511d9baebfa89 (file:///home/(user)/undexer/fadroma/packages/namada/pkg/fadroma_namada.js:681:66)
    at fadroma_namada.wasm.js_sys::Error::new::h2a8d186ef64a0e6c (wasm://wasm/fadroma_namada.wasm-00780832:wasm-function[2715]:0x17a3e3)
    at fadroma_namada.wasm.fadroma_namada::decode::Decode::block::hd0898406e25d1048 (wasm://wasm/fadroma_namada.wasm-00780832:wasm-function[34]:0x25ebe)
    at fadroma_namada.wasm.decode_block (wasm://wasm/fadroma_namada.wasm-00780832:wasm-function[1635]:0x169d19)
    at Decode.block (file:///home/(user)/undexer/fadroma/packages/namada/pkg/fadroma_namada.js:395:18)
    at NamadaBlock.fromResponses (file:///home/(user)/undexer/fadroma/packages/namada/NamadaBlock.ts:39:51)
    at NamadaBlock.fetchByHeight (file:///home/(user)/undexer/fadroma/packages/namada/NamadaBlock.ts:30:17)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Node.js v20.15.0

opsecx commented 3 months ago

Does the indexer halt when it reaches latest block height?

opsecx commented 3 months ago

Also, now I started it again, and it starts completely from scratch?

egasimus commented 3 months ago

No, it should not be starting from scratch, nor is it expected to crash when it reaches the latest block height: it should poll for a new block until one appears. We'll look into this. It's possible it's caused by the 0.41 upgrade, as it was not happening previously.
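
The expected behaviour at the chain tip (keep polling until the next block appears instead of crashing) can be sketched roughly as below. This is an illustration only, not undexer's code: `waitForBlock`, `isHeightNotAvailable`, and the `fetchBlock` callback are hypothetical names, and matching on the error text assumes the node reports a not-yet-available height with the message seen in the log above.

```typescript
// Hypothetical fetcher: resolves with a block, or rejects when the node
// has no results for the requested height.
type FetchBlock<B> = (height: number) => Promise<B>;

// Assumption: a height that does not exist yet is reported with the
// "could not find results for height" message from the log in this thread.
function isHeightNotAvailable(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return /could not find results for height/i.test(message);
}

// Retry the same height until the node can serve it.
// Any other error, or running out of attempts, is rethrown.
async function waitForBlock<B>(
  fetchBlock: FetchBlock<B>,
  height: number,
  pollIntervalMs = 5000,
  maxAttempts = Infinity,
): Promise<B> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fetchBlock(height);
    } catch (err) {
      if (!isHeightNotAvailable(err) || attempt >= maxAttempts) throw err;
      // The block has probably not been produced yet: wait, then try again.
      await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
    }
  }
}
```

With a wrapper like this, a "height not found" response at the tip becomes a short wait, while unrelated failures still surface immediately.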

opsecx commented 3 months ago

Ok, please let me know if you are able to reproduce.

opsecx commented 3 months ago

I am more concerned that it starts from scratch than that it crashes at the latest block.

opsecx commented 3 months ago

I haven't looked at the data tables yet to check whether data is retained.

egasimus commented 3 months ago

Just added the command ./undexer db status. It prints row counts, so you can use it to check whether data is retained.

egasimus commented 3 months ago

https://github.com/hackbg/undexer/blob/main/main.ts#L13

opsecx commented 3 months ago

Any idea why this is happening?

opsecx commented 3 months ago

./undexer db status
⏳ Starting @hackbg/undexer 2.0.0...
⏳ Compiling TypeScript...
⌛ Compiled TypeScript in 0.041s
Rows in DB:
Blocks: 45189
Transactions: 119
Validators: 17
Proposals: 0
Votes: 0

opsecx commented 3 months ago

Still going through all blocks from scratch when I start the indexer again (though rather quickly)

egasimus commented 3 months ago

That makes sense. There are two tasks that iterate over block numbers: one fetches the majority of the block data, the other fetches just the epoch number associated with each block. Epoch numbers live on a separate RPC endpoint of the node, and we started tracking those relatively recently - so they ended up being implemented as separate loops.
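
To make the two-loop structure concrete, here is a minimal sketch of two tasks that each walk the block heights and track their own progress. It is not undexer's implementation: `Store`, `runTask`, `indexOnce`, and the two fetch callbacks are hypothetical, and the in-memory maps merely stand in for database tables. Under these assumptions, a restart walks every height again but only fetches what a given task has not stored yet.

```typescript
// Hypothetical in-memory stand-in for the indexer's database tables.
interface Store {
  blocks: Map<number, string>; // height -> block data
  epochs: Map<number, number>; // height -> epoch number
}

// One pass over heights 1..latest: fetch and store whatever this task
// has not stored yet. Returns how many heights were actually fetched.
async function runTask<T>(
  stored: Map<number, T>,
  latest: number,
  fetchAt: (height: number) => Promise<T>,
): Promise<number> {
  let fetched = 0;
  for (let height = 1; height <= latest; height++) {
    if (stored.has(height)) continue; // already indexed: cheap skip
    stored.set(height, await fetchAt(height));
    fetched++;
  }
  return fetched;
}

// Run both loops side by side; each one only looks at its own table.
async function indexOnce(
  store: Store,
  latest: number,
  fetchBlock: (height: number) => Promise<string>,
  fetchEpoch: (height: number) => Promise<number>,
): Promise<{ blocks: number; epochs: number }> {
  const [blocks, epochs] = await Promise.all([
    runTask(store.blocks, latest, fetchBlock),
    runTask(store.epochs, latest, fetchEpoch),
  ]);
  return { blocks, epochs };
}
```

Because the loops are independent, one can be fully caught up while the other still has heights to fill in, which would look like a fast pass over all blocks after a restart.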

My current thoughts on the crash you originally reported: sometimes these happen if the node fails to respond correctly in time. So we fail fast, and count on the container to be restarted (in production, systemd has been handling this well enough for us). You can try running the indexer in a restart loop, with something like:

while true; do ./undexer index; sleep 5; done

If it keeps crashing in the same way, report back 🙂

opsecx commented 3 months ago

It keeps crashing. It seems to crash on the final block, and then it starts from scratch.

egasimus commented 3 months ago

Closing; discussion continues in #8