paradigmxyz / reth

Modular, contributor-friendly and blazing-fast implementation of the Ethereum protocol, in Rust
https://reth.rs/
Apache License 2.0

Segmentation fault #9611

Closed 0xDmtri closed 2 weeks ago

0xDmtri commented 1 month ago

Describe the bug

Node does not start and crashes with a segmentation fault.

Steps to reproduce

  1. Check out 1.0.2.
  2. Build from source with the command listed under "If you've built Reth from source" below.
  3. Start the node.

Node logs

2024-07-18T13:18:02.152677Z  INFO reth_node_events::node: Canonical chain committed number=20333653 hash=0x51b686298241e3a291cddfd4603e96f570034e3f848e0dccbf56f20acde74486 elapsed=181.552955ms
2024-07-18T13:18:02.152769Z DEBUG txpool: cleaning up blob store finalized_block=20333585
2024-07-18T13:18:02.152784Z DEBUG txpool::blob: Removing blobs from disk num_blobs=0
2024-07-18T13:18:04.778903Z DEBUG net: Session established remote_addr=35.198.234.182:44298 client_version=Geth/v1.13.14-stable-ad309568/linux-amd64/go1.21.10 peer_id=0xba28eeda1b69c6abd21552180d656a99199c56c492666da907161686559ff3d2dd857920d6ba38106dcd55be2e07e3e4dc3d54ca2c2318bc8dace3c84760fdfa total_active=3 kind=incoming peer_enode=enode://ba28eeda1b69c6abd21552180d656a99199c56c492666da907161686559ff3d2dd857920d6ba38106dcd55be2e07e3e4dc3d54ca2c2318bc8dace3c84760fdfa@35.198.234.182:44298
error: reth interrupted by SIGSEGV, printing backtrace

reth(+0x20d4d76)[0x563e84b4cd76]
/lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7ff621f63050]
/lib/x86_64-linux-gnu/libc.so.6(+0x16d509)[0x7ff622094509]
reth(+0x19b116d)[0x563e8442916d]
reth(+0x23c2a46)[0x563e84e3aa46]
reth(+0x23be75a)[0x563e84e3675a]
reth(+0x21add7f)[0x563e84c25d7f]
reth(+0x2268382)[0x563e84ce0382]
reth(+0x227e0c5)[0x563e84cf60c5]
reth(+0x247f42f)[0x563e84ef742f]
reth(+0x247fb26)[0x563e84ef7b26]
reth(+0x2719aee)[0x563e85191aee]
reth(+0x2722fa4)[0x563e8519afa4]
reth(+0x270bc85)[0x563e85183c85]
reth(+0x270b9ca)[0x563e851839ca]
reth(+0x26e4f3b)[0x563e8515cf3b]
/lib/x86_64-linux-gnu/libc.so.6(+0x89134)[0x7ff621fb0134]
/lib/x86_64-linux-gnu/libc.so.6(+0x1097dc)[0x7ff6220307dc]

Segmentation fault
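
For anyone triaging a trace like the one above: each frame has the form `module(+offset)[address]`, and the `+offset` part is what a symbolizer such as `addr2line -e <binary> <offset>` would typically need, provided the binary still carries debug info. The following is a minimal sketch, not part of reth, that pulls those fields out of the raw lines. The `Frame` type and `parse_frame` helper are hypothetical names introduced here for illustration.

```rust
/// One frame of a raw backtrace line such as
/// `reth(+0x20d4d76)[0x563e84b4cd76]`.
/// (Hypothetical helper type, not part of reth.)
#[derive(Debug, PartialEq)]
struct Frame {
    module: String,
    offset: u64,  // offset inside the module, usable with a symbolizer
    address: u64, // absolute address in the crashed process
}

impl Frame {
    /// Load base of the module in the crashed process (address - offset).
    fn load_base(&self) -> u64 {
        self.address.wrapping_sub(self.offset)
    }
}

/// Parses a hex literal with a mandatory `0x` prefix.
fn parse_hex(s: &str) -> Option<u64> {
    u64::from_str_radix(s.strip_prefix("0x")?, 16).ok()
}

/// Parses `module(+0xOFFSET)[0xADDRESS]`; returns `None` for any other line.
fn parse_frame(line: &str) -> Option<Frame> {
    let (module, rest) = line.trim().split_once("(+")?;
    let (offset, rest) = rest.split_once(")[")?;
    let address = rest.strip_suffix(']')?;
    Some(Frame {
        module: module.to_string(),
        offset: parse_hex(offset)?,
        address: parse_hex(address)?,
    })
}

fn main() {
    // First frames of the backtrace from this report.
    let trace = "reth(+0x20d4d76)[0x563e84b4cd76]\n\
                 /lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7ff621f63050]\n\
                 reth(+0x19b116d)[0x563e8442916d]\n\
                 Segmentation fault";
    // Non-frame lines (like the last one) are skipped by filter_map.
    for frame in trace.lines().filter_map(parse_frame) {
        println!(
            "{} offset={:#x} base={:#x}",
            frame.module,
            frame.offset,
            frame.load_base()
        );
    }
}
```

Both `reth` frames in the sample resolve to the same load base, which is a quick sanity check that the offsets were extracted correctly before handing them to a symbolizer.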


### Platform(s)

Linux (x86)

### What version/commit are you on?

reth Version: 1.0.2
Commit SHA: ffb44e6245eebd0144e8ae62f4f39203f2ea2e5f
Build Timestamp: 2024-07-17T22:07:44.042456947Z
Build Features: jemalloc
Build Profile: maxperf

### What database version are you on?

Current database version: 2
Local database version: 2

### Which chain / network are you on?

Mainnet

### What type of node are you running?

Archive (default)

### What prune config do you use, if any?

_No response_

### If you've built Reth from source, provide the full command you used

RUSTFLAGS="-C target-cpu=native" cargo build --profile maxperf --features jemalloc 

### Code of Conduct

- [X] I agree to follow the Code of Conduct

0xDmtri commented 1 month ago

can confirm that the issue persists with 1.0.3 too.

0xDmtri commented 1 month ago

can also confirm that I have the latest libc6.

ldd (Debian GLIBC 2.36-9+deb12u7) 2.36

mattsse commented 1 month ago

can you also reproduce this without RUSTFLAGS="-C target-cpu=native"?

m-saxemberg commented 1 month ago

The same segmentation error is present on our machine as well.

The characteristics are the same except:

Which chain / network are you on?

Sepolia

What type of node are you running?

Full (No special pruning: --full)

As a second attempt, we compiled with the regular release profile and hit the same segmentation fault: RUSTFLAGS="-C target-cpu=native" cargo build --profile release --features jemalloc,asm-keccak

0xDmtri commented 1 month ago

> can you also reproduce this without RUSTFLAGS="-C target-cpu=native"?

Ok, I can confirm that it works fine without it. Anything else I should provide?
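
Since the crash only shows up with `-C target-cpu=native`, one cheap thing to rule out is a mismatch between the CPU features the binary was compiled with and the features the CPU reports at run time. The sketch below is not reth code: it is a standalone program, to be built with the same `RUSTFLAGS`, and the feature list is an arbitrary sample chosen for illustration. An unsupported instruction would normally raise SIGILL rather than SIGSEGV, so this is a sanity check under stated assumptions, not a diagnosis.

```rust
/// (feature name, enabled at compile time, detected on this CPU at run time)
type FeatureRow = (&'static str, bool, bool);

/// Collects compile-time and run-time status for a sample of x86 features.
#[cfg(target_arch = "x86_64")]
fn feature_report() -> Vec<FeatureRow> {
    vec![
        ("sse4.2", cfg!(target_feature = "sse4.2"), std::is_x86_feature_detected!("sse4.2")),
        ("avx", cfg!(target_feature = "avx"), std::is_x86_feature_detected!("avx")),
        ("avx2", cfg!(target_feature = "avx2"), std::is_x86_feature_detected!("avx2")),
        ("bmi2", cfg!(target_feature = "bmi2"), std::is_x86_feature_detected!("bmi2")),
        ("fma", cfg!(target_feature = "fma"), std::is_x86_feature_detected!("fma")),
    ]
}

/// On other architectures there is nothing to compare in this sketch.
#[cfg(not(target_arch = "x86_64"))]
fn feature_report() -> Vec<FeatureRow> {
    Vec::new()
}

/// Features the binary assumes but the running CPU does not report.
fn mismatches(report: &[FeatureRow]) -> Vec<&'static str> {
    report
        .iter()
        .filter(|(_, compiled, detected)| *compiled && !*detected)
        .map(|(name, _, _)| *name)
        .collect()
}

fn main() {
    let report = feature_report();
    for (name, compiled, detected) in &report {
        println!("{name:8} compiled={compiled:5} detected={detected}");
    }
    let bad = mismatches(&report);
    if bad.is_empty() {
        println!("no compile-time feature is missing on this CPU");
    } else {
        println!("compiled in but not detected: {bad:?}");
    }
}
```

Comparing the `compiled` column between a build with and without `-C target-cpu=native` shows which extra instruction sets the optimizer was allowed to use in the crashing binary.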

0xDmtri commented 1 month ago

@mattsse I am happy to do the investigation/fixing if you could just point me in the right direction, ser :)

shekhirin commented 1 month ago

@0xDmtri can you run with target-cpu=native, but without --profile maxperf? It should display a more detailed backtrace.

0xDmtri commented 1 month ago

> @0xDmtri can you run with target-cpu=native, but without --profile maxperf? It should display a more detailed backtrace.

@shekhirin Lol, it's even weirder now: with the debug profile it just works. But I also tried the release profile and it crashed with the same backtrace as above. Must be some optimizations?

shekhirin commented 1 month ago

@0xDmtri can you try with --profile profiling?

0xDmtri commented 1 month ago

@shekhirin Really nothing informative. I used the profiling profile and set the log level to trace, and here's what I got:

2024-07-24T01:04:03.081702Z TRACE try_insert_new_payload:try_insert_validated_block{block=BlockNumHash { number: 20372987, hash: 0x7350a601616c6f88d48898d3908fd8afbce9d540abf400dde5b8b0a4234cb5db }}:try_append_canonical_chain: trie::hash_builder: updating merkle tree current=Nibbles("06040e000d") succeeding=Nibbles("06040e000e")
2024-07-24T01:04:03.081780Z TRACE try_insert_new_payload:try_insert_validated_block{block=BlockNumHash { number: 20372987, hash: 0x7350a601616c6f88d48898d3908fd8afbce9d540abf400dde5b8b0a4234cb5db }}:try_append_canonical_chain:loop{i=0 current=Nibbles("06040e000d") build_extensions=false}: trie::hash_builder: prefix lengths after comparing keys len=4 common_prefix_len=4 preceding_len=4 preceding_exists=true
2024-07-24T01:04:03.081786Z TRACE try_insert_new_payload:try_insert_validated_block{block=BlockNumHash { number: 20372987, hash: 0x7350a601616c6f88d48898d3908fd8afbce9d540abf400dde5b8b0a4234cb5db }}:try_append_canonical_chain:loop{i=0 current=Nibbles("06040e000d") build_extensions=false}: trie::hash_builder: extra_digit=13 groups=[TrieMask(0000000000111111), TrieMask(0000000000001111), TrieMask(0011111111111111), TrieMask(0000000000000000), TrieMask(0011111111111111)]
2024-07-24T01:04:03.081791Z TRACE try_insert_new_payload:try_insert_validated_block{block=BlockNumHash { number: 20372987, hash: 0x7350a601616c6f88d48898d3908fd8afbce9d540abf400dde5b8b0a4234cb5db }}:try_append_canonical_chain:loop{i=0 current=Nibbles("06040e000d") build_extensions=false}: trie::hash_builder: skipping 5 nibbles
2024-07-24T01:04:03.081796Z TRACE try_insert_new_payload:try_insert_validated_block{block=BlockNumHash { number: 20372987, hash: 0x7350a601616c6f88d48898d3908fd8afbce9d540abf400dde5b8b0a4234cb5db }}:try_append_canonical_chain:loop{i=0 current=Nibbles("06040e000d") build_extensions=false}: trie::hash_builder: short_node_key=Nibbles("")
2024-07-24T01:04:03.081799Z TRACE try_insert_new_payload:try_insert_validated_block{block=BlockNumHash { number: 20372987, hash: 0x7350a601616c6f88d48898d3908fd8afbce9d540abf400dde5b8b0a4234cb5db }}:try_append_canonical_chain:loop{i=0 current=Nibbles("06040e000d") build_extensions=false}: trie::hash_builder: pushing branch node hash hash=0xfd13804cb34d696e99841d2b47d38be87aef489f644667fa8ba2c681b01ac683
Segmentation fault

0xDmtri commented 1 month ago

@shekhirin Ok, it's even more confusing now.

I ran Valgrind (just in case) - all good, no errors.

Then, I ran the maxperf binary (with debug info not stripped) - same crash, same uninformative error.

Finally, I ran it under GDB and it's not crashing (it has been running for an hour now). Hello Mr. Heisenbug :)

I don't know what's going on or what else I can do to figure it out.

P.S. Also tried compiling with 1.80; it still crashes almost instantly (without GDB).

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open for 21 days with no activity.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been inactive for 7 days since being marked as stale.