wemixarchive / go-wemix

Go implementation of the Wemix project.
https://www.wemix.com/
GNU Lesser General Public License v3.0

go-wemix using rocksdb was killed by sigsegv #119

Open egonspace opened 2 months ago

egonspace commented 2 months ago

System information

gwemix version: Gwemix/v0.10.8-stable-53273fcb/linux-amd64/go1.19.3
OS & Version: Linux
Commit hash: 53273fcb3729c477c62cd0215d4c90dcd3cb5d83

Expected behaviour

Actual behaviour

An EN host was killed, leaving the following logs:

INFO [08-29|01:12:40.909] Imported new chain segment               blocks=1  txs=0   mgas=0.000  elapsed=32.912ms  mgasps=0.000   number=58,659,438 hash=5f317b..ee31f5 dirty=0.00B
INFO [08-29|01:12:43.156] Downloader queue stats                   receiptTasks=0 blockTasks=0 itemSize=4.43KiB  throttle=8192
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x1a87f38]

runtime stack:
runtime.throw({0x221464b?, 0x0?})
    runtime/panic.go:1047 +0x5d fp=0x7f79a9b14aa8 sp=0x7f79a9b14a78 pc=0x5de63d
runtime.sigpanic()
    runtime/signal_unix.go:819 +0x369 fp=0x7f79a9b14af8 sp=0x7f79a9b14aa8 pc=0x5f5069

goroutine 18768297270 [syscall]:
runtime.cgocall(0x18db27e, 0xc046f04398)
    runtime/cgocall.go:158 +0x5c fp=0xc046f04370 sp=0xc046f04338 pc=0x5aa17c
github.com/ethereum/go-ethereum/ethdb/rocksdb._Cfunc_rocksdb_write(0x7f87c5652030, 0x7f87c5624000, 0x7f8191609200, 0xc0477a1220)
    _cgo_gotypes.go:536 +0x45 fp=0xc046f04398 sp=0xc046f04370 pc=0x90b8c5
github.com/ethereum/go-ethereum/ethdb/rocksdb.(*rdbBatch).Write.func1(0x0?, 0x0?)
    github.com/ethereum/go-ethereum/ethdb/rocksdb/rocksdb.go:414 +0xd7 fp=0xc046f043e8 sp=0xc046f04398 pc=0x90fd17
github.com/ethereum/go-ethereum/ethdb/rocksdb.(*rdbBatch).Write(0x7f87c4ca8248?)
    github.com/ethereum/go-ethereum/ethdb/rocksdb/rocksdb.go:414 +0x37 fp=0xc046f04418 sp=0xc046f043e8 pc=0x90fbf7
github.com/ethereum/go-ethereum/core.(*BlockChain).writeBlockWithState(0xc0082dd400, 0xc0313a1290, {0xc002bfee48, 0x1, 0x1}, {0xc1ac10a273bc07b6?, 0x151e18f131914a?, 0x0?}, 0xc053ca04e0)
    github.com/ethereum/go-ethereum/core/blockchain.go:1236 +0x4aa fp=0xc046f046d8 sp=0xc046f04418 pc=0xb5254a
github.com/ethereum/go-ethereum/core.(*BlockChain).writeBlockAndSetHead(0xc0082dd400, 0xc0313a1290, {0xc002bfee48?, 0x1?, 0x1?}, {0x0, 0x0, 0x0}, 0xc0313a1290?, 0x0)
    github.com/ethereum/go-ethereum/core/blockchain.go:1313 +0x5e fp=0xc046f04818 sp=0xc046f046d8 pc=0xb5325e
github.com/ethereum/go-ethereum/core.(*BlockChain).insertChain(0xc0082dd400, {0xc00ce5aad8?, 0x1, 0x1}, 0x1, 0x1)
    github.com/ethereum/go-ethereum/core/blockchain.go:1679 +0x22c5 fp=0xc046f05580 sp=0xc046f04818 pc=0xb56a65
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertChain(0xc0082dd400, {0xc00ce5aad8?, 0x1, 0x1})
    github.com/ethereum/go-ethereum/core/blockchain.go:1408 +0xb51 fp=0xc046f05928 sp=0xc046f05580 pc=0xb545b1
github.com/ethereum/go-ethereum/eth.newHandler.func4({0xc00ce5aad8?, 0x1, 0x1})
    github.com/ethereum/go-ethereum/eth/handler.go:282 +0x6b9 fp=0xc046f05c50 sp=0xc046f05928 pc=0xe04a79
github.com/ethereum/go-ethereum/eth/fetcher.(*BlockFetcher).importBlocks.func1()
    github.com/ethereum/go-ethereum/eth/fetcher/block_fetcher.go:871 +0x5a9 fp=0xc046f05fe0 sp=0xc046f05c50 pc=0xdf0749
runtime.goexit()
    runtime/asm_amd64.s:1594 +0x1 fp=0xc046f05fe8 sp=0xc046f05fe0 pc=0x612de1
created by github.com/ethereum/go-ethereum/eth/fetcher.(*BlockFetcher).importBlocks
    github.com/ethereum/go-ethereum/eth/fetcher/block_fetcher.go:845 +0x3de

It seems to be an issue with rocksdb. The version in use, v6.27.3, is quite outdated, and many bug fixes have been made since then. We will monitor the situation by running some devnet EN hosts with go-wemix built against rocksdb v6.28.2. If the issue occurs again, I will document the details in this issue.

jed-wemade commented 2 months ago

It seems to have been killed in the go-wemix or OS context, because the top of the trace is runtime.cgocall rather than a C function in rocksdb. runtime/cgocall.go:158 points to entersyscall(), which is implemented in assembly.

After Go 1.19.3, the patch releases that include syscall changes are 1.19.7 and 1.19.9. IMHO, this issue is related to rlimit, which go-wemix handles itself (more precisely, the RLIMIT_NOFILE resource). Hence, Go 1.19.9 or later should resolve this issue. Also check that the RLIMIT_NOFILE limit is high enough before running gwemix.
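For reference, the RLIMIT_NOFILE check suggested above could be done with a small Go program like the sketch below. It assumes Linux and uses only the standard `syscall` package. The helper name `nofileLimit` and the `wanted` threshold are hypothetical and not taken from go-wemix.

```go
package main

import (
	"fmt"
	"syscall"
)

// nofileLimit returns the soft and hard RLIMIT_NOFILE values of the
// current process. Hypothetical helper, not part of go-wemix.
func nofileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	// Assumed threshold for illustration only; pick a value that
	// matches the node's actual file and connection usage.
	const wanted = 65536

	soft, hard, err := nofileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("RLIMIT_NOFILE soft=%d hard=%d\n", soft, hard)
	if soft < wanted {
		fmt.Printf("soft limit is below %d; consider raising it before starting the node\n", wanted)
	}
}
```

Running this in the same shell or service environment that starts gwemix shows the limits the node would inherit.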

egonspace commented 2 months ago

I don't think so. In all cases where the program crashes in C code, the top of the stack trace in Go will only show cgocall. To find out where exactly the crash occurred in the C code, we'll need to generate a core dump and use gdb to investigate.
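As a rough sketch of the core-dump setup, a Go process can raise its own RLIMIT_CORE soft limit and ask the runtime to abort on fatal errors, so that the OS gets a chance to write a core file that gdb can open. This assumes Linux. `enableCoreDumps` is a hypothetical helper and is not something gwemix currently does. Setting the environment variable `GOTRACEBACK=crash` has the same effect as the `debug.SetTraceback("crash")` call.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"syscall"
)

// enableCoreDumps raises the soft RLIMIT_CORE to the hard limit and sets
// the traceback mode to "crash", which makes the Go runtime abort (SIGABRT
// on Unix) after printing the traceback instead of simply exiting.
// Hypothetical helper for illustration.
func enableCoreDumps() (syscall.Rlimit, error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_CORE, &rl); err != nil {
		return rl, err
	}
	rl.Cur = rl.Max // the soft limit can be raised up to the hard limit
	if err := syscall.Setrlimit(syscall.RLIMIT_CORE, &rl); err != nil {
		return rl, err
	}
	debug.SetTraceback("crash")
	return rl, nil
}

func main() {
	rl, err := enableCoreDumps()
	if err != nil {
		fmt.Println("could not enable core dumps:", err)
		return
	}
	if rl.Max == 0 {
		fmt.Println("hard RLIMIT_CORE is 0; core dumps remain disabled")
		return
	}
	fmt.Printf("core dumps enabled, RLIMIT_CORE soft=%d\n", rl.Cur)
}
```

Where the core file ends up depends on the kernel's `core_pattern` setting. Once it exists, loading it in gdb together with the gwemix binary should show the C frames below `rocksdb_write`.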

If it were an rlimit issue, it wouldn't have resulted in a SIGSEGV. I think it's more likely an issue with rocksdb than with golang.

jed-wemade commented 2 months ago

runtime.SetCgoTraceback() shows C frames, but gwemix does not set this option, so runtime.cgocall() ends up as the topmost frame. This issue is related to rocksdb, as originally mentioned.