unexpected fault address 0x66b17040 crash

augustresende commented 2 years ago

Background

lnd crashing:

Your environment

version of lnd: v0.13.3-beta
which operating system (uname -a on *Nix): Raspbian (Debian)

Steps to reproduce

I don't know, sorry.

Oct 20 22:06:01 raspberrypi lnd[3049]: unexpected fault address 0x66b17040
Oct 20 22:06:01 raspberrypi lnd[3049]: fatal error: fault
Oct 20 22:06:01 raspberrypi lnd[3049]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x66b17040 pc=0x33b548]
Oct 20 22:06:01 raspberrypi lnd[3049]: goroutine 1 [running]:
Oct 20 22:06:01 raspberrypi lnd[3049]: runtime.throw(0xca16ce, 0x5)
Oct 20 22:06:01 raspberrypi lnd[3049]:         runtime/panic.go:1116 +0x5c fp=0x41c16b0 sp=0x41c169c pc=0x4559c
Oct 20 22:06:01 raspberrypi lnd[3049]: runtime.sigpanic()
Oct 20 22:06:01 raspberrypi lnd[3049]:         runtime/signal_unix.go:702 +0x310 fp=0x41c16c8 sp=0x41c16b0 pc=0x5d788
Oct 20 22:06:01 raspberrypi lnd[3049]: go.etcd.io/bbolt.(*DB).meta(0x309e280, 0x37dab73)
Oct 20 22:06:01 raspberrypi lnd[3049]:         go.etcd.io/bbolt@v1.3.5-0.20200615073812-232d8fc87f50/db.go:901 +0x1c fp=0x41c
Oct 20 22:06:01 raspberrypi lnd[3049]: go.etcd.io/bbolt.(*DB).hasSyncedFreelist(...)
Oct 20 22:06:01 raspberrypi lnd[3049]:         go.etcd.io/bbolt@v1.3.5-0.20200615073812-232d8fc87f50/db.go:323
Oct 20 22:06:01 raspberrypi lnd[3049]: go.etcd.io/bbolt.(*Tx).rollback(0x2d2e280)
Oct 20 22:06:01 raspberrypi lnd[3049]:         go.etcd.io/bbolt@v1.3.5-0.20200615073812-232d8fc87f50/tx.go:279 +0x68 fp=0x41c
Oct 20 22:06:01 raspberrypi lnd[3049]: go.etcd.io/bbolt.(*Tx).Commit(0x2d2e280, 0x0, 0x185ba04)
Oct 20 22:06:01 raspberrypi lnd[3049]:         go.etcd.io/bbolt@v1.3.5-0.20200615073812-232d8fc87f50/tx.go:161 +0x430 fp=0x41
Oct 20 22:06:01 raspberrypi lnd[3049]: github.com/btcsuite/btcwallet/walletdb/bdb.(*transaction).Commit(0x328a060, 0x328a060,
Oct 20 22:06:01 raspberrypi lnd[3049]:         github.com/btcsuite/btcwallet/walletdb@v1.3.3/bdb/db.go:91 +0x20 fp=0x41c17b8 
Oct 20 22:06:01 raspberrypi lnd[3049]: github.com/btcsuite/btcwallet/walletdb.Update(0xff1ac0, 0x4437c40, 0x4604040, 0x0, 0x0
Oct 20 22:06:01 raspberrypi lnd[3049]:         github.com/btcsuite/btcwallet/walletdb@v1.3.3/interface.go:275 +0x108 fp=0x41c
Oct 20 22:06:01 raspberrypi lnd[3049]: github.com/lightningnetwork/lnd/channeldb.(*DB).Update(0x4437c40, 0x4604040, 0x4437c40
Oct 20 22:06:01 raspberrypi lnd[3049]:         github.com/lightningnetwork/lnd/channeldb/db.go:193 +0x88 fp=0x41c1810 sp=0x41
Oct 20 22:06:01 raspberrypi lnd[3049]: github.com/lightningnetwork/lnd/channeldb/kvdb.Update(0xff1ac0, 0x4437c40, 0x4604040, 
Oct 20 22:06:01 raspberrypi lnd[3049]:         github.com/lightningnetwork/lnd/channeldb/kvdb/interface.go:16 +0x54 fp=0x41c1
Oct 20 22:06:01 raspberrypi lnd[3049]: github.com/lightningnetwork/lnd/channeldb.(*ChannelGraph).PruneGraph(0x338a000, 0x3ee2
Oct 20 22:06:01 raspberrypi lnd[3049]:         github.com/lightningnetwork/lnd/channeldb/graph.go:843 +0x114 fp=0x41c186c sp=
Oct 20 22:06:01 raspberrypi lnd[3049]: github.com/lightningnetwork/lnd/routing.(*ChannelRouter).syncGraphWithChain(0x2e45810,
Oct 20 22:06:01 raspberrypi lnd[3049]:         github.com/lightningnetwork/lnd/routing/router.go:725 +0x7bc fp=0x41c18fc sp=0
Oct 20 22:06:01 raspberrypi lnd[3049]: github.com/lightningnetwork/lnd/routing.(*ChannelRouter).Start(0x2e45810, 0x0, 0x0)
Oct 20 22:06:01 raspberrypi lnd[3049]:         github.com/lightningnetwork/lnd/routing/router.go:521 +0x4c8 fp=0x41c1968 sp=0

Expected behaviour

lnd should not crash.

Actual behaviour

lnd crashes with a segmentation fault.

Roasbeef commented 2 years ago

Does this recur on restart? (unable to resume)

guggero commented 2 years ago

Are you running the 32-bit version of lnd? How large is your channel.db? Try compacting the DB (set db.bolt.auto-compact=true in your config, then restart lnd).

feikede commented 2 years ago

I have a similar issue on Raspbian (Pi 4, 8 GB). channel.db is about 1 GB and sits on an SSD attached via USB 3:

2021-10-30 16:41:13.953 [INF] LTND: Version: 0.13.3-beta commit=v0.13.3-beta, build=production, logging=default, debuglevel=CNCT=debug,CRTR=debug,HSWC=debug,NTFN=debug,RPCS=debug
2021-10-30 16:41:13.954 [INF] LTND: Active chain: Bitcoin (network=mainnet)
2021-10-30 16:41:13.957 [INF] RPCS: RPC server listening on 0.0.0.0:10009
2021-10-30 16:41:13.961 [INF] RPCS: gRPC proxy started at 127.0.0.1:8080
2021-10-30 16:41:13.962 [INF] LTND: Opening the main database, this might take a few minutes...
2021-10-30 16:41:13.962 [INF] LTND: Opening bbolt database, sync_freelist=true, auto_compact=true
2021-10-30 16:41:13.962 [INF] CHDB: Compacting database file at /home/btc/.lnd/data/graph/mainnet/channel.db
2021-10-30 16:41:13.962 [INF] CHDB: Found old temp DB @ /home/btc/.lnd/data/graph/mainnet/temp-dont-use.db, removing before swap
unexpected fault address 0x22aad040
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x22aad040 pc=0x5035a4]
goroutine 1 [running]:
runtime.throw(0xe1e702, 0x5)
    runtime/panic.go:1117 +0x5c fp=0x26e0ecc sp=0x26e0eb8 pc=0x47dbc
runtime.sigpanic()
    runtime/signal_unix.go:741 +0x1bc fp=0x26e0ee4 sp=0x26e0ecc pc=0x602f0
go.etcd.io/bbolt.(*DB).meta(0x24a8140, 0x37e)
    go.etcd.io/bbolt@v1.3.5-0.20200615073812-232d8fc87f50/db.go:901 +0x1c fp=0x26e0f00 sp=0x26e0ee8 pc=0x5035a4
go.etcd.io/bbolt.(*DB).hasSyncedFreelist(...)
    go.etcd.io/bbolt@v1.3.5-0.20200615073812-232d8fc87f50/db.go:323

guggero commented 2 years ago

Are you on the 64-bit version of lnd? If yes, then this probably means that your database file got corrupted. If not, try running the linux-arm64 version of lnd (or simply upgrade RaspiBlitz to the latest version if this is RaspiBlitz).
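
One way to answer the 32-bit vs. 64-bit question on a given machine is sketched below. It is only an illustration: the two lookup sets and the function names are hypothetical and not exhaustive, and the pointer-width check describes the Python process that runs the script, which serves as a proxy for the installed userland rather than for the lnd binary itself.

```python
import platform
import struct

# Hypothetical lookup tables for illustration; they are not exhaustive.
MACHINES_32BIT = {"armv6l", "armv7l", "i386", "i686"}
MACHINES_64BIT = {"aarch64", "arm64", "x86_64", "amd64"}


def classify_machine(machine: str) -> str:
    """Map a `uname -m` style string to '32-bit', '64-bit', or 'unknown'."""
    name = machine.strip().lower()
    if name in MACHINES_32BIT:
        return "32-bit"
    if name in MACHINES_64BIT:
        return "64-bit"
    return "unknown"


def interpreter_pointer_bits() -> int:
    """Pointer width of the running Python process, in bits."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    machine = platform.machine()
    print("kernel reports:", machine, "->", classify_machine(machine))
    print("this process uses", interpreter_pointer_bits(), "bit pointers")
```

The two checks can disagree: a 64-bit kernel can run a 32-bit userland, in which case the machine string alone would be misleading.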

feikede commented 2 years ago

@guggero I am using this image: https://github.com/lightningnetwork/lnd/releases/download/v0.13.3-beta/lnd-linux-armv7-v0.13.3-beta.tar.gz on Linux btc02 5.10.60-v7l+ #1449 SMP Wed Aug 25 15:00:44 BST 2021 armv7l GNU/Linux. It is plain Raspbian, no RaspiBlitz or Umbrel. It's a research node, but with some (little) real money in it.

And I have these files currently:

btc@btc02:~/.lnd/data/graph/mainnet $ ls -al
total 1113888
drwx------ 2 btc btc       4096 Nov  2 07:16 .
drwx------ 3 btc btc       4096 Oct 11 16:20 ..
-rw------- 1 btc btc 1080279040 Oct 30 16:38 channel.db
-rw------- 1 btc btc      20480 Oct 22 23:07 sphinxreplay.db
-rw-r--r-- 1 btc btc   75563008 Nov  2 07:16 temp-dont-use.db

I think it's a memory issue. Can I just delete "temp-dont-use.db" and restart lnd?

guggero commented 2 years ago

You need to switch to a 64-bit operating system and use the linux-arm64 version. Otherwise you won't be able to open a DB that's more than 1 GB. You can get around this temporarily by moving the channel.db to a 64-bit machine, running the compaction there (with chantools, for example), and then moving the DB file back if it's significantly smaller than 1 GB.
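
The size check implied here can be sketched as follows. The threshold comes from the "more than 1 GB" statement above; reading it as exactly 1 GiB is an assumption, and the constant and function names are hypothetical. The sizes in the demo are copied from the `ls -al` listing earlier in the thread.

```python
import os

# Threshold taken from the comment above ("more than 1 GB"); treating it as
# exactly 1 GiB is an assumption made for this sketch.
ASSUMED_LIMIT_BYTES = 1 << 30  # 1,073,741,824


def bytes_over_limit(size_bytes: int, limit: int = ASSUMED_LIMIT_BYTES) -> int:
    """Return by how many bytes a file exceeds the limit (0 if it fits)."""
    return max(0, size_bytes - limit)


def check_db_file(path: str, limit: int = ASSUMED_LIMIT_BYTES) -> int:
    """Same check, reading the file size from disk."""
    return bytes_over_limit(os.path.getsize(path), limit)


if __name__ == "__main__":
    # Sizes copied from the directory listing earlier in the thread.
    listing = [("channel.db", 1080279040), ("temp-dont-use.db", 75563008)]
    for name, size in listing:
        over = bytes_over_limit(size)
        print(f"{name}: {size} bytes, over the assumed limit by {over} bytes")
```

With these numbers, channel.db is 6,537,216 bytes (about 6.2 MiB) over the assumed 1 GiB threshold, which fits the advice to move the file back only if compaction leaves it significantly smaller than 1 GB.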

feikede commented 2 years ago

OK, so this is my second node recovery. I'd suggest not providing lnd for 32-bit systems. Thanks.
