lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License
7.62k stars 2.07k forks source link

[bug]: LND panics when calling `kvdb.lastCompactionDate`. #8638

Open lorenzolfm opened 4 months ago

lorenzolfm commented 4 months ago

Background

LND panics when calling lastCompactionDate: lnd[5921]: panic: runtime error: index out of range [7] with length 0. This happened after a power outage that rebooted my machine. I was able to avoid the issue by setting db.bolt.auto-compact=false.

Your environment

Expected behaviour

LND should be able to startup with db.bolt.auto-compact=true, or at least fail gracefully with better error handling.

Actual behaviour

LND panics, see stacktrace:

lnd[5921]: panic: runtime error: index out of range [7] with length 0
lnd[5921]: goroutine 1 [running]:
lnd[5921]: encoding/binary.bigEndian.Uint64(...)
lnd[5921]:         encoding/binary/binary.go:179
lnd[5921]: github.com/lightningnetwork/lnd/kvdb.lastCompactionDate({0x40003b2980?, 0x0?})
lnd[5921]:         github.com/lightningnetwork/lnd/kvdb@v1.4.4/backend.go:226 +0x184
lnd[5921]: github.com/lightningnetwork/lnd/kvdb.compactAndSwap(0x4000707098)
lnd[5921]:         github.com/lightningnetwork/lnd/kvdb@v1.4.4/backend.go:127 +0xc8
lnd[5921]: github.com/lightningnetwork/lnd/kvdb.GetBoltBackend(0x4000707098)
lnd[5921]:         github.com/lightningnetwork/lnd/kvdb@v1.4.4/backend.go:101 +0x74
lnd[5921]: github.com/lightningnetwork/lnd/lncfg.(*DB).GetBackends(0x400056c740, {0x1ca9928?, 0x4000578a50}, {0x40005744e0, 0x21}, {0x4000052870, 0x29}, {0x40005745 10, 0x2e}, 0x1, ...)
lnd[5921]:         github.com/lightningnetwork/lnd/lncfg/db.go:552 +0x2be4
lnd[5921]: github.com/lightningnetwork/lnd.(*DefaultDatabaseBuilder).BuildDatabase(0x4000566990, {0x1ca9928, 0x4000578a50})
lnd[5921]:         github.com/lightningnetwork/lnd/config_builder.go:918 +0x288
lnd[5921]: github.com/lightningnetwork/lnd.Main(0x4000366000, {{0x0?, 0x40001c41e0?, 0x40001c4240?}}, 0x400007ca20, {0x40002ee1e0, 0x40001c41e0, 0x40001c4240, 0x400 01c42a0, {0x0}})
lnd[5921]:         github.com/lightningnetwork/lnd/lnd.go:427 +0x1840
lnd[5921]: main.main()
lnd[5921]:         github.com/lightningnetwork/lnd/cmd/lnd/main.go:38 +0x198
systemd[1]: lnd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
systemd[1]: lnd.service: Failed with result 'exit-code'.
hieblmi commented 4 months ago

Thanks for raising this issue. I'd recommend to keep lnd shutdown for the moment to not publish any outdated state.

hieblmi commented 4 months ago

Could you try the compaction with chantools which is described a bit here and see if that succeeds? https://github.com/lightningnetwork/lnd/issues/8532#issuecomment-1985577491

lorenzolfm commented 4 months ago

It seem to have compacted the db with no issue.

chantools compactdb channel.db --sourcedb ./channel.db --destdb ./compacted.db 2024-04-10 13:17:13.335 [INF] CHAN: chantools version v0.13.0 commit

~/tmp $ ls
channel.db  compacted.db  results
hieblmi commented 4 months ago

Ok that's great, could you check that the .last-compacted file in your live-directory is empty? If so then move it out to a different temporary place, then turn the autocompaction of your node back on and restart lnd.

bhandras commented 4 months ago

Seems like the .last-compacted file was corrupted? Are you maybe running from an SD card?

lorenzolfm commented 4 months ago

Ok that's great, could you check that the .last-compacted file in your live-directory is empty? If so then move it out to a different temporary place, then turn the autocompaction of your node back on and restart lnd.

Moving the channel.db.last-compacted still results is the same issue. The output of cat file is:

^WÄõÈy^[à<80>

Seems like the .last-compacted file was corrupted? Are you maybe running from an SD card?

No I'm using an SSD.

I'm not discarding corruption, but since moving the file and starting LND from scratch didn't work it doesn't seem that corruption is involved.

func lastCompactionDate(dbFile string) (time.Time, error) {
    zeroTime := time.Unix(0, 0)

    tsFile := fmt.Sprintf("%s%s", dbFile, LastCompactionFileNameSuffix)
    if !fileExists(tsFile) {
        return zeroTime, nil
    }

    tsBytes, err := ioutil.ReadFile(tsFile)
    if err != nil {
        return zeroTime, err
    }

    tsNano := byteOrder.Uint64(tsBytes)
    return time.Unix(0, int64(tsNano)), nil
}

If I understand correctly, if the .last-compacted file does not exists, the function returns early. Which is odd, since the file channel.db.last-compacted is created before this function even runs. Reading from an empty file could explain the crash on byteOrder.Uint64(tsBytes), tho.