lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License

Stuck on migration step: Migrating database to properly prune edge update index #1913

Closed rytido closed 6 years ago

rytido commented 6 years ago

Background

When upgrading from 0.4.2 to 0.5, lnd gets stuck on "Migrating database to properly prune edge update index".

Your environment

Steps to reproduce

~/gocode/bin/lnd
Attempting automatic RPC configuration to bitcoind
Automatically obtained bitcoind's RPC credentials
2018-09-15 00:06:56.258 [INF] LTND: Version 0.5.0-beta commit=3b2c807288b1b7f40d609533c1e96a510ac5fa6d
2018-09-15 00:06:56.258 [INF] LTND: Active chain: Bitcoin (network=mainnet)
2018-09-15 00:06:56.266 [INF] CHDB: Checking for schema update: latest_version=6, db_version=1
2018-09-15 00:06:56.266 [INF] CHDB: Performing database schema migration
2018-09-15 00:06:56.266 [INF] CHDB: Applying migration #2
2018-09-15 00:06:56.266 [INF] CHDB: Migrating invoice database to new time series format
2018-09-15 00:06:56.266 [INF] CHDB: Migration to invoice time series index complete!
2018-09-15 00:06:56.266 [INF] CHDB: Applying migration #3
2018-09-15 00:06:56.266 [INF] CHDB: Migrating invoice database to new outgoing payment format
2018-09-15 00:06:56.267 [INF] CHDB: Migration to outgoing payment invoices complete!
2018-09-15 00:06:56.267 [INF] CHDB: Applying migration #4
2018-09-15 00:06:57.072 [INF] CHDB: Migration of edge policies complete!
2018-09-15 00:06:57.072 [INF] CHDB: Applying migration #5
2018-09-15 00:06:57.072 [INF] CHDB: Migrating database to support payment statuses
2018-09-15 00:06:57.072 [INF] CHDB: Marking all known circuits with status InFlight
2018-09-15 00:06:57.092 [INF] CHDB: Marking all existing payments with status Completed
2018-09-15 00:06:57.092 [INF] CHDB: Migration of payment statuses complete!
2018-09-15 00:06:57.092 [INF] CHDB: Applying migration #6
2018-09-15 00:06:57.092 [INF] CHDB: Migrating database to properly prune edge update index

Expected behaviour

Based on the migration example, the noted step should complete fairly quickly.

Actual behaviour

lnd hangs on noted step for 40 min (and counting) at 100% cpu usage.

Roasbeef commented 6 years ago

If you were running master at a commit after 0.4.2, then depending on your platform, the final migration may take some time. A bug in the earlier code caused a lot of redundant data to remain on disk. If you see high I/O utilization, that means it's busy cleaning up this extra state. What kind of hardware are you running on?

rytido commented 6 years ago

Nothing amazing, but not an SBC.

rytido commented 6 years ago

If the migration got interrupted the first time I attempted it, would that corrupt the db such that this would happen? I figured it would error if that was the case, but I don't really know. Thank you.

rytido commented 6 years ago

I do have nearly 500 MB in my data directory, but that hasn't changed since this started. I assumed it would fluctuate during this process.

Roasbeef commented 6 years ago

OK, if you have that much data, then the migration itself is actually cleaning out state. The way boltdb works, when it deletes data it doesn't actually reclaim the space. Instead, the freed pages (it's a B+ tree) go onto a free list. To reclaim the space post-migration, you'll need to use the bolt compact tool (this step is optional).
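
To illustrate why the data directory size stays flat while the migration deletes data, here is a minimal sketch of a paged file with a free list. It is a toy model only, not boltdb's actual file format or API: the class `ToyPagedFile`, its method names, and the 4096-byte page size are all assumptions made for illustration.

```python
class ToyPagedFile:
    """Toy model of a paged database file with a free list.

    Hypothetical illustration only; this is not boltdb's real format.
    """

    PAGE_SIZE = 4096  # bytes per page (illustrative value)

    def __init__(self):
        self.pages = []      # page slots in the file; None marks a freed page
        self.free_list = []  # indexes of freed pages available for reuse

    def put(self, value):
        """Store a value, reusing a freed page before growing the file."""
        if self.free_list:
            idx = self.free_list.pop()
            self.pages[idx] = value
        else:
            self.pages.append(value)
            idx = len(self.pages) - 1
        return idx

    def delete(self, idx):
        """Free a page: the slot stays in the file and joins the free list."""
        self.pages[idx] = None
        self.free_list.append(idx)

    def file_size(self):
        """Size on disk counts every page slot, live or freed."""
        return len(self.pages) * self.PAGE_SIZE

    def compact(self):
        """Copy only the live pages into a fresh file."""
        fresh = ToyPagedFile()
        for value in self.pages:
            if value is not None:
                fresh.put(value)
        return fresh


db = ToyPagedFile()
ids = [db.put(f"edge-update-{i}") for i in range(1000)]
size_before = db.file_size()

for idx in ids[:900]:  # the "migration" deletes most entries
    db.delete(idx)
size_after_delete = db.file_size()

size_after_compact = db.compact().file_size()
print(size_before, size_after_delete, size_after_compact)
```

In this model the file is the same size before and after the deletes, and only shrinks once the live pages are copied into a new file, which is the job a compaction tool does.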

Roasbeef commented 6 years ago

Ideally if you interrupt it, since the database is ACID, all those pending changes should be rolled back.

Roasbeef commented 6 years ago

If you use something like iotop, do you see high read/write I/O?
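
If iotop isn't available, a rough equivalent on Linux is to sample the per-process counters in /proc/<pid>/io twice and take the difference. The sketch below assumes a Linux system; the helper names `parse_proc_io`, `io_rate_kib`, and `sample` are made up for this example, and reading another process's counters may need the same user or root.

```python
import time
from pathlib import Path


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def io_rate_kib(before, after, seconds):
    """Return (read, write) rates in KiB/s between two counter snapshots."""
    read = (after["read_bytes"] - before["read_bytes"]) / 1024 / seconds
    write = (after["write_bytes"] - before["write_bytes"]) / 1024 / seconds
    return read, write


def sample(pid, interval=5.0):
    """Sample one process twice and return its disk I/O rate (Linux only)."""
    path = Path(f"/proc/{pid}/io")
    before = parse_proc_io(path.read_text())
    time.sleep(interval)
    after = parse_proc_io(path.read_text())
    return io_rate_kib(before, after, interval)
```

Calling `sample(pid)` with lnd's process ID returns the read and write rates over the interval, which can be compared with what iotop reports.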

rytido commented 6 years ago

Looks like only 20-30 K/s from lnd

rytido commented 6 years ago

Does it rely heavily on bitcoind during this process? That seems to have higher than normal I/O, and the data directory for bitcoind is on a slower external drive.

cfromknecht commented 6 years ago

@rytido the migration is applied well before connecting out to bitcoind. I would try just grabbing a beer and letting it do its thing :) 20-30 K/s seems low, but that depends on the hardware, I suppose.

rytido commented 6 years ago

It ran overnight, at least 4 hours, but it did finish! Thanks for bearing with my impatience.