lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License

LND v0.10.2-beta.rc2 restarts on 32bit ARM Debian based OS without any obvious error in the log #4404

Closed openoms closed 4 years ago

openoms commented 4 years ago

Background

Updating to LND v0.10.2-beta.rc2 causes random restarts. Downgrading to LND v0.10.1-beta solves the issue and runs stable.

Your environment

Steps to reproduce

Update lnd to the latest release (installed from the release binary).

Expected behaviour

Expected to function without restarts

Actual behaviour

LND restarts after a few minutes without any obvious reason; I could not find any failures or errors in the logs. The last 2000 lines of lnd.log from two separate occasions are here: https://termbin.com/6t6t https://termbin.com/hrmpp

Please tell me how I can help debug further.

guggero commented 4 years ago

Not good... It looks like it might panic. Unfortunately the stack trace is not logged to the log file. Do you have a systemd log that might contain the stack trace?

openoms commented 4 years ago

Don't have much in journalctl:

$ sudo journalctl -u lnd
-- Logs begin at Tue 2020-06-23 08:09:25 BST, end at Tue 2020-06-23 09:27:18 BST. --
Jun 23 08:12:21 raspberrypi systemd[1]: Starting LND Lightning Daemon...
Jun 23 08:12:21 raspberrypi systemd[1]: Started LND Lightning Daemon.
Jun 23 08:20:22 raspberrypi systemd[1]: lnd.service: Main process exited, code=killed, status=11/SEGV
Jun 23 08:20:22 raspberrypi systemd[1]: lnd.service: Failed with result 'signal'.
Jun 23 08:21:22 raspberrypi systemd[1]: lnd.service: Service RestartSec=1min expired, scheduling restart.
Jun 23 08:21:22 raspberrypi systemd[1]: lnd.service: Scheduled restart job, restart counter is at 1.
Jun 23 08:21:22 raspberrypi systemd[1]: Stopped LND Lightning Daemon.
Jun 23 08:21:22 raspberrypi systemd[1]: Starting LND Lightning Daemon...
Jun 23 08:21:22 raspberrypi systemd[1]: Started LND Lightning Daemon.
Jun 23 08:29:18 raspberrypi systemd[1]: lnd.service: Main process exited, code=killed, status=11/SEGV
Jun 23 08:29:18 raspberrypi systemd[1]: lnd.service: Failed with result 'signal'.
Jun 23 08:30:18 raspberrypi systemd[1]: lnd.service: Service RestartSec=1min expired, scheduling restart.
Jun 23 08:30:18 raspberrypi systemd[1]: lnd.service: Scheduled restart job, restart counter is at 2.
Jun 23 08:30:18 raspberrypi systemd[1]: Stopped LND Lightning Daemon.
Jun 23 08:30:18 raspberrypi systemd[1]: Starting LND Lightning Daemon...
Jun 23 08:30:18 raspberrypi systemd[1]: Started LND Lightning Daemon.
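The crash entries in a journal like the one above can be pulled out with a small filter. This is only a sketch: the function names `segv_times` and `segv_count` are invented for illustration, and it assumes the default journalctl line layout shown above (month, day, time as the first three fields).

```shell
# Print the timestamp (month, day, time) of every journal line where
# systemd reports the main process being killed by SIGSEGV.
# Reads journalctl output on stdin.
segv_times() {
    awk '/Main process exited/ && /status=11\/SEGV/ { print $1, $2, $3 }'
}

# Count the SIGSEGV exits in the same input.
# grep -c exits non-zero when the count is 0, so the status is masked.
segv_count() {
    grep -c 'status=11/SEGV' || true
}
```

For example, `sudo journalctl -u lnd --no-pager | segv_times` lists when each crash happened, which makes it easier to see how long the daemon stayed up between restarts.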

now monitoring with strace:

$ pidof lnd
26837

$ sudo strace -p 26837 -v
strace: Process 26837 attached
futex(0x179ab00, FUTEX_WAIT_PRIVATE, 0, NULL

openoms commented 4 years ago

Interestingly, it has not failed since the last (3rd or 4th) restart. Will keep an eye on it and report if it happens again.

guggero commented 4 years ago

Oh, maybe that binary was built with go1.14 but not all required fixes were included. Can you run lncli version and tell me what go version it prints?

openoms commented 4 years ago

You are probably right:

"commit": "v0.10.2-beta.rc2",
"commit_hash": "de53605277a658fcde9a0bc690876000d390fca6",
"go_version": "go1.14.4"

still up BTW
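The version check above can be scripted. A minimal sketch, assuming the `lncli version` JSON is piped in on stdin with one key per line as printed above; the function names `go_versions` and `is_go_1_14` are made up for illustration, and the Go 1.14 test only reflects what this thread reports for the v0.10.x releases on 32-bit ARM:

```shell
# Extract the distinct "go_version" values from `lncli version` output
# on stdin (lncli and lnd each report their own).
go_versions() {
    sed -n 's/.*"go_version": *"\([^"]*\)".*/\1/p' | sort -u
}

# Succeed if the given version string belongs to the Go 1.14 series.
is_go_1_14() {
    case "$1" in
        go1.14|go1.14.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage could look like `lncli version | go_versions`, followed by `is_go_1_14 go1.14.4 && echo "built with Go 1.14"`.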

openoms commented 4 years ago

I presume the problem is similar to this: https://github.com/lightningnetwork/lnd/issues/4052 which has been solved here: https://github.com/lightningnetwork/lnd/pull/4061 ?

guggero commented 4 years ago

Yes, I assume that's the problem. We need to build the v0.10.2 release with go1.13; only v0.11.0 will be go1.14 compatible.

Roasbeef commented 4 years ago

Can you try out this update branch @openoms: https://github.com/lightningnetwork/lnd/tree/v0.10.2-beta-rc2-branch? Updated one of the deps to match that commit linked (which I think is the issue?).

Roasbeef commented 4 years ago

Also, if you compile manually with Go 1.13 is it able to remain up?

openoms commented 4 years ago

Got this with strace while it failed overnight:

futex(0x178bccc, FUTEX_WAIT_PRIVATE, 0, NULL) = 0                                                                                                                       
clock_gettime(CLOCK_MONOTONIC, {tv_sec=63281, tv_nsec=515385968}) = 0                                                                                              
nanosleep({tv_sec=0, tv_nsec=3000}, NULL) = 0                                                                                                    
futex(0x178bccc, FUTEX_WAIT_PRIVATE, 0, NULL) = 0                                                                                                                        
clock_gettime(CLOCK_MONOTONIC, {tv_sec=63739, tv_nsec=670928200}) = 0                                                                                 
futex(0x178bccc, FUTEX_WAIT_PRIVATE, 0, NULL) = ?                                                                                                    
+++ killed by SIGSEGV +++ 
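Note that `strace -p PID` without `-f` attaches only to the thread whose ID equals PID, so a trace like the one above may show nothing but the main thread waiting on a futex. Adding `-f` makes strace attach to all threads of the process. The sketch below is a small helper for reading the result; the name `killed_by` is hypothetical, and it assumes strace's usual `+++ killed by SIGNAL +++` termination line, with or without a PID prefix.

```shell
# Print the distinct signal names from strace's
# "+++ killed by SIGNAL +++" lines, reading strace output on stdin.
# The leading .* tolerates the per-line PID prefix that -f adds.
killed_by() {
    sed -n 's/.*+++ killed by \([A-Z0-9]*\).*/\1/p' | sort -u
}
```

One possible invocation is `sudo strace -f -p "$(pidof lnd)" -o lnd.strace`, then `killed_by < lnd.strace` once the process has died.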

Will compile https://github.com/lightningnetwork/lnd/tree/v0.10.2-beta-rc2-branch with go1.13.3.

openoms commented 4 years ago

Looking good with the commit https://github.com/lightningnetwork/lnd/commit/73dcdf9e58a743b0f82b2fafd7f2bda90fc91665

$ lncli version
{
    "lncli": {
        "commit": "v0.10.2-beta.rc2-1-g73dcdf9e58a743b0f82b2fafd7f2bda90fc91665",
        "commit_hash": "73dcdf9e58a743b0f82b2fafd7f2bda90fc91665",
        "version": "0.10.2-beta.rc2",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 2,
        "app_pre_release": "beta.rc2",
        "build_tags": [
        ],
        "go_version": "go1.13.3"
    },
    "lnd": {
        "commit": "v0.10.2-beta.rc2-1-g73dcdf9e58a743b0f82b2fafd7f2bda90fc91665",
        "commit_hash": "73dcdf9e58a743b0f82b2fafd7f2bda90fc91665",
        "version": "0.10.2-beta.rc2",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 2,
        "app_pre_release": "beta.rc2",
        "build_tags": [
        ],
        "go_version": "go1.13.3"
    }
}

Will keep an eye on it.

openoms commented 4 years ago

No issues after 36 h, seems to be fixed. Should I try building with Go 1.14?

guggero commented 4 years ago

Yes, if you don't mind testing that. Would be great to know if it actually is the go version or something else.

openoms commented 4 years ago

ok testing with go 1.14.4 now

$ lncli version
{
    "lncli": {
        "commit": "v0.10.2-beta.rc2-1-g73dcdf9e58a743b0f82b2fafd7f2bda90fc91665",
        "commit_hash": "73dcdf9e58a743b0f82b2fafd7f2bda90fc91665",
        "version": "0.10.2-beta.rc2",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 2,
        "app_pre_release": "beta.rc2",
        "build_tags": [
        ],
        "go_version": "go1.14.4"
    },
    "lnd": {
        "commit": "v0.10.2-beta.rc2-1-g73dcdf9e58a743b0f82b2fafd7f2bda90fc91665",
        "commit_hash": "73dcdf9e58a743b0f82b2fafd7f2bda90fc91665",
        "version": "0.10.2-beta.rc2",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 2,
        "app_pre_release": "beta.rc2",
        "build_tags": [
        ],
        "go_version": "go1.14.4"
    }
}

Roasbeef commented 4 years ago

We've switched over to building the upcoming minor releases using just 1.13.3, but would be curious to see if 1.14.4 works with that existing branch still.

openoms commented 4 years ago

Running the branch https://github.com/lightningnetwork/lnd/commits/v0.10.3-beta-rc1-branch with go1.14.4. All good so far.

$ lncli version
{
    "lncli": {
        "commit": "v0.10.2-beta.rc4-26-gcda3088a0159516a403062db480425b6cbbae6c9",
        "commit_hash": "cda3088a0159516a403062db480425b6cbbae6c9",
        "version": "0.10.2-beta.rc4",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 2,
        "app_pre_release": "beta.rc4",
        "build_tags": [
        ],
        "go_version": "go1.14.4"
    },
    "lnd": {
        "commit": "v0.10.2-beta.rc4-26-gcda3088a0159516a403062db480425b6cbbae6c9",
        "commit_hash": "cda3088a0159516a403062db480425b6cbbae6c9",
        "version": "0.10.2-beta.rc4",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 2,
        "app_pre_release": "beta.rc4",
        "build_tags": [
        ],
        "go_version": "go1.14.4"
    }
}

openoms commented 4 years ago

Sorry closed prematurely. LND 0.10.3 started to have restarts again on a more active node.

$ lncli version
{
    "lncli": {
        "commit": "v0.10.3-beta",
        "commit_hash": "d62c575f8499a314eb27f12462d20500b6bda2c7",
        "version": "0.10.3-beta",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 3,
        "app_pre_release": "beta",
        "build_tags": [
            "autopilotrpc",
            "signrpc",
            "walletrpc",
            "chainrpc",
            "invoicesrpc",
            "watchtowerrpc"
        ],
        "go_version": "go1.14.4"
    },
    "lnd": {
        "commit": "v0.10.3-beta",
        "commit_hash": "d62c575f8499a314eb27f12462d20500b6bda2c7",
        "version": "0.10.3-beta",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 3,
        "app_pre_release": "beta",
        "build_tags": [
            "autopilotrpc",
            "signrpc",
            "walletrpc",
            "chainrpc",
            "invoicesrpc",
            "watchtowerrpc"
        ],
        "go_version": "go1.14.4"
    }
}

nothing in the lnd.log, only this again:

$ sudo strace -p 31706 -v
strace: Process 31706 attached
futex(0x178bcd4, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 (errno 4294967056)                                   
+++ killed by SIGSEGV +++

This node is:

Odroid HC1
32 bit Armbian
Linux 5.4.28-odroidxu4 armv7l GNU/Linux
Bitcoin Core version v0.20.0

The strange thing is that the same lnd version had been stable for 48h+ on the two RPi4s I updated first. Now it has restarted there as well.

Switched to lnd v0.10.2-beta now ("go_version": "go1.14.4"). Will continue to report.

Same with v0.10.2-beta

$ sudo strace -p 6311 -v
strace: Process 6311 attached
futex(0x178bcec, FUTEX_WAIT_PRIVATE, 0, NULL) = ?
+++ killed by SIGSEGV +++

And now the node is back to lnd v0.10.1-beta with "go_version": "go1.13.10".

This still looks like an issue related to the Go version that only shows up on a busy node.

Will build from source with Go 1.13.3 again.

Roasbeef commented 4 years ago

Hmm, ok. I think we may re-upload the binaries, this time compiled using Go 1.13. This'll give us time to properly look into this so we can have things working properly for the major 0.11 release.

PatrickZGW commented 4 years ago

I see the same issue when updating to 0.10.3 on my Raspberry Pi 4. I am using the release binary. Frequent crashes without an error log.

openoms commented 4 years ago

> I see the same issue when updating to 0.10.3 on my Raspberry Pi 4. I am using the release binary. Frequent crashes without an error log.

v0.10.3-beta is stable when built from source with Go 1.13.3. To use a release binary, you need to downgrade to v0.10.1-beta.
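For anyone following the source-build route, here is a rough sketch of how it might look. The function names `is_go_1_13` and `build_lnd` are illustrative only, and it assumes `git`, `make` and a Go 1.13.x toolchain are already on the PATH; `make install` is the install target in lnd's Makefile.

```shell
# Succeed only if the `go version` output passed as $1 reports Go 1.13.x.
is_go_1_13() {
    case "$1" in
        *" go1.13 "*|*" go1.13."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Clone lnd at the given tag or branch and install it with the Go
# toolchain found on PATH. Refuses to build with anything but Go 1.13.x.
# $1: git ref (for example v0.10.3-beta), $2: checkout directory.
build_lnd() {
    ref="$1"
    dir="$2"
    if ! is_go_1_13 "$(go version)"; then
        echo "expected Go 1.13.x, found: $(go version)" >&2
        return 1
    fi
    git clone https://github.com/lightningnetwork/lnd.git "$dir" &&
        git -C "$dir" checkout "$ref" &&
        make -C "$dir" install
}
```

Calling `build_lnd v0.10.3-beta "$HOME/lnd-src"` would then build that tag, and `lncli version` afterwards should report the Go version that was used.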

PatrickZGW commented 4 years ago

Yes, just wanted to leave the comment here so that others can find this issue when they try to upgrade using the release binary.

openoms commented 4 years ago

@Roasbeef thanks for building https://github.com/lightningnetwork/lnd/releases/tag/v0.10.4-beta with the stable Go version. Updating now.

admin@raspberrypi:~ $ lncli -n testnet version
{
    "lncli": {
        "commit": "v0.10.4-beta",
        "commit_hash": "86114c575c2dff9dff1e1bb4df961c64aea9fc1c",
        "version": "0.10.4-beta",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 4,
        "app_pre_release": "beta",
        "build_tags": [
            "autopilotrpc",
            "signrpc",
            "walletrpc",
            "chainrpc",
            "invoicesrpc",
            "watchtowerrpc"
        ],
        "go_version": "go1.13.13"
    },
    "lnd": {
        "commit": "v0.10.4-beta",
        "commit_hash": "86114c575c2dff9dff1e1bb4df961c64aea9fc1c",
        "version": "0.10.4-beta",
        "app_major": 0,
        "app_minor": 10,
        "app_patch": 4,
        "app_pre_release": "beta",
        "build_tags": [
            "autopilotrpc",
            "signrpc",
            "walletrpc",
            "chainrpc",
            "invoicesrpc",
            "watchtowerrpc"
        ],
        "go_version": "go1.13.13"
    }
}

openoms commented 4 years ago

LND v0.11.0-beta.rc1 with go1.14.6 has been stable for 24h+ with numerous payments and routing events, so closing this for good. Thank you for the support!

$ lncli version
{
    "lncli": {
        "commit": "v0.11.0-beta.rc1",
        "commit_hash": "247b7530caf08a555ffd56f81019031bc1af6565",
        "version": "0.11.0-beta.rc1",
        "app_major": 0,
        "app_minor": 11,
        "app_patch": 0,
        "app_pre_release": "beta.rc1",
        "build_tags": [
            "autopilotrpc",
            "signrpc",
            "walletrpc",
            "chainrpc",
            "invoicesrpc",
            "watchtowerrpc"
        ],
        "go_version": "go1.14.6"
    },
    "lnd": {
        "commit": "v0.11.0-beta.rc1",
        "commit_hash": "247b7530caf08a555ffd56f81019031bc1af6565",
        "version": "0.11.0-beta.rc1",
        "app_major": 0,
        "app_minor": 11,
        "app_patch": 0,
        "app_pre_release": "beta.rc1",
        "build_tags": [
            "autopilotrpc",
            "signrpc",
            "walletrpc",
            "chainrpc",
            "invoicesrpc",
            "watchtowerrpc"
        ],
        "go_version": "go1.14.6"
    }
}