lightninglabs / lightning-terminal

Lightning Terminal: Your Home for Lightning Liquidity
MIT License

Issues connecting to LND's REST Proxy when encrypting LND's TLS Key #696

Closed gkrizek closed 5 months ago

gkrizek commented 6 months ago

We are seeing some odd behavior after upgrading from 0.12.0 to 0.12.2. We have never seen this before, so it appears to have been introduced in 0.12.1 or 0.12.2, but none of the commits in that range stands out as the cause.

We are running LND in integrated mode with litd. We set the flag --lnd.tlsencryptkey, as we always have. With this flag set, LND creates an ephemeral TLS cert and key that are used only for unlocking the node, deletes them after the unlock, and then uses the persistent TLS cert and key on disk as normal. From all my testing, I can confirm that this works in LND and that it correctly rotates the certificates once it is unlocked. I can even call the endpoint locally with curl -vk https://localhost:10009 and see that it is serving the correct certificate.

The problem is, the ephemeral TLS cert and key are only valid for 24 hours because they are meant to be short lived. After 24 hours, the REST proxy for LND stops working and returns an error like:

2023-12-14 21:10:46.680 [WRN] GRPC: [core] [Channel #60 SubChannel #880] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:10009", ServerName: "127.0.0.1:10009", }. Err: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-14T21:10:46Z is after 2023-12-14T18:51:40Z"

I confirmed that the dates match those in the ephemeral TLS certificate, so it appears that something is either caching the certificate or still serving it from somewhere. The other strange part is that everything else works fine: the gRPC port works and the litd proxy works. I have tried to debug this without success. I can't find anywhere that the ephemeral cert is actually being served, so I have no idea why this one interface is trying to use it and complaining about it.

I can see the errors are coming from the GRPC subserver which is only present in litd, so that is why I'm thinking it's something in litd and not LND.

Expected behavior

I expect the REST proxy to work consistently after unlocking, like the other interfaces.

Actual behavior

After 24 hours, the REST proxy of LND no longer works and you get certificate errors. The gRPC and litd interfaces work just fine; it's only the REST proxy that fails. Example:

Works:

curl ...macaroon_stuff... https://localhost:8443/v1/getinfo

Does not work:

curl ...macaroon_stuff... https://localhost:8080/v1/getinfo

Here's debug logging:

2023-12-18 05:02:55.473 [DBG] GRPC: [core] [pick-first-lb 0x401fcfa960] Received SubConn state update: 0x401fcfa9f0, {ConnectivityState:TRANSIENT_FAILURE ConnectionError:connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:02:55Z is after 2023-12-15T21:57:41Z"}
2023-12-18 05:02:55.473 [DBG] GRPC: [core] [Channel #58 SubChannel #1441] Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:02:55Z is after 2023-12-15T21:57:41Z"
2023-12-18 05:02:55.473 [WRN] GRPC: [core] [Channel #58 SubChannel #1441] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:10009", ServerName: "127.0.0.1:10009", }. Err: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:02:55Z is after 2023-12-15T21:57:41Z"
2023-12-18 05:02:55.473 [DBG] GRPC: [core] Creating new client transport to "{Addr: \"127.0.0.1:10009\", ServerName: \"127.0.0.1:10009\", }": connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:02:55Z is after 2023-12-15T21:57:41Z"
2023-12-18 05:02:55.434 [DBG] GRPC: [core] [pick-first-lb 0x401fcfa960] Received SubConn state update: 0x401fcfa9f0, {ConnectivityState:CONNECTING ConnectionError:<nil>}
2023-12-18 05:02:55.433 [DBG] GRPC: [core] [Channel #58 SubChannel #1441] Subchannel picks a new address "127.0.0.1:10009" to connect
2023-12-18 05:02:55.433 [DBG] GRPC: [core] [Channel #58 SubChannel #1441] Subchannel Connectivity change to CONNECTING
2023-12-18 05:02:55.432 [DBG] GRPC: [core] [pick-first-lb 0x401fcfa960] Received SubConn state update: 0x401fcfa9f0, {ConnectivityState:IDLE ConnectionError:connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:00:33Z is after 2023-12-15T21:57:41Z"}
2023-12-18 05:02:55.432 [DBG] GRPC: [core] [Channel #58 SubChannel #1441] Subchannel Connectivity change to IDLE, last error: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:00:33Z is after 2023-12-15T21:57:41Z"
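
When reading log lines like these, it can help to work out how long the certificate had already been expired at the time of the handshake. The following Go sketch pulls the two timestamps out of the x509 error text and subtracts them. The function name expiredFor and the regular expression are assumptions written for this illustration; they only rely on the "current time ... is after ..." wording that appears in the messages above.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// expiryRE captures the two RFC 3339 timestamps from the x509 expiry error.
var expiryRE = regexp.MustCompile(`current time ([0-9T:-]+Z) is after ([0-9T:-]+Z)`)

// expiredFor returns how long the certificate had been expired when the
// handshake was attempted, or false if the line carries no expiry error.
// Hypothetical helper, for illustration only.
func expiredFor(line string) (time.Duration, bool) {
	m := expiryRE.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	now, err1 := time.Parse(time.RFC3339, m[1])
	notAfter, err2 := time.Parse(time.RFC3339, m[2])
	if err1 != nil || err2 != nil {
		return 0, false
	}
	return now.Sub(notAfter), true
}

func main() {
	// Excerpt of the error text from the first warning quoted in this report.
	line := `x509: certificate has expired or is not yet valid: ` +
		`current time 2023-12-14T21:10:46Z is after 2023-12-14T18:51:40Z`
	if d, ok := expiredFor(line); ok {
		fmt.Println("certificate had been expired for", d)
	}
}
```

For the first warning in this report the difference comes out to 2h19m6s, and for the debug lines from 2023-12-18 to 55h5m14s, which fits a certificate that expired once and kept being presented afterwards.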

To reproduce

Run litd with LND integrated and lnd.tlsencryptkey set to true. Wait for the ephemeral TLS certificate to expire and then try to make calls to the LND REST Proxy.

System information

Example litd conf:

[Application Options]
# Lit
httpslisten=0.0.0.0:8443
disableui=1
enablerest=1
restcors="*"
lit-dir=/var/litd
tlscertpath=/var/litd/tls.cert
tlskeypath=/var/litd/tls.key
network=mainnet
lnd-mode=integrated
faraday-mode=integrated
loop-mode=integrated
pool-mode=integrated
taproot-assets-mode=integrated

# LND
lnd.lnddir=/var/lnd
lnd.restcors="*"
lnd.debuglevel=info
lnd.tlscertpath=/var/lnd/tls.cert
lnd.tlskeypath=/var/lnd/tls.key
lnd.tlsencryptkey=1
lnd.tlsdisableautofill=1
lnd.tlsextradomain=localhost
lnd.tlsautorefresh=1
lnd.tlscertduration=20000h0m0s
lnd.rpclisten=0.0.0.0:10009
lnd.restlisten=0.0.0.0:8080
lnd.listen=0.0.0.0:9735

lnd.bitcoin.active=1
lnd.bitcoin.feerate=100

lnd.bitcoin.mainnet=true

lnd.bitcoin.node=btcd
...