We are seeing some odd behavior after upgrading from 0.12.0 to 0.12.2. We've never seen this before, so it appears it was introduced in 0.12.1 or 0.12.2, but none of the commits in those releases stand out as an obvious cause.
We are running LND in integrated mode with Litd and setting the flag --lnd.tlsencryptkey (as we always have). With this flag set, LND creates an ephemeral TLS cert and key that are used only for unlocking the node, deletes them, and then uses the persistent TLS cert and key on disk as normal. From all testing, I can confirm that this works in LND and that it correctly rotates the certificates after the node is unlocked. Calling the endpoint locally with curl -vk https://localhost:10009 also shows it serving the correct certificate.
The problem is that the ephemeral TLS cert and key are only valid for 24 hours, because they are meant to be short-lived. After 24 hours, the REST proxy for LND stops working and returns an error like:
2023-12-14 21:10:46.680 [WRN] GRPC: [core] [Channel #60 SubChannel #880] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:10009", ServerName: "127.0.0.1:10009", }. Err: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-14T21:10:46Z is after 2023-12-14T18:51:40Z"
I confirmed that the dates match those in the ephemeral TLS certificate, so it appears that something is either caching that certificate or it is still being served from somewhere. The other strange part is that everything else works fine: the gRPC port works and the litd proxy works. I have tried to debug this without success. I can't find anywhere that the ephemeral cert is actually still being served, so I have no idea why this one interface is trying to use it and failing.
The errors are coming from the GRPC subserver, which is only present in litd. That is why I think the problem is in litd and not LND.
Expected behavior
I expect the REST proxy to work consistently after unlocking, like the other interfaces.
Actual behavior
After 24 hours, the REST proxy of LND no longer works and returns certificate errors. The gRPC and Litd interfaces work just fine; only the REST proxy fails. Example:
2023-12-18 05:02:55.473 [DBG] GRPC: [core] [pick-first-lb 0x401fcfa960] Received SubConn state update: 0x401fcfa9f0, {ConnectivityState:TRANSIENT_FAILURE ConnectionError:connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:02:55Z is after 2023-12-15T21:57:41Z"}
2023-12-18 05:02:55.473 [DBG] GRPC: [core] [Channel #58 SubChannel #1441] Subchannel Connectivity change to TRANSIENT_FAILURE, last error: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:02:55Z is after 2023-12-15T21:57:41Z"
2023-12-18 05:02:55.473 [WRN] GRPC: [core] [Channel #58 SubChannel #1441] grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:10009", ServerName: "127.0.0.1:10009", }. Err: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:02:55Z is after 2023-12-15T21:57:41Z"
2023-12-18 05:02:55.473 [DBG] GRPC: [core] Creating new client transport to "{Addr: \"127.0.0.1:10009\", ServerName: \"127.0.0.1:10009\", }": connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:02:55Z is after 2023-12-15T21:57:41Z"
2023-12-18 05:02:55.434 [DBG] GRPC: [core] [pick-first-lb 0x401fcfa960] Received SubConn state update: 0x401fcfa9f0, {ConnectivityState:CONNECTING ConnectionError:<nil>}
2023-12-18 05:02:55.433 [DBG] GRPC: [core] [Channel #58 SubChannel #1441] Subchannel picks a new address "127.0.0.1:10009" to connect
2023-12-18 05:02:55.433 [DBG] GRPC: [core] [Channel #58 SubChannel #1441] Subchannel Connectivity change to CONNECTING
2023-12-18 05:02:55.432 [DBG] GRPC: [core] [pick-first-lb 0x401fcfa960] Received SubConn state update: 0x401fcfa9f0, {ConnectivityState:IDLE ConnectionError:connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:00:33Z is after 2023-12-15T21:57:41Z"}
2023-12-18 05:02:55.432 [DBG] GRPC: [core] [Channel #58 SubChannel #1441] Subchannel Connectivity change to IDLE, last error: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-18T05:00:33Z is after 2023-12-15T21:57:41Z"
To reproduce
Run litd with LND in integrated mode and lnd.tlsencryptkey set to true. Wait for the ephemeral TLS certificate to expire (24 hours), then try to make calls to the LND REST proxy.
System information
Example litd conf: