Closed JssDWt closed 3 months ago
Are you able to hit this outside of your testing infrastructure? Ran a quick test this morning with LND -- LND -- CLN -(private)- CLN
on the specified versions and I'm unable to reproduce this failure.
@carlaKC this is an issue we've seen in production inconsistently with a few customers and we were to reproduce consistently in a testing environment.
Hi all, we're the ones getting the bug in production in case I can be of any help. This issue is preventing affected nodes running on Greenlight infrastructure to receive payments from Alby or Blink (only ones tested so far).
@carlaKC this is an issue we've seen in production inconsistently with a few customers and we were to reproduce consistently in a testing environment.
Very helpful to be able to consistently reproduce it! I'm just wondering what the difference is between the test setup and the quick test I ran manually - that'll be very helpful to try pin-point where the issue is.
I'm just wondering what the difference is between the test setup and the quick test I ran manually
I agree! The logs from a test are supplied at the bottom of the issue description with trace level from lnd. Maybe differences can be seen in there. If there's any other information that can be helpful, or running tests with specific settings, let me know.
have reproduced 👍
@ellemouton Need a build with more data? I can have it spit out all the gory details to stderr....
We will give this error if they try to send a legacy onion, but nobody does that any more. Surely?
Indeed, LND log 2915:
Payload: ([]uint8) (len=33 cap=64) {
00000000 00 26 28 0f 74 eb c5 87 c8 00 00 00 00 00 00 27 |.&(.t..........'|
00000010 10 00 00 00 88 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 |.|
See that 00 at the front of the payload? That's legacy, which we removed support for in 23.05 :grimacing: (Wait, no, 22.11!)
Weird, we'll only try to send a legacy onion if either the hop hints sent over don't have the proper feature bits set, or we never got a node announcement from the destination node (so we don't know their feature bit set).
Ah, I think what's happening is that if CLN isn't setting the bit at all (we still set it, but have it marked as required), then we'd assume that the peer is super old and doesn't understand the new onion format.
@Roasbeef Can you elaborate which feature bits specifically?
Apparently (according to @ellemouton) the node hasn't seen the CLN node announcement, and is assuming the worst. FWIW we stopped generating them in our 2022-04 release, six months before we stopped accepting them.
You should probably just rip out the code which generates the old-style legacy onions altogether. For us it was a nice simplification...
@JssDWt the TLV onion feature bit.
What I think is happening here is CLN not setting the bit at all (or we don't have a node announcement, so we assume they don't support it). Newer lnd clients (0.18 and forward) will start to set the bit to required. Eventually we'll also just stop setting it all together, but we can't do that yet, as older clients (pre 0.18) will use the feature bit to decide if they can do things like MPP or not.
We can also just assume it's active (for 0.18), but as mentioned that won't help older clients (who still look at the bit).
CLN are setting the bit (it is set in the invoice) but the main issue is that the LND nodes in the setup are not seeing the CLN node announcement. They do get the channel announcement though & so know of the node but then assume a zero length feature bit vector. So that is the main issue.
The second issue (which I realised will only actually matter in the case where an invoice has chained hop hints) is that we (LND) assume that hop hints have zero length feature bit vectors (ie, we dont see them as having the TLV bit set and so we would use legacy encoding for these hops).
Background
There is an interop issue with Core lightning that causes INVALID_ONION_HMAC errors. It consistently reproduces with the following setup:
LND sender -> LND routing node -> CLN routing node -> CLN recipient
It does NOT reproduce with
Your environment
lnd
: 0.17.5Steps to reproduce
Create 4 lightning nodes with channels between them:
Send a payment from the sender to the recipient over the two routing nodes.
Expected behaviour
The payment should succeed
Actual behaviour
The payment fails with INVALID_ONION_HMAC from index 1.
testing setup
invalid_onion_hmac
branch of https://github.com/breez/lspdgo test -timeout 5m -v ./itest -test.run TestInvalidOnionHmac --bitcoindexec /path/to/bitcoind --lightningdexec /path/to/lightningd --lndexec /path/to/lnd --bitcoincliexec /path/to/bitcoin-cli --testdir /path/to/temporary/testdir --preservelogs
logs
lnd-lnd-sender.log lnd-lnd-router.log lightningd-cln-router.log lightningd-cln-recipient.log
CLN issue
https://github.com/ElementsProject/lightning/issues/7347