lightningnetwork / lnd

Lightning Network Daemon ⚡️
MIT License
7.67k stars 2.07k forks source link

Invalid commitment number sent on channel_reestablish #4085

Closed t-bast closed 4 years ago

t-bast commented 4 years ago

I've been investigating a force-close we experienced with an lnd node after a re-start. During the HTLC fulfill, we rebooted after sending our revoke_and_ack but before sending our commit_sig. Here are the logs from my side:

// We add an HTLC: everything goes well.
2020-03-10 10:03:37,350 INFO  f.a.eclair.Diagnostics PAY n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - OUT msg=UpdateAddHtlc(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,5604,118000 msat,071226afd0d0867e4b943c649af05fac463427e7918fa707c337b75801a1454e,CltvExpiry(621210),OnionRoutingPacket(0,ByteVector(33 bytes, 0x0323ff89b556028f205ef7abf6b6eca835070403dd1a9bae9a737690aa578659ed),ByteVector(1300 bytes, #-139322676),8a403c556c228d60a7729b70095c6145baa1dba8ce5a5ae874ef07e852ee1612))
2020-03-10 10:03:37,351 INFO  f.a.e.channel.Channel n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - built remote commit number=24004 htlc_in= htlc_out=5604 feeratePerKw=3531 txid=2fd6584ae3939f251b4bf9d7cb8afba7819bd4f6aa7b3884009962a1fbc0d2b6 tx=0200000001310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f00000000002c434480029d780a0000000000220020c855dddd2ddead46862b4de1dc441c9c35c4cc565531230188e5e48bdc175767f0a0e90000000000160014d25e0f057243dca55be4be982ac3794e7e7343c3d94eb120
2020-03-10 10:03:37,620 INFO  f.a.eclair.Diagnostics n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - OUT msg=CommitSig(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,d4ced2d34888c542aa679455f021aff404d1eefc8876d4ef182ced4134a6aa876f04e13599548e12a9039eefbbd5dee56f5fe3043ab0103be5ff9d8ef650e14c,List(),None)
2020-03-10 10:03:37,648 INFO  f.a.eclair.Diagnostics n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - IN msg=RevokeAndAck(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,PrivateKey(3a9f04fcde67ffa90bdcdaeeefddf2eef26af75a811a1bcd4cb0818ad3ca8a93),03ed628b2faf64f0e17eb6bb0dd6ed097f78b5076801822801363cbc572139c0dd,None)
2020-03-10 10:03:37,686 INFO  f.a.eclair.Diagnostics n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - IN msg=CommitSig(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,4a4027eb22ecf2de73f3ab02e9df03f6c429318fafd3cdd9987a2daf5cfd81c50b216bf479415ff0186bce42dd8abe7b498fc1bc750c245d9f061f7c0c3eab39,List(),None)
2020-03-10 10:03:37,973 INFO  f.a.e.channel.Channel n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - built local commit number=23987 htlc_in= htlc_out=5604 feeratePerKw=3531 txid=c43f6030ff99aa274f851f0a595bc13f15f2927d12e64ed3b5781a354e1d5bf8 tx=0200000001310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f00000000002c434480029d780a00000000001600140f94b828d9666be3d659ccabe3058aa5a07ec2eaf0a0e900000000002200209bcc2390af770373f3e7e71fc726cfbe8e5f315884d52c316ae1289aa19758d8ae4eb120
2020-03-10 10:03:38,333 INFO  f.a.eclair.Diagnostics n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - OUT msg=RevokeAndAck(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,PrivateKey(9ba5ab07d1f80bfd3876b3578dcc58027eaa321fe17ac896959d23e4a1a867e5),0382c8e4229385227f51772070cf67ca68a5bfeb0e16ba542e1577294f8e835e5f,None)

// The remote fulfills, tough luck we reboot after sending our revoke_and_ack but before sending our commit_sig.
2020-03-10 10:03:42,033 INFO  f.a.eclair.Diagnostics PAY n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - IN msg=UpdateFulfillHtlc(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,5604,af3426674c3c8aca59f89a97a5ccb47f6d6766fe5ff3dbaa6e0be029bfbd3a36)
2020-03-10 10:03:42,201 INFO  f.a.eclair.Diagnostics n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - IN msg=CommitSig(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,a447dc187126a86244e19674fead85faad9c94c946bf1c678eb89436fff413cb1099060f942d50d3108579c99d946fc0e4fa891bdb892e7d8e4a64dbf3c02eae,List(),None)
2020-03-10 10:03:42,202 INFO  f.a.e.channel.Channel n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - built local commit number=23988 htlc_in= htlc_out= feeratePerKw=3531 txid=df367c4ae144d61ebd210d526bc8c6503c13ef137ea408d08fe92c2d9cfd5cd7 tx=0200000001310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f00000000002c4344800213790a0000000000160014c9f3c0fdaebb8bd67ab0164146b2b5d30c6106a5f0a0e90000000000220020cbe5e5a7a69786092531d5c1bbc43ae06cc0d53be5f29388f5ab414f8de87bd8a94eb120
2020-03-10 10:03:42,416 INFO  f.a.eclair.Diagnostics n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - OUT msg=RevokeAndAck(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,PrivateKey(ef5e1fa28cfe8aa3481c49f9a338ef9a419bce36859a43822af78ac2ded4a2cd),03537b33861e5154096573193610eba95f1ffde973b25ebd89e89228f9a089ac00,None)
2020-03-10 10:03:42,417 INFO  f.a.e.channel.Channel n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - built remote commit number=24005 htlc_in= htlc_out= feeratePerKw=3531 txid=bbdcbf2bd7d2e5fd789f597a5388cd186642659d334fe8005357fddd8932e552 tx=0200000001310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f00000000002c4344800213790a0000000000220020657dbdd7f48b2f353e5e06944e898f0011bc4ab3e63e181088aa33192984b341f0a0e900000000001600147b2c6c1883390300a8428dd685582a23877d6621d84eb120

// After reboot and channel_reestablish, we re-send our commit_sig but disagree on commitment_number so the remote closes.
2020-03-10 10:11:02,829 INFO  f.a.eclair.Diagnostics CON n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - OUT msg=ChannelReestablish(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,23989,24004,PrivateKey(3a9f04fcde67ffa90bdcdaeeefddf2eef26af75a811a1bcd4cb0818ad3ca8a93),0382c8e4229385227f51772070cf67ca68a5bfeb0e16ba542e1577294f8e835e5f,None)
2020-03-10 10:13:21,647 INFO  f.a.eclair.Diagnostics CON n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - IN msg=ChannelReestablish(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,24005,23988,PrivateKey(ef5e1fa28cfe8aa3481c49f9a338ef9a419bce36859a43822af78ac2ded4a2cd),03fc99615a343c6c646c205f3d1a35059f7ed268c49ad48e1dd09cc55fdf4560c0,None)
2020-03-10 10:13:21,651 INFO  f.a.e.channel.Channel n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - built remote commit number=24005 htlc_in= htlc_out= feeratePerKw=3531 txid=bbdcbf2bd7d2e5fd789f597a5388cd186642659d334fe8005357fddd8932e552 tx=0200000001310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f00000000002c4344800213790a0000000000220020657dbdd7f48b2f353e5e06944e898f0011bc4ab3e63e181088aa33192984b341f0a0e900000000001600147b2c6c1883390300a8428dd685582a23877d6621d84eb120
2020-03-10 10:13:21,663 INFO  f.a.eclair.Diagnostics n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - OUT msg=CommitSig(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,0f016bf0278a05cbf2fd221e756d5d0fd6f0a040cdccdc759243444226d7019c25aded7949daea6ad0393eb62ecd7e4283997434507d6067d3a9651d61c6a182,List(),None)
2020-03-10 10:13:33,388 INFO  f.a.eclair.Diagnostics n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - IN msg=Error(310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f,ByteVector(538 bytes, #-1865211027))
2020-03-10 10:13:33,388 ERROR f.a.e.channel.Channel n:02cda8c01b2303e91bec74c43093d5f1c4fd42a95671ae27bf853d7dfea9b78c06 c:310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f - peer sent error: ascii='rejected commitment: commit_height=24005, invalid_commit_sig=304402200f016bf0278a05cbf2fd221e756d5d0fd6f0a040cdccdc759243444226d7019c022025aded7949daea6ad0393eb62ecd7e4283997434507d6067d3a9651d61c6a182, commit_tx=0200000001310d5c10de91aa8f5238c775da868891885baef86d591adc830a1bcd257af94f00000000002c434480029d780a0000000000220020657dbdd7f48b2f353e5e06944e898f0011bc4ab3e63e181088aa33192984b341f0a0e900000000001600147b2c6c1883390300a8428dd685582a23877d6621d84eb120, sig_hash=bd186acfc1a430e4c490f3054cf995dbeb95a4ca3ee4947670d2a42ddb6ec047' bin=72656a656374656420636f6d6d69746d656e743a20636f6d6d69745f6865696768743d32343030352c20696e76616c69645f636f6d6d69745f7369673d33303434303232303066303136626630323738613035636266326664323231653735366435643066643666306130343063646363646337353932343334343432323664373031396330323230323561646564373934396461656136616430333933656236326563643765343238333939373433343530376436303637643361393635316436316336613138322c20636f6d6d69745f74783d303230303030303030313331306435633130646539316161386635323338633737356461383638383931383835626165663836643539316164633833306131626364323537616639346630303030303030303030326334333434383030323964373830613030303030303030303032323030323036353764626464376634386232663335336535653036393434653839386630303131626334616233653633653138313038386161333331393239383462333431663061306539303030303030303030303136303031343762326336633138383333393033303061383432386464363835353832613233383737643636323164383465623132302c207369675f686173683d62643138366163666331613433306534633439306633303534636639393564626562393561346361336565343934373637306432613432646462366563303437

When we reconnect, lnd expects the commit tx that still has the HTLC, whereas they did send their commit_sig and received our revoke_and_ack. Notice how the commitment numbers don't match up the way they should.

I have reached out to the operator of the lnd node (zapread.com) to find out what version they're running and ideally get their logs. I'll update this issue once I have them. If you know how to reach the owner, please ping them on my behalf.

However it feels to me that this is somewhat easy to reproduce with the exact steps the logs show (unless there's a tricky race condition happening).

Let me know how I can help, it would be great to reduce those channel close on reconnect.

Roasbeef commented 4 years ago

Thanks for the report! Not much we can do on our end without knowing the version of lnd the user was running. We believe that we've fixed the prior instances of state de-sync upon reconnection with v0.9.1, which was released a few weeks ago.

t-bast commented 4 years ago

I just got word from the operator, he was running 0.8.0 at the time and upgraded two days later (too bad). So you believe this is fixed in 0.9.0?

Roasbeef commented 4 years ago

So you believe this is fixed in 0.9.0?

v0.9.1 (our latest release)

Roasbeef commented 4 years ago

Closing for now as based on the current information the user was running a version before our fix landed.