Closed matheusd closed 1 month ago
The introduction of exclusive groups in the sweeper means multiple sweeping transactions might be published in the same sweep round, but currently only the last one is stored and published back on restart.
The rebroadcast of the last transaction is in order to handle an edge case (is this still an issue @joostjager?). The sweeper does not offer any guarantees that inputs offered will be swept if a shutdown occurs, they need be to re-offered on startup to resume.
Ideally, the sweeper should actually track all published transactions and act on them according to what happens afterwards (whether they are successfully mined, RBF'd, whether they are now invalid due to one of its inputs getting spent on a different mined transaction, etc).
Could you describe what problems this would solve for you? The current design acts on a best-effort basis and does not rely on broadcast errors (since it can't for light clients). Inputs are rescheduled randomly regardless of the broadcast result to eventually converge on a valid set of inputs that will result in a valid transaction. Things get tricky here of course when taking into account all of the RBF rules, but we've decided not to implement those for now.
Ideally, the sweeper should actually track all published transactions and
Could you describe what problems this would solve for you?
Allow external clients to track what's happening on the sweeper (which transactions were attempted, fees, what happened to them, etc) without having to go through logs.
See next answer for what prompted opening this.
The current design acts on a best-effort basis and does not rely on broadcast errors (since it can't for light clients).
Indeed. I must've made a mistake running the test, because I was sure it was actually relying on the broadcast errors. But anyway, the issue I'm facing can be surfaced by running this diff with the following icase:
make build-itest itest-only backend="neutrino" icase="test_multi-hop_htlc/committype=anchors/remote_chain_claim"
The test failure itself is unrelated to this particular issue (I already have a fix for that). A sample of the debug data introduced in the diff:
lnd_test.go:139: vvvvvvvvvvvvv before close vvvvvvvvvvv
lnd_test.go:139: vvvvvvvvvvvvv before mining close vvvvvvvvvvv
lnd_test.go:135: 973f7ffd1639a9624aeb6b93e29aef749a2db97997fdae851dc1f4e69cf032a4:0
lnd_test.go:135: 1d771c9dae17aa7a52387b710877283f1903fca675245e3d069ad318b0982ca8:0
lnd_test.go:139: vvvvvvvvvvvvv after mining close vvvvvvvvvvv
lnd_test.go:135: 973f7ffd1639a9624aeb6b93e29aef749a2db97997fdae851dc1f4e69cf032a4:0
And the relevant lines from output-2-Alice
log:
2020-05-08 08:19:02.802 [INF] SWPR: Creating sweep transaction 1d771c9dae17aa7a52387b710877283f1903fca675245e3d069ad318b0982ca8 for 2 inputs (440ef8164df227e82543be7c504b10337a1f664809a6ca65337cd83f4c140aeb:0 (CommitmentAnchor), dcdedb8ccd6ae221c09736e3427294e0792b2fb620209516f772b9a771df194c:1 (WitnessKeyHash)) using 253 sat/kw, tx_fee=0.00000181 BTC
2020-05-08 08:19:03.304 [INF] SWPR: Creating sweep transaction 973f7ffd1639a9624aeb6b93e29aef749a2db97997fdae851dc1f4e69cf032a4 for 2 inputs (b567d0e7818a07719ee628a0ee0db3d2f4da1286d8a1615c08b5281a52e6983d:0 (CommitmentAnchor), 79342a7482441971d7a6b18a76adf473eb01c5cadf3f3b987fcf5ca36405f907:0 (WitnessKeyHash)) using 253 sat/kw, tx_fee=0.00000181 BTC
2020-05-08 08:19:07.885 [DBG] SWPR: Dispatching sweep success for 440ef8164df227e82543be7c504b10337a1f664809a6ca65337cd83f4c140aeb:0 to 1 listeners
2020-05-08 08:19:07.885 [DBG] SWPR: Dispatching sweep error for b567d0e7818a07719ee628a0ee0db3d2f4da1286d8a1615c08b5281a52e6983d:0 to 1 listeners: other member of exclusive group was spent
ChanArb creates two transactions to redeem the anchor outputs (one for the local commitment tx, one for the remote). One of them gets confirmed, the other one just became invalid. However, the neutrino wallet will now have an unconfirmed utxo "forever". Note that the other commitment is technically an orphan (not a double spend) since the other commitment tx was never even broadcast.
This is how I'm solving this for the moment: https://github.com/matheusd/dcrlnd/compare/master...matheusd:upstream-v0.10-priv
This is kinda crappy though because it only works "incidentally" for the test. Whence the purpose of this issue: to actually take correct actions in regards to the wallet, the sweeper should be keeping track of every published sweep attempt so that it can detect when something got invalidated (so it can abandon the specific tx instead of blankly attempting to abandon transactions-spending-an-input), how many different attempts it made via RBF/CPFP to sweep something, stats for what actually got swept and when, etc.
I don't expect fixing this to be simple or immediate though: I opened this more to get an opinion on whether this is a reasonable thing to do and to be able to track progress (in case that it is).
I haven't looked through your changes thoroughly yet, but perhaps it is better to introduce this logic at the btcwallet level (the repo itself, not our wrapper over it)? It already has some logic to remove double spend conflicts, but I'm not sure if it can handle the case you've outline above.
We briefly discussed this alternative around here (solving in {btc,dcr}wallet).
AFAICT, to have this solved in the wallet would require the wallet tracking the entire chain of unconfirmed of transactions, so it can act when any one of the transactions in the chain becomes invalid. It would have to be presented the other commitment tx, so it can know which inputs to track or alternatively, it would have to track orphan credits (that is, credits originating in transactions which inputs can't be found in the utxo set) separately as well.
The rebroadcast of the last transaction is in order to handle an edge case (is this still an issue @joostjager?). The sweeper does not offer any guarantees that inputs offered will be swept if a shutdown occurs, they need be to re-offered on startup to resume.
Yes, this is still code that is needed.
// Republish in case the previous call crashed lnd. We don't care about
// the return value, because inputs will be re-offered and retried
// anyway. The only reason we republish here is to prevent the corner
// case where lnd goes into a restart loop because of a crashing publish
// tx where we keep deriving new output script. By publishing and
// possibly crashing already now, we haven't derived a new output script
// yet.
Sweeper has been reengineered significantly. Don't think this issue applies anymore. Should be reassessed and reopened if still an issue.
The introduction of exclusive groups in the sweeper means multiple sweeping transactions might be published in the same sweep round, but currently only the last one is stored and published back on restart.
Ideally, the sweeper should actually track all published transactions and act on them according to what happens afterwards (whether they are successfully mined, RBF'd, whether they are now invalid due to one of its inputs getting spent on a different mined transaction, etc).
Opening this issue in lnd to see if it's reasonable here since I need to solve (some aspects of) this in dcrlnd due to some differences in SPV vs Neutrino mode.