talaia-labs / rust-teos

The Eye of Satoshi - Lightning Watchtower
https://talaia.watch
MIT License
128 stars 62 forks source link

[bug]: watchtower abandoned by client not being deleted on server side #198

Open chrisguida opened 1 year ago

chrisguida commented 1 year ago

On a test version of CLN for embassyOS (which includes both rust-teos and watchtower-client) I had my node (Tor) register with @BitcoinMechanic's tower (also Tor). My node has a lot of channels and was creating tons of appointments. Many of these became "pending", which I guess means they are not actually being watched by the tower. I abandoned the watchtower about a day later, and the client responded with this message:

2023-03-03T10:38:15-06:00 "<tower id> successfully abandoned"

However, on the server side, teos-cli --datadir=.lightning/.teos getuser <my user id> still shows my user and all its appointments. If the client reports successfully abandoning the watchtower (and listtowers no longer includes the tower), shouldn't this delete all the appointments on the server side?

Additionally, my client started spitting out messages like this:

2023-03-03T11:06:11-06:00  2023-03-03T17:06:11.938Z **BROKEN** plugin-watchtower-client: Cannot add appointment receipt to tower. Unknown tower_id: <tower id>
2023-03-03T11:06:11-06:00  2023-03-03T17:06:11.940Z **BROKEN** plugin-watchtower-client: Cannot remove pending appointment to tower. Unknown tower_id: <tower id>

Is this perhaps another bug in the client? The client should simply forget all the appointments for abandoned towers, correct?

By the way both client and server are running rust-teos commit 43f99713159a63884e9c851134d126ca1ec48f7e, which is the official v0.2.0 release tag.

sr-gi commented 1 year ago

However, on the server side, teos-cli --datadir=.lightning/.teos getuser still shows my user and all its appointments. If the client reports successfully abandoning the watchtower (and listtowers no longer includes the tower), shouldn't this delete all the appointments on the server side?

Abandonments are unilateral for now. A general request to delete data from the tower by the user is due, but it has to be taken care of carefully to not leak too much data about the user. A request to completely delete all data from the tower by the user may be easier and could pair with the abandontower request.

2023-03-03T11:06:11-06:00  2023-03-03T17:06:11.938Z **BROKEN** plugin-watchtower-client: Cannot add appointment receipt to tower. Unknown tower_id: <tower id>
2023-03-03T11:06:11-06:00  2023-03-03T17:06:11.940Z **BROKEN** plugin-watchtower-client: Cannot remove pending appointment to tower. Unknown tower_id: <tower id>

This is indeed unexpected. It seems like the Retrier may not be dropping the data references when a tower is abandoned and keeps trying to work on them. Are those popping recursively, or do they end up disappearing? In any case, I'll take a look at this.

chrisguida commented 1 year ago

Are those popping recursively, or do they end up disappearing?

I saw the same two messages for about 28 minutes after the tower was abandoned, and then they stopped.

There were hundreds or thousands of appointments, not sure exactly how many, but it was a lot.

Another question for you: Do towers hold on to all channel state changes indefinitely? I guess they must, since a node can publish any arbitrary old state?

sr-gi commented 1 year ago

Are those popping recursively, or do they end up disappearing?

I saw the same two messages for about 28 minutes after the tower was abandoned, and then they stopped.

Ok, that's less concerning. It is possible that the item that was being worked on tried to read/write from the db, the plugin realized the tower was abandoned and then stopped that task. In any case, I'll check that.

There were hundreds or thousands of appointments, not sure exactly how many, but it was a lot.

Another question for you: Do towers hold on to all channel state changes indefinitely? I guess they must, since a node can publish any arbitrary old state?

It has to indeed. What I'm actually pretty curious about is the appointments were marked as pending in the first place. There are two alternatives, either the tower became unreachable, or the subscription expired/got out of slots. The former would make sense, and that would have been the expected behavior. However, the latter should have been managed by the plugin on its own by refreshing the registration with the tower.

From what you've mentioned so far I leaning towards the first, given there were hundreds of thousands of appointments, so I guess the client re-negotiated the subscription with the tower several times, but I'd like to shed some light on that in order to rule out potential bugs.

chrisguida commented 1 year ago

Ok, let me know what data I can provide to assist. The client and tower are connected over Tor, so it's very possible the connection is laggy or unreliable. Also the tower is being run on a 4GB raspberry pi :p

Another question for you: how can I manually drop a particular user on the tower side? Seems like the server operator should be able manually to correct the situation where the client abandons the tower but the tower is still responding to that user's appointments?

Thanks so much for your responses so far!

sr-gi commented 1 year ago

Ok, let me know what data I can provide to assist. The client and tower are connected over Tor, so it's very possible the connection is laggy or unreliable. Also the tower is being run on a 4GB raspberry pi :p

Another question for you: how can I manually drop a particular user on the tower side? Seems like the server operator should be able manually to correct the situation where the client abandons the tower but the tower is still responding to that user's appointments?

Thanks so much for your responses so far!

You cannot atm.

All functionality regarding deleting data from the tower has been purposely left out because it can actually be detrimental for the tower's reputation if not handled with care. It may also introduce some DoS attack vectors. Check: https://github.com/talaia-labs/python-teos/issues/158

I'm currently evaluating what's the best approach to add minimal functionality in this regard without it being exploitable. Suggestions for use cases are welcome. For now, out of your issue, I'm getting:

sr-gi commented 1 year ago

Just for reference: https://github.com/talaia-labs/rust-teos/discussions/203

Feel free to jump into the discussion if you're into accumulators @chrisguida