talaia-labs / rust-teos

The Eye of Satoshi - Lightning Watchtower
https://talaia-labs.github.io/talaia.watch/
MIT License

test_unreachable_watchtower randomly timeouts #84

Closed sr-gi closed 2 years ago

sr-gi commented 2 years ago

One of the CLN E2E tests randomly fails due to a timeout when checking whether all pending appointments have been sent to a previously unreachable tower.

Check why this is happening and, most importantly, whether it is an issue with the test or with the code.

https://github.com/talaia-labs/rust-teos/runs/7557625852?check_suite_focus=true#step:9:6002

sr-gi commented 2 years ago

Ok, I think this may actually be a bug with the client.

If on_commitment_revocation is hit while the retrier is working on a tower, an additional appointment can be added to pending that the retrier never tries to send out. This is due to how the retrier fetches the appointment data when managing a retry: it loads all the appointments that are pending in the database up to that moment, so anything added afterwards is missed.
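To make the suspected race concrete, here is a minimal model of it. This is only a sketch: `ClientState`, `retry_from_snapshot` and the `u32` ids are hypothetical stand-ins, not the actual rust-teos types, and the `during_retry` callback simply simulates on_commitment_revocation firing while the retrier is busy.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the client state (not the real WTClient).
#[derive(Default)]
struct ClientState {
    /// Ids of appointments still pending for one tower.
    pending: HashSet<u32>,
}

/// Models a retry that works from a one-off snapshot of the pending set.
/// `during_retry` simulates on_commitment_revocation being hit mid-retry.
fn retry_from_snapshot(
    state: &Arc<Mutex<ClientState>>,
    during_retry: impl FnOnce(),
) -> Vec<u32> {
    // Snapshot taken once; the lock is released at the end of this statement.
    let snapshot: Vec<u32> = state.lock().unwrap().pending.iter().copied().collect();

    // Anything added to `pending` from here on is not part of the snapshot.
    during_retry();

    let mut sent = Vec::new();
    for id in snapshot {
        // Pretend the appointment was delivered, then drop it from pending.
        state.lock().unwrap().pending.remove(&id);
        sent.push(id);
    }
    sent.sort();
    sent
}
```

In this model, an id inserted by `during_retry` stays in `pending` after the function returns, which matches the symptom the test times out on.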

It feels like the proper approach would be to iterate over TowerSummary::pending_appointments for the given tower and fetch the appointments one by one. The main issue with this approach is that we cannot hold a reference to pending_appointments while iterating over it, since the state (WTClient) is behind a Mutex.
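One possible way around the Mutex constraint, sketched under assumptions: instead of holding a reference across the whole iteration, take the lock briefly to pick the next pending id, release it, send, and lock again to remove the id. `ClientState`, `retry_one_by_one` and the `send` callback are hypothetical names for illustration, not the rust-teos API.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the client state (not the real WTClient).
#[derive(Default)]
struct ClientState {
    /// Ids of appointments still pending for one tower.
    pending: HashSet<u32>,
}

/// Drains the pending set one item at a time. The lock is only held while
/// picking the next id and while removing it, never while sending.
/// `send` returns false if the tower could not be reached.
fn retry_one_by_one(
    state: &Arc<Mutex<ClientState>>,
    mut send: impl FnMut(u32) -> bool,
) -> Vec<u32> {
    let mut sent = Vec::new();
    loop {
        // Short critical section: the guard is dropped at the end of the statement.
        let next = state.lock().unwrap().pending.iter().next().copied();
        let id = match next {
            Some(id) => id,
            None => break, // nothing left, including late additions
        };

        // No lock is held here, so other code can still add to `pending`.
        if !send(id) {
            break; // leave the item pending for a later retry
        }

        state.lock().unwrap().pending.remove(&id);
        sent.push(id);
    }
    sent
}
```

Because the set is re-read on every iteration, an appointment added mid-retry is picked up before the loop exits. The trade-off is one lock acquisition per appointment, and the loop stops at the first failed send rather than spinning on an unreachable tower.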

Currently thinking about how to approach this.