jviki closed this issue 3 days ago
Hey @jviki thanks for making this issue!
After failing to fetch the block for height=817357, did you observe any attempts for the next blocks, i.e. 817358?
This is a tricky question. I have been playing with the node a lot and did lots of restarts. So yes, it did fetch the block eventually, but only because of the restarts.
I have seen this situation multiple times. I am not saying it is broken. What I am trying to say is that when watching the logs, I have no clue whether the watchtower is working or whether it somehow gave up on its task. How long should I wait to confirm this when watching the logs? What log message am I looking for?
In the end, this is probably quite similar to https://github.com/lightningnetwork/lnd/issues/4447, i.e. a lack of information about the runtime state.
yeah, could def use more info on WC state
But because of restarts.
Yeap, by following your link it seems that we lack any retry logic on the WC, I believe this could be quite easily added
@saubyk how do we want to support this? Does it make sense to make this configurable from the config file? We can have a default > 1, let's say 3. If the user thinks this is too many or too few, they can set it according to their needs.
@anibilthare - commented on your PR too. I think we can just make it retry indefinitely with an exponential backoff
Background
If the lnd watchtower fails to fetch a block, it looks like it never tries again, and thus it might not do its job. This would be an issue especially when using the neutrino backend. It seems to be related to this comment in the code: https://github.com/lightningnetwork/lnd/blob/d9b88fba67a91583227d1d986deaa05177585bb1/watchtower/lookout/lookout.go#L125
Failing because of some timeout (no clue yet what the exact reason is):
2023-11-18 18:37:29.084 [DBG] WTWR: Fetching block for (height=817357, hash=000000000000000000017421e65bd66c9432edd46c5e8b70cd4c12b542610e93)
2023-11-18 18:37:35.087 [ERR] WTWR: Unable to fetch block for (height=c78cd, hash=000000000000000000017421e65bd66c9432edd46c5e8b70cd4c12b542610e93): did not get response before timeout
Success after lnd restart:
2023-11-18 18:53:17.040 [DBG] WTWR: Fetching block for (height=817357, hash=000000000000000000017421e65bd66c9432edd46c5e8b70cd4c12b542610e93)
2023-11-18 18:53:23.034 [DBG] WTWR: Scanning 4350 transaction in block (height=817357, hash=000000000000000000017421e65bd66c9432edd46c5e8b70cd4c12b542610e93) for breaches
2023-11-18 18:53:23.133 [DBG] WTWR: No breaches found in (height=817357, hash=000000000000000000017421e65bd66c9432edd46c5e8b70cd4c12b542610e93)
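As a stopgap for diagnosing this from the logs, the two excerpts above suggest one rough check: every "Fetching block for" entry should eventually be followed by a "Scanning ... transaction in block" entry for the same hash. The sketch below lists hashes for which that never happened. It assumes only the message formats shown in the excerpts; `unscannedBlocks` is a hypothetical helper, not part of lnd. It matches on the hash rather than the height because the error line prints the height as `c78cd`, which is 817357 in hexadecimal.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Matches the "Fetching block for" entries seen in the excerpts.
	fetchRe = regexp.MustCompile(
		`WTWR: Fetching block for \(height=\w+, hash=([0-9a-f]+)\)`)

	// Matches the "Scanning N transaction in block" entries, which
	// only appear once the block has actually been fetched.
	scanRe = regexp.MustCompile(
		`WTWR: Scanning \d+ transaction in block ` +
			`\(height=\w+, hash=([0-9a-f]+)\)`)
)

// unscannedBlocks returns, in order of first appearance, the hashes
// that have a fetch entry but no scan entry anywhere in the log.
func unscannedBlocks(log string) []string {
	scanned := make(map[string]bool)
	for _, m := range scanRe.FindAllStringSubmatch(log, -1) {
		scanned[m[1]] = true
	}

	seen := make(map[string]bool)
	var missing []string
	for _, m := range fetchRe.FindAllStringSubmatch(log, -1) {
		hash := m[1]
		if !scanned[hash] && !seen[hash] {
			seen[hash] = true
			missing = append(missing, hash)
		}
	}
	return missing
}

func main() {
	const hash = "000000000000000000017421e65bd66c" +
		"9432edd46c5e8b70cd4c12b542610e93"

	// The failing excerpt: a fetch entry followed only by an error.
	log := "2023-11-18 18:37:29.084 [DBG] WTWR: Fetching block for " +
		"(height=817357, hash=" + hash + ")\n" +
		"2023-11-18 18:37:35.087 [ERR] WTWR: Unable to fetch block " +
		"for (height=c78cd, hash=" + hash + "): " +
		"did not get response before timeout\n"

	for _, h := range unscannedBlocks(log) {
		fmt.Println("fetched but never scanned:", h)
	}
}
```

A block whose fetch is still in flight will also be reported, so this is only meaningful on a log that ends well after the last fetch entry.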
Your environment
lnd: 0.17.0-beta
btcd, bitcoind, or other backend: bitcoind v23.1.0 (pruned, blockfilterindex=1, peerblockfilters=1)
Steps to reproduce
No idea how to easily reproduce this exact case at the moment. It would be a matter of finding a situation where lnd with the watchtower starts, connects to the (probably neutrino) backend, and then the backend is disconnected.
Expected behaviour
The watchtower retries fetching the missing block, or else reports loudly and explicitly that it is giving up.
Actual behaviour
It fails to obtain the wanted block due to a timeout. It is not clear whether this is a temporary or a permanent state, and it is difficult to do any further diagnostics.