telackey closed 4 years ago
This happens if the lite node is offline for a few minutes and the full node prunes state in the IAVL store in the meantime. There is a flag to disable pruning in full nodes, in which case this error won't occur. We can't force full nodes that are not under our control to turn on that flag, so in those cases a reset is required.
I am not sure how many minutes it takes, but in practice I cannot run my local KUBE for more than a week with confidence, even when there has not been any known disruption in connectivity. More rarely, I have seen this even on the cloud nodes.
I think the most tractable solution would be to make this easy to recover from, rather than trying to make it impossible. If we did an automatic reset and restart of the lite node when we hit this condition, for all practical purposes it would not be an issue.
(Note: even if we just exited when we hit this condition, on the KUBE the auto-restart logic would take care of the rest.)
I'm pretty sure your node went offline sometime.
"Automatic reset" is wrong for reasons too lengthy to explain in full here (short version: attacks/malicious nodes), but the least evil option is to time out and exit if the lite node hasn't made progress in > N minutes.
Working at 18:44:42:
Broken at 18:55:16:
Nothing leaps out in the logs, but once this error begins, it is persistent. I fixed it by restarting wnsd-lite with --reset.