Closed: yusefnapora closed this 8 years ago
Okay, so after a bit of testing this morning, I've confirmed that this will reconnect if the RPC service goes down and comes back up within the retry period. The retry helper maxes out at 60 seconds between attempts, and I set the default to 20 retry attempts, which should give us enough time to ride out normal maintenance restarts.
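For illustration, here's a minimal sketch of what a retry helper with those semantics might look like. The names (`with_retry`, `RetryableError`) and the `initial_delay` parameter are assumptions for this sketch, not the project's actual API:

```python
import time


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying,
    e.g. a recoverable gRPC failure."""


def with_retry(fn, max_attempts=20, initial_delay=1.0, max_delay=60.0):
    """Call fn(), retrying on RetryableError with exponential backoff.

    The delay between attempts doubles each time but is capped at
    max_delay seconds; after max_attempts failures the error propagates.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
```

With the defaults above (1s initial delay, 60s cap, 20 attempts), the total retry window is several minutes, which is what makes a normal maintenance restart survivable.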
One thing that worries me about this implementation: if the blockchain catchup completed before the stream was interrupted, the catchup thread won't be re-run when the stream comes back up. So we could potentially miss records if a new block was published during the downtime.
I think that can be addressed when we add the "catchup to known block" functionality, by just keeping track of the last seen block and restarting the catchup worker if need be.
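A rough sketch of that bookkeeping, assuming we can compare the last block we processed against the current chain head on reconnect (the class and method names here are hypothetical):

```python
class CatchupTracker:
    """Hypothetical bookkeeping: remember the last block seen on the
    journal stream so that, after a reconnect, we can tell whether new
    blocks were published during the downtime and the catchup worker
    needs to be restarted."""

    def __init__(self):
        self.last_seen_block = None

    def record(self, block_ref):
        """Call for every block processed from the journal stream."""
        self.last_seen_block = block_ref

    def needs_catchup(self, current_chain_head):
        """True if the chain head moved past the last block we processed,
        i.e. records were published while we were disconnected."""
        return current_chain_head != self.last_seen_block
```

On reconnect, the follower would check `needs_catchup(...)` and restart the catchup worker from `last_seen_block` if it returns true.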
Yeah, using the known block height sounds like the right fix.
This will try to restart the journal stream if we get a recoverable gRPC error while iterating over it. It changes `BlockchainFollower` to accept a function that opens the stream, instead of the stream itself. Then, in the event receiver thread, the whole "open and consume stream" process is wrapped in a helper method, which is called via the `with_retry` helper. So, if we get a recoverable error, it will try to reopen the stream and start again.

This will lead to duplicate entries on the output stream if you get disconnected partway through, since the new journal stream will start over from the beginning. We should add some bookkeeping here to track the last received block, etc., but that's its own issue that we need to tackle separately.
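The receiver loop described above could look roughly like this. It's a self-contained sketch with an inline retry loop; the names (`follow_journal`, `RecoverableStreamError`) are hypothetical stand-ins, not the project's actual identifiers:

```python
import time


class RecoverableStreamError(Exception):
    """Hypothetical stand-in for a recoverable gRPC error."""


def follow_journal(open_stream, handle_event, max_attempts=20,
                   initial_delay=1.0, max_delay=60.0):
    """Open the journal stream via open_stream() and consume events,
    re-opening the stream from scratch on a recoverable error.

    open_stream is a zero-argument function returning a fresh event
    iterator -- this is why BlockchainFollower takes a stream-opening
    function rather than a stream object: a consumed stream can't be
    re-opened after a failure.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            # A brand-new stream is opened on every attempt, so after a
            # mid-stream disconnect we replay the journal from the start;
            # downstream consumers may therefore see duplicate entries.
            for event in open_stream():
                handle_event(event)
            return  # stream ended normally
        except RecoverableStreamError:
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
```

Note how a disconnect after the first event replays that event on reconnect, which is exactly the duplicate-entries behavior mentioned above; deduplicating via a last-received-block marker would go in the `handle_event` path.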