uw-labs / proximo

Interoperable GRPC based publish/subscribe
GNU Lesser General Public License v3.0
41 stars 16 forks source link

Added run-group to ensure that the calls to gRPC endpoints terminate on error. #65

Closed SpeedyCoder closed 5 years ago

SpeedyCoder commented 5 years ago

Switching the server endpoints to use error group introduced a subtle error case when there is error in the message sink/source backend, but the gRPC connection is healthy. The goroutine that receives messages from client won't terminate as it would be waiting for new messages to arrive, because gRPC uses a different context. I added an implementation of rungroup that is just an adapted version of errgroup that cancels the underlying context as soon as one of the function returns, this removes the need for the intermediate context. Run group also allows user to run functions asynchronously, so that a call toWait won't wait for them to finish, but they still get error/termination handling. This aspect should solve the issue described above.

Finally here is a stack trace from a proximo instance that got stuck in this way https://gist.github.com/SpeedyCoder/3ca2b559f63299af44616b326d735e7e

SpeedyCoder commented 5 years ago

I'm happy to revert my changes and use the same approach as before, but I think this is a cleaner solution.

SpeedyCoder commented 5 years ago

We can probably extract rungroup to it's own package/repo, as I think it would be useful in substrate as well

SpeedyCoder commented 5 years ago

I've pulled out the rungroup implementation https://github.com/uw-labs/rungroup