Open rixmann opened 8 years ago
Here is a fix for this crash: https://github.com/travelping/hello/commit/bb7943cd4f14e4254fb8200adb5d4e063424c096. Please update hello.
I will look further into reconnecting the client after the context has disappeared.
But that only turns the crash into an error log entry; it does not address the cause.
@thz yes, I just noticed that this isn't the latest hello here.
Hmm. Wasn't such robust behavior one of the reasons to have something like hello in the first place? The underlying tools like ZeroMQ, Erlang distribution, or more recently HTTP/2 already give you the building blocks. For example, ZeroMQ reconnects sockets after a TCP failure, and Erlang node communication can also be configured with more aggressive connectivity checks. And as far as I know, someone has implemented hello echoes above the transport for exactly this reason.
Hello doesn't have any subscription mechanism. There is only a notify method, which allows sending a message from server to client. In this case the subscription mechanism is part of the application layer. As an improvement for implementing good pubsub over hello with a reused socket, I would suggest adding feedback to the notify method. If something goes wrong, you will know about it and be able to remove the subscription in your application.
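To make the suggestion concrete, here is a minimal sketch of application-level pubsub that relies on notify feedback. It is written in Python purely for illustration and is not hello's API: the `Notify` callable returning `True`/`False` and the `SubscriptionRegistry` class are hypothetical names, and the sketch assumes that notify could report delivery failure, which it currently does not.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a notify call that reports success or failure.
# This is NOT hello's API; it models the proposed feedback only.
Notify = Callable[[str], bool]


class SubscriptionRegistry:
    """Application-level pub/sub that prunes subscribers whose notify fails."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Notify] = {}

    def subscribe(self, client_id: str, notify: Notify) -> None:
        self._subscribers[client_id] = notify

    def publish(self, message: str) -> List[str]:
        """Send message to every subscriber; return the ids that were removed."""
        removed = []
        # Iterate over a copy so entries can be deleted while publishing.
        for client_id, notify in list(self._subscribers.items()):
            if not notify(message):
                # Feedback says delivery failed: drop the subscription.
                del self._subscribers[client_id]
                removed.append(client_id)
        return removed

    def subscriber_ids(self) -> List[str]:
        return sorted(self._subscribers)
```

With feedback like this, the application can decide for itself what a failed notify means (drop the subscriber, retry, or log), instead of learning about a vanished context through a crash.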
When replying asynchronously from a hello server, the server context can disappear (due to network failure). For subscriptions to work reliably, this mechanism has to become more robust.
currently on the server side this error message is observed when trying to push a message to a client where the server context was removed by hello:
In case of network failure, it would be ideal if hello could cache the messages to be sent and only discard the context after the connection has been dead for a long time (to be determined by heartbeats?).
The client must also identify the dead connection after a similar timeout, then crash the hello client (which is then restarted by a supervisor).
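The server-side behavior proposed above could look roughly like the following sketch. It is Python for illustration only and does not reflect how hello is implemented: `BufferedContext`, `dead_after`, and the `send` callable are hypothetical names, and the timeout value is an assumption since the issue leaves it open. Time is passed in explicitly so the logic is easy to follow and test.

```python
from collections import deque
from typing import Callable, Deque


class BufferedContext:
    """Buffers outgoing messages and is discarded only after a heartbeat timeout."""

    def __init__(self, send: Callable[[str], bool], dead_after: float,
                 now: float = 0.0) -> None:
        self._send = send              # hypothetical transport; True on success
        self._dead_after = dead_after  # assumed timeout, in seconds
        self._last_heartbeat = now
        self._pending: Deque[str] = deque()
        self.discarded = False

    def heartbeat(self, now: float) -> None:
        """Record that the peer is alive and retry any buffered messages."""
        if self.discarded:
            return
        self._last_heartbeat = now
        self._flush()

    def push(self, message: str, now: float) -> bool:
        """Send or buffer a message; return False once the context is discarded."""
        if self.discarded:
            return False
        if now - self._last_heartbeat > self._dead_after:
            # No heartbeat for too long: give up and drop the buffer.
            self.discarded = True
            self._pending.clear()
            return False
        self._pending.append(message)
        self._flush()
        return True

    def _flush(self) -> None:
        # Deliver in order; stop at the first failure and keep the rest buffered.
        while self._pending:
            if not self._send(self._pending[0]):
                return
            self._pending.popleft()
```

The client could apply the same "no heartbeat within the timeout" check on its side and exit when it trips, leaving the restart to its supervisor.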