sociomantic-tsunami / dhtproto

Distributed Hash Table protocol definition, client, fake node, and tests
Boost Software License 1.0

Clearly log recovery of Mirror DHT requests after node outage #130

Closed: joseph-wakeling-sociomantic closed this issue 6 years ago

joseph-wakeling-sociomantic commented 6 years ago

If there is a DHT node outage, the legacy-protocol Mirror class will log error messages like:

Info [Mirror.Listen on 127.0.0.1:9000 / channel_name] - Starting in 10000ms.
[Mirror.GetAll on 127.0.0.1:9000 / channel_name] - Finished: 127.0.0.1:9000, GetAll request, status Undefined, exception 'Connection refused -- error event reported for TimeoutFiberSocketConnection fd=55 events=Wr' @./submodules/ocean/src/ocean/io/select/selector/SelectedKeysHandler.d:255. Retrying in 10000ms.

... but when the missing DHT node reconnects, one sees no corresponding log message to acknowledge that either request has successfully resumed. In the case of the Listen request one actually sees nothing at all; in the case of the GetAll request the only hint that it has recovered is the "regular" rescheduling message:

Info [Mirror.GetAll on 127.0.0.1:9000 / channel_name] - Starting in 60000ms.

This means that from a monitoring point of view, it is not really clear (unless one is familiar with how these messages work) that either request has resumed successfully.

This has caused some confusion for new users of the Mirror class, so it might be good to add some extra logging that clarifies when requests on a node have successfully resumed.
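Until such logging exists, recovery of a GetAll request can only be inferred from the existing output. The following is a minimal sketch of that inference, assuming the log lines look like the two GetAll samples quoted above. The regular expression and the function name `find_getall_recoveries` are illustrative and not part of dhtproto. It deliberately ignores Listen requests, which print nothing on recovery.

```python
import re

# Assumed line shape, based only on the sample messages quoted above:
#   [Mirror.GetAll on <node> / <channel>] - <message>
GETALL_LINE = re.compile(
    r"\[Mirror\.GetAll on (?P<node>\S+) / (?P<channel>[^\]]+)\] - (?P<msg>.*)"
)


def find_getall_recoveries(lines):
    """Return (node, channel) pairs whose GetAll request appears to have recovered.

    A request counts as failed once a "Retrying in" message is seen, and as
    recovered when the regular "Starting in" rescheduling message follows.
    """
    failed = set()
    recovered = []
    for line in lines:
        match = GETALL_LINE.search(line)
        if match is None:
            continue  # not a GetAll message (e.g. a Listen message)
        key = (match.group("node"), match.group("channel"))
        msg = match.group("msg")
        if "Retrying in" in msg:
            failed.add(key)
        elif msg.startswith("Starting in") and key in failed:
            failed.discard(key)
            recovered.append(key)
    return recovered
```

Feeding it the failure message followed by the "Starting in 60000ms" message from above would report `127.0.0.1:9000` / `channel_name` as recovered, which is exactly the inference a reader currently has to make by eye.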

gavin-norman-sociomantic commented 6 years ago

The difficulty is: the only way to tell when a Listen or GetAll request has started up is when it receives data. The user is (obviously) already notified of this happening, so it was considered to be kind of intuitively obvious when the requests reconnect.

What would you suggest?

joseph-wakeling-sociomantic commented 6 years ago

Hmmm. I assumed there must be some sort of notification when the connection/request was (re)established ... ?

About the user being notified of data being received: I doubt most applications are taking any notice of which data is received from which node. After all, the mirror classes, old and new, have always abstracted this information away.

gavin-norman-sociomantic commented 6 years ago

> Hmmm. I assumed there must be some sort of notification when the connection/request was (re)established ... ?

In the neo client there is. Not in the legacy client. We can tell if initialising a request fails, but not if it succeeds. (Unless the request's client-facing API specifically includes a "started successfully" signal... which none of them do. And I certainly don't want to get into adding things like that now ;)
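To make the constraint concrete: since failure is observable but success is not, one possible approach is to remember which requests have failed and to log a "resumed" message the first time data arrives afterwards. The sketch below is written in Python purely for illustration. `ResumeTracker`, `on_error` and `on_data` are hypothetical names, not part of the legacy client or the Mirror class, and the log wording is invented.

```python
import logging

log = logging.getLogger("Mirror")


class ResumeTracker:
    """Tracks failed (request, node) pairs and reports the first data after a failure."""

    def __init__(self):
        self._failed = set()

    def on_error(self, request, node):
        # Called when a request on a node finishes with an error.
        self._failed.add((request, node))

    def on_data(self, request, node):
        # Called for every record received. Returns True (and logs once)
        # only for the first record that follows a recorded failure.
        key = (request, node)
        if key not in self._failed:
            return False
        self._failed.remove(key)
        log.info("[Mirror.%s on %s] - Resumed after error.", request, node)
        return True
```

One limitation of this approach: a Listen request on a quiet channel would not log its recovery until the next record actually arrives, so the message confirms that data is flowing again rather than that the connection has been re-established.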

> About the user being notified of data being received: I doubt most applications are taking any notice of which data is received from which node

Ah of course. I see what you mean. I'll have a quick look. If I come up with something, would you be able to test it in one of your apps?

joseph-wakeling-sociomantic commented 6 years ago

> If I come up with something, would you be able to test it in one of your apps?

Sure, that should be fine.