mtth / tracing

Distributed tracing
https://hackage.haskell.org/package/tracing
BSD 3-Clause "New" or "Revised" License

Background periodic publisher doesn't recover from a network exception #8

Open coderfromhere opened 3 years ago

coderfromhere commented 3 years ago

If the trace collector is temporarily down, the background thread that publishes to it is expected to survive flushSpans throwing ConnectionFailure. Instead, the thread dies with the following exception and does not recover:

HttpExceptionRequest Request {
  host                 = "localhost"
  port                 = 9411
  secure               = False
  requestHeaders       = [("content-type","application/json")]
  path                 = "/api/v2/spans"
  queryString          = ""
  method               = "POST"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 (ConnectionFailure Network.Socket.connect: <socket: 54>: does not exist (Connection refused))

mtth commented 3 years ago

Agreed - the current behavior is not great. The background thread should fail the whole process on error (relevant read) or continue publishing.

In the meantime, here are a couple of suggestions to work around this:

  1. Call publish manually with adequate error handling.
  2. (Untested) Specify a custom request manager which retries on a subset of exceptions.
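
The first suggestion could look roughly like the sketch below. It is written against a plain IO () action rather than the library itself, so publishSafely and publishLoop are hypothetical names, and the action argument is assumed to be the library's publish call applied to your tracer handle. In real use the handled exception type would presumably be http-client's HttpException; the sketch stays polymorphic so that it depends only on base.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (Exception, try)
import Control.Monad (forever)

-- | Run one publishing attempt. Exceptions of the handler's type are passed
-- to the handler and swallowed; anything else still propagates.
-- Returns True when the attempt succeeded. (Hypothetical helper.)
publishSafely :: Exception e => (e -> IO ()) -> IO () -> IO Bool
publishSafely onError act = do
  r <- try act
  case r of
    Left e   -> onError e >> pure False
    Right () -> pure True

-- | Publish forever, sleeping for the given number of microseconds between
-- attempts. A failed attempt is reported and the loop carries on.
-- (Hypothetical helper.)
publishLoop :: Exception e => Int -> (e -> IO ()) -> IO () -> IO a
publishLoop periodMicros onError act = forever $ do
  _ <- publishSafely onError act
  threadDelay periodMicros
```

Running publishLoop under forkIO gives a background publisher that logs the failure and tries again on the next period instead of dying. Catching a specific exception type (rather than SomeException) keeps asynchronous exceptions such as ThreadKilled working as usual.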

avanov commented 3 years ago

> The background thread should fail the whole process on error (relevant read) or continue publishing.

Right, the only viable option for backend daemons is to carry on, with or without a delayed retry of the same payload, since failing the entire process is hardly desirable. How about performing another forkIO with a retry-only closure upon receiving a network exception? The number of retries could then be made configurable, similarly to settingsPublishPeriod.
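
A minimal sketch of such a retry-only closure, assuming a bounded number of retries with a fixed delay between them. retryOn and its parameters are hypothetical and not part of the library; the predicate decides which exceptions are worth retrying (for example, only connection failures).

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (Exception, try)

-- | Run an action, retrying up to maxRetries more times when it throws an
-- exception accepted by the predicate. Waits delayMicros microseconds before
-- each retry. Returns the last exception if every attempt fails.
-- (Hypothetical helper.)
retryOn :: Exception e => (e -> Bool) -> Int -> Int -> IO a -> IO (Either e a)
retryOn shouldRetry maxRetries delayMicros act = go maxRetries
  where
    go n = do
      r <- try act
      case r of
        Left e | n > 0 && shouldRetry e -> threadDelay delayMicros >> go (n - 1)
        _ -> pure r
```

The closure could be handed to forkIO when a network exception is caught, with the same payload captured in the action, so that the regular publishing period is not blocked while the retries run. With maxRetries set to 0 the action is attempted exactly once.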