Open alaendle opened 6 months ago
I think the best is to expose these errors to users so they can be handled appropriately for the use-case - there isn't a one-size-fits-all handling strategy. Would https://github.com/mtth/tracing/pull/9 with an additional status check work? It exposes a separate publishSpans
function and typed exception to provide this flexibility without bundling too much logic in the constructor.
@mtth Sorry for the late response, took some time until I found a chance to think it through - from my perspective #9 alone wouldn't solve everything perfectly. Maybe I've misunderstood something, but I think the following shortcoming is there:
publish
and retries with publishSpans
in case of exceptions) - and also something similar for with
(retry logic in case of exceptions during publish
).So with that said, by adding a additional check, you could solve the issue! But I fear the price we pay is a really hard to understand API usage. From my perspective we should strive for a solution that works with the ZPK.with zpkSettings . ZPK.run
function. So I wonder if it makes sense to establish some kind of optional exception handler in the settings record (rasing the exception could be the default handler)? Here the user could really decide what to do and the operation could be resumed (by internally re-queueing the spans).
And don't get me wrong, this is by no way a critic. I'm thankful for any improvement and value your development efforts ❤️ ! So please take this just as a side note/comment - I could be wrong.
Thanks for the thoughtful reply, @alaendle. Both approaches are complementary, we can both:
For 2, we could for example add the HTTP status check and buffer spans on failure until the next retry. We'd add some state to Zipkin
(maybe a TVar Seq
, with a configurable limit to avoid OOMs). What do you think?
https://github.com/mtth/tracing/blob/aa0cc7001795b5ccf9fbc87a9fa33282bdda5779/src/Monitor/Tracing/Zipkin.hs#L123
Do we need to check the HTTP status? I'm unsure about the impact - but in my case a unreachable address resulted in a 502 Bad Gateway status.
Maybe we should check for 2XX results (statusIsSuccessful)? But what should we do in other cases? Just output a warning?