Open prydin opened 5 years ago
I would like to go ahead and make this change.
I am thinking:
Flush()
in the SDK today, and understand which of those want to flush a batch and which would be ok with Flush()
in the traditional sense.FlushBatch() or flushBatch()
, and call that where appropriateFlush()
to do a full flushadditional thoughts from talking with @priyaselvaganesan and looking at the equivalent code in the Java SDK.
It looks like the flush()
method in the Java SDK may also be only flushing batches, not performing a full flush.
We wonder about the safety of performing a full flush. What if the SDK is sending to a proxy that is already overwhelmed? Will the proxy throttle those requests? Will it create backpressure? Would we want flush to block indefinitely if the proxy is making us wait? Or would this not happen because the SDK buffer can only be so big anyway?
What if the flush takes a really long time? Should there be a timeout?
Thinking about renaming Flush
to FlushBatch
, with the current semantics, and possibly adding a full Flush
option later (possibly with a context parameter so we can enforce a timeout).
Another thought:
I'm pretty sure Flush()
will re-queue failed metrics that it thinks will be sendable in future batches. So if we naively made FlushAll()
by repeatedly calling the current Flush()
until the buffer was empty, I think we could get into an infinite loop.
Maybe failed lines need a retry limit. Maybe FlushAll()
has a different implementation from Flush()
, and doesn't re-queue metrics that failed to send.
When DirectSender.Flush() is called, it calls
Flush()
on the underlyingLineHandlers
.LineHandler.Flush()
only performs a flush of a single batch, leaving the rest of the buffer in memory. Typically,Flush()
on something means that you'd block until all data is written to some backend storage. I suggest we change the code to callFlushAll()
on the LineHandlers instead, since this would be more in line with typical semantics of aFlush()
call.If we are worried about flooding Wavefront with points, we may want to offer a
FlushPartial()
orFlushCurrentBatch()
, but, in my opinion, it's confusing to haveFlush()
do something that's not inline with what programmers typically expect.Even if we don't change the behavior of
Flush()
the senders need to offer aFlushAll()
. This is crucial if you have a data collector that doesn't run continuously, e.g. something that's started from cron, does its job and then dies. Such a program needs a way to ensure that everything is written to the backend before it exits.