Closed halfbyte closed 9 months ago
I'd go ahead and implement a worker in this PR.
> I'd go ahead and implement a worker in this PR.
I was asking more about the batching. :)
> I was asking more about the batching. :)
Ah, yes, batching now would be fantastic. :)
If I see this correctly, the notify API does not currently batch, at least in the Ruby implementation? Do we have examples of how other client libraries do the batching? What kind of algorithm should we apply here: `batch.size > n`, or a specific timeout? `batch.size > n` on its own is probably not enough, as it could result in never sending events. To me, the 2nd option sounds logical, but maybe you have something in mind already, or an example of a similar implementation?
I think batching by watching the batch size and a timeout is the way to go. I'm not sure what numbers would be best, but I guess something like 200 events or 30 seconds would work.
> I think batching by watching the batch size and a timeout is the way to go. I'm not sure what numbers would be best, but I guess something like 200 events or 30 seconds would work.
Agreed, so option 2. If we want to get fancy we could have a backoff algorithm for the timeout (so it would send the first few events immediately, but then gradually slow down if there's a lot of events coming through). Let's save that optimization for later, though.
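For reference, a minimal sketch of what the size-or-timeout flush could look like. All names and numbers here are illustrative, not the actual client API, and it assumes Ruby >= 3.2 for `Thread::Queue#pop(timeout:)`:

```ruby
# Illustrative sketch only: flush when the batch reaches max_size events,
# or when interval seconds have passed since the first queued event.
# Requires Ruby >= 3.2 for Thread::Queue#pop(timeout:).
class BatchingWorker
  def initialize(max_size: 200, interval: 30, &sender)
    @queue    = Thread::Queue.new
    @max_size = max_size
    @interval = interval
    @sender   = sender
    @thread   = Thread.new { run }
  end

  def push(event)
    @queue.push(event)
  end

  def shutdown
    @queue.close
    @thread.join
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  def run
    batch    = []
    deadline = nil

    loop do
      # Wait at most until the current batch's deadline; nil blocks forever.
      timeout = deadline && [deadline - now, 0].max
      event   = @queue.pop(timeout: timeout) # nil on timeout or on closed+empty

      if event
        deadline ||= now + @interval # the clock starts with the first event
        batch << event
      elsif @queue.closed?
        break # queue drained and closed; final flush happens below
      end

      if batch.size >= @max_size || (deadline && now >= deadline)
        @sender.call(batch)
        batch    = []
        deadline = nil
      end
    end

    @sender.call(batch) unless batch.empty?
  end
end
```

The backoff idea would then only change how `deadline` is computed after each flush.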
Oh, I should mention that we have a max payload size on the API of 5 MB, so total request size could also be a factor for deciding when to send a batch. :)
> Oh, I should mention that we have a max payload size on the API of 5 MB, so total request size could also be a factor for deciding when to send a batch. :)
@stympy Does that count for compressed payloads or for the payload after decompression? If compression plays into this, should I just assume a certain compression factor (which works until a client starts posting MP3s through the events API)?
I also wonder if keeping track of this should actually be a concern of the backend, which would complicate matters because suddenly we'd need two separate queues with different logic for when to actually commence the sending.
I started working on implementing batching, but this is far from finished and untested (I wanted to get the infra in first), so I turned this back to WIP.
> Does that count for compressed payloads or for the payload after decompression? If compression plays into this, should I just assume a certain compression factor (which works until a client starts posting MP3s through the events API)?
This max is checked before the payload is decompressed.
> I also wonder if keeping track of this should actually be a concern of the backend, which would complicate matters because suddenly we'd need two separate queues with different logic for when to actually commence the sending.
I don't think you need to worry about this right now... I just wanted to ensure everyone was aware of the limit.
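Since the limit applies before decompression, a client could in principle measure the compressed body directly instead of assuming a compression factor. A rough sketch, where the 5 MB constant mirrors the limit mentioned above and the helper name is hypothetical:

```ruby
require "json"
require "zlib"

MAX_PAYLOAD_BYTES = 5 * 1024 * 1024 # the API limit, checked on the compressed body

# Returns true if the batch, serialized and deflated, fits under the limit.
# Illustrative helper, not part of the actual client.
def batch_fits?(events)
  body = Zlib.deflate(JSON.generate(events))
  body.bytesize <= MAX_PAYLOAD_BYTES
end
```

The obvious cost is compressing a candidate batch just to measure it, which is one reason deferring this concern (as suggested above) seems reasonable.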
FYI, I made a first version for this in the JavaScript client. I think the 30-second timeout is too long, though (I put 50ms, but I guess that might be too short), considering we might come up with a live tail feature at some point.
Hey all, what's next for this and #512? Feel free to request fresh reviews from me when either PR is ready for another look.
@joshuap I'll spend the next two days trying to get this reviewable. I'll also take a look at the discussions on #512 to see if there's something that needs to be changed there. I have about 3 days left in terms of budget; my hope is to get at least this one merged into #512.
@joshuap Could you please take a look at the `EventsWorker`? I am not super confident about my multithreading chops, and while this is mostly a duplication of the `Worker` class, I had to thread in (haha!) a bit of extra functionality to be able to flush the newly introduced send queue (for batching). I removed the code for checking the max payload for now, as I didn't want to make it even more complicated.

I need to add some additional tests for batching etc.; so far the `events_worker_spec` more or less just checks the same thing as the `worker_spec`.
Ah, just realized an important part is missing here: sending events after a certain amount of time even if no new events come in :-/
Phew, that took quite a bit of refactoring. Please everyone throw some eyeballs on this, as I am quite afraid that there are some deadlocks lurking in this odd piece of code.
I had to add yet another thread to the mix to do the timeout checking. I thought long and hard about possible alternatives, but I think this is the only one that does not remove the benefit of the queue as a scheduling mechanism.
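The extra-thread approach could be sketched roughly like this (names, the sentinel, and the interval are all illustrative simplifications, not the actual `EventsWorker`): a timer thread injects a flush marker into the same queue, so the worker still consumes from a single queue and that queue remains the scheduling mechanism.

```ruby
FLUSH = Object.new # sentinel pushed by the timer thread

queue = Thread::Queue.new
sent  = []

worker = Thread.new do
  batch = []
  while (msg = queue.pop) # returns nil once the queue is closed and drained
    if msg.equal?(FLUSH)
      sent << batch unless batch.empty?
      batch = []
    else
      batch << msg
    end
  end
  sent << batch unless batch.empty? # final drain on shutdown
end

# The extra thread never touches the batch itself; it only schedules
# flushes by pushing the sentinel through the queue.
timer = Thread.new do
  until queue.closed?
    sleep 0.05 # check interval; illustrative value
    begin
      queue.push(FLUSH)
    rescue ClosedQueueError
      # the queue was closed between the check and the push; just exit
    end
  end
end
```

Because only the worker thread ever mutates `batch`, no locking is needed around it in this particular layout.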
I've replaced my thread variables with heavy usage of `Mutex#synchronize` and had to place these carefully to make sure I don't run straight into deadlocks (for example, wrapping the whole send code led to a deadlock when throttling was involved).
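The "don't wrap the whole send code" point can be illustrated generically: hold the lock only while swapping the buffer out, and do the potentially slow, throttled send outside it. This is a sketch of the pattern, not the actual client class:

```ruby
# Illustrative pattern only: synchronize just the buffer swap, so the
# send (which may sleep when throttled) can never deadlock against a
# concurrent push that needs the same lock.
class SendQueue
  def initialize
    @mutex  = Mutex.new
    @buffer = []
  end

  def push(event)
    @mutex.synchronize { @buffer << event }
  end

  def flush
    batch = nil
    @mutex.synchronize do # critical section: just the swap
      batch   = @buffer
      @buffer = []
    end
    send_batch(batch) unless batch.empty? # runs outside the lock
    batch
  end

  private

  def send_batch(batch)
    # placeholder for the HTTP send; it could sleep when throttled, which
    # is exactly why it must not run inside the synchronize block
  end
end
```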
If y'all feel that the basis of this is sound I'll throw a bit of cosmetic refactoring at it and then maybe we can merge this so that I can tackle the server interaction.
This might be useful: https://github.com/ertygiq/msg-batcher
@stympy Thanks, I'll have a look. Not sure how much is transferable, but I'll read the code and see if I can learn anything from it for this (I assume you didn't propose to directly use it)
> Not sure how much is transferable, but I'll read the code and see if I can learn anything from it for this (I assume you didn't propose to directly use it)
I know Josh prefers to keep to zero dependencies, so yeah, unless it would simplify things dramatically to use it directly, I'd say just see what you can learn from it.
@joshuap Ah, thanks for catching that. I'll have a quick look to see if there's an obvious solution, but I would rather merge this so I can continue by adding the HTTP transport next (which I've been working on already).
This currently is missing a queue and calls the backend directly from the agent. Should I implement an `events_worker` within this PR or in the PR that adds the server backend?

Another question: Would it make sense to hand in the timestamp from the agent as an actual DateTime, so that we could delegate the serialisation to the specific backend? Feels like it could be useful?
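On the DateTime question, the idea would be for the agent to record a real `Time` and let each backend pick its own wire format only at serialization time. A tiny sketch with hypothetical names, not the actual agent API:

```ruby
require "time"

# Hypothetical event struct: the agent stores a real Time object...
Event = Struct.new(:name, :ts)

# ...and a server backend decides the wire format when serializing.
def serialize_for_server(event)
  { name: event.name, ts: event.ts.utc.iso8601(3) } # e.g. RFC 3339 with milliseconds
end
```

A text-log backend could format the same `Time` differently without the agent caring, which is the delegation benefit being suggested.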