resurfaceio / logger-python

Log API calls with Python
Apache License 2.0

Stable changes for V3.0 #67

Closed · Anyesh closed this 2 years ago

Anyesh commented 2 years ago
monrax commented 2 years ago

I have yet to test it and measure performance, but it looks good! I think there's still a need to add another worker thread to handle the POST request itself, though. We don't expect it to add any significant overhead, but it is still a network I/O operation that should happen in the background (both while payloads are being added to the queue and while they are being processed). As for adding a bound to the queue, perhaps we should check the queue's size instead of relying on a counter variable without a lock. I would suggest using the maxsize argument when initializing the queue itself, but I don't think we want to block at any point.
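
To make the idea concrete, here's a minimal sketch of a bounded queue plus a daemon thread that owns the POST. Everything in it (post_worker, enqueue, the drop-on-full policy, the endpoint URL) is a placeholder for illustration, not the logger's actual code:

```python
import queue
import threading

import requests

MAX_QUEUE_SIZE = 1024  # the bound lives in the queue itself, so no counter or lock is needed

submission_queue: "queue.Queue[str]" = queue.Queue(maxsize=MAX_QUEUE_SIZE)

def post_worker(url: str) -> None:
    # Runs in the background; queue.get blocks only this thread.
    while True:
        msg = submission_queue.get()
        try:
            requests.post(url, data=msg, timeout=5)
        except requests.RequestException:
            pass  # a real implementation would retry or count failures

def enqueue(msg: str) -> None:
    # Never blocks the caller; drops when the bound is hit.
    try:
        submission_queue.put_nowait(msg)
    except queue.Full:
        pass

threading.Thread(target=post_worker, args=("https://collector.example/message",), daemon=True).start()
```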

monrax commented 2 years ago

I think creating new threads might be too expensive compared to just leaving one daemon thread running from logger initialization. I believe the Queue.get method will block only the worker thread, not the main thread, so while there are no new submissions (and all previous ones have been bundled and sent) that thread will just sleep until there's a new message to submit.

Maybe we can have two queues, one for each worker thread: an unbounded queue where we put payloads to be processed (compression, JSON serialization) by one worker thread, and a bounded queue where that same worker puts processed messages while the other worker thread bundles them and handles the I/O operation. The goal is that putting payloads is never blocked, while putting processed messages to bundle and send is, so that processing pauses when the max bundle size is reached (in the meantime, payloads would just pile up on the unbounded queue). I can work on a first version of what that would look like.
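
A rough sketch of that two-queue pipeline (names, sizes, and the zlib/json choices are placeholders for whatever the logger actually uses):

```python
import json
import queue
import threading
import zlib

payload_queue: "queue.Queue[dict]" = queue.Queue()              # unbounded: puts never block
message_queue: "queue.Queue[bytes]" = queue.Queue(maxsize=128)  # bounded: backpressure here

def process_worker() -> None:
    # Serializes and compresses payloads; pauses only when the bounded queue is full.
    while True:
        payload = payload_queue.get()
        msg = zlib.compress(json.dumps(payload).encode("utf-8"))
        message_queue.put(msg)  # blocks at maxsize, while payloads pile up upstream

def bundle_worker(submit) -> None:
    # Drains processed messages into bundles and hands them to the I/O callable.
    while True:
        bundle = [message_queue.get()]  # wait for at least one message
        try:
            while True:
                bundle.append(message_queue.get_nowait())  # grab whatever else is ready
        except queue.Empty:
            pass
        submit(bundle)  # network I/O happens here, never on the main thread

threading.Thread(target=process_worker, daemon=True).start()
threading.Thread(target=bundle_worker, args=(print,), daemon=True).start()  # print stands in for the POST
```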

Lastly, is there a reason for us to create a dictionary with the submission URL if self.url is already available from the worker thread? (The same goes for self.skip_compression, and for self.agent and the logger version used in the headers.)

monrax commented 2 years ago

I just pushed some changes to a new branch here: https://github.com/resurfaceio/logger-python/commit/0cd8193638989f85d8965e20725142c982e653c2. I ended up not using the two queues, and instead just used a list as a buffer. I wanted to create the dispatcher thread when the logger is initialized, and change its internal loop to an infinite loop so that it sleeps when the queue is empty. However, we would need to find a way to "flush" the buffer list before that happens.
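
One way to get that flush (just an idea, not what the commit does) would be a timeout on the blocking get, so that a quiet queue triggers sending whatever is already buffered. A minimal sketch with hypothetical names:

```python
import queue

BUNDLE_SIZE = 32         # placeholder values
FLUSH_AFTER_SECS = 2.0

def dispatcher_loop(msg_queue: "queue.Queue[str]", send_bundle) -> None:
    buffer: list[str] = []
    while True:
        try:
            buffer.append(msg_queue.get(timeout=FLUSH_AFTER_SECS))
        except queue.Empty:
            if buffer:  # queue went quiet: flush the partial bundle
                send_bundle(buffer)
                buffer = []
            continue
        if len(buffer) >= BUNDLE_SIZE:
            send_bundle(buffer)
            buffer = []
```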

As I'm writing this, I realize that a similar result could be achieved by not checking for the existence of the "submission_thread" worker thread after putting in the payload, and instead using submit as a dispatcher for a limited number of worker threads that consume from the same queue, creating their own batches and submitting them accordingly. (I think it's because of this condition that I initially assumed the __internal_submission method to be a dispatcher rather than a network I/O worker thread.)
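
That alternative could look something like this (every name and number here is a placeholder, and post_bundle is a stub for the real POST):

```python
import queue
import threading

MAX_WORKERS = 4
BATCH_SIZE = 10
work_queue: "queue.Queue[str]" = queue.Queue()
workers: list[threading.Thread] = []
workers_lock = threading.Lock()

def post_bundle(batch: list[str]) -> None:
    pass  # stands in for joining the batch and POSTing it

def worker() -> None:
    batch: list[str] = []
    while True:
        batch.append(work_queue.get())
        if len(batch) >= BATCH_SIZE:  # each worker creates its own batches
            post_bundle(batch)
            batch = []

def submit(msg: str) -> None:
    work_queue.put(msg)
    with workers_lock:  # lazily spawn workers, up to the limit
        if len(workers) < MAX_WORKERS:
            t = threading.Thread(target=worker, daemon=True)
            t.start()
            workers.append(t)
```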

One change introduced in this commit is doing the compression after everything is bundled, instead of compressing each message and then joining them together. After thinking about it, I figured this was the better order, since compression works better as the amount of data it can look for patterns and similarities in increases.
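
A quick illustration of why (made-up messages; exact sizes will vary):

```python
import zlib

# 100 similar messages, as a stand-in for real logger output
messages = [b'["request_method","GET","request_url","http://localhost/ping"]'] * 100

per_message = sum(len(zlib.compress(m)) for m in messages)  # compress, then join
bundled = len(zlib.compress(b"\n".join(messages)))          # join, then compress

print(per_message, bundled)  # the bundled variant is expected to be far smaller
```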

Anyesh commented 2 years ago

> I just pushed some changes to a new branch here: 0cd8193. I ended up not using the two queues, and instead just used a list as a buffer. I wanted to create the dispatcher thread when the logger is initialized, and change its internal loop to an infinite loop so that it sleeps when the queue is empty. However, we would need to find a way to "flush" the buffer list before that happens.
>
> As I'm writing this, I realize that a similar result could be achieved by not checking for the existence of the "submission_thread" worker thread after putting in the payload, and instead using submit as a dispatcher for a limited number of worker threads that consume from the same queue, creating their own batches and submitting them accordingly. (I think it's because of this condition that I initially assumed the __internal_submission method to be a dispatcher rather than a network I/O worker thread.)
>
> One change introduced in this commit is doing the compression after everything is bundled, instead of compressing each message and then joining them together. After thinking about it, I figured this was the better order, since compression works better as the amount of data it can look for patterns and similarities in increases.

That commit looks good, but could you please check why the unit tests are not passing?