fix(threading): account for secondary buffer during flush

logdna / python

A python package for sending logs to LogDNA

MIT License

48 stars 35 forks

fix(threading): account for secondary buffer during flush #75

Closed. esatterwhite closed this 3 years ago

esatterwhite commented 3 years ago

When work cannot be scheduled on the worker pool, logs are pushed onto the secondary list, but the check before a send doesn't account for any logs that may be pending in the secondary.

This can happen when a runtime error occurs while attempting to submit work to one of the thread pools, effectively leaving logs unsent.

fixes: #74
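For readers who want the shape of the problem, here is a minimal sketch of a flush check that looks at both buffers. It is not the code from this PR: the `BufferedSender` class, its `buf`, `secondary`, and `sent` attributes, and the single-worker pool are all hypothetical names chosen for illustration, and `sent` stands in for the actual network request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BufferedSender:
    """Hypothetical stand-in for a log handler; not the real logdna class."""

    def __init__(self):
        self.buf = []          # primary buffer, filled by pool workers
        self.secondary = []    # fallback used when work cannot be scheduled
        self.lock = threading.Lock()
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.sent = []         # stands in for the HTTP request

    def emit(self, line):
        try:
            self.pool.submit(self._buffer, line)
        except RuntimeError:
            # submit() raises RuntimeError once the pool is shut down;
            # park the line in the secondary list instead of dropping it.
            with self.lock:
                self.secondary.append(line)

    def _buffer(self, line):
        with self.lock:
            self.buf.append(line)

    def flush(self):
        with self.lock:
            # Check both buffers, not only the primary one.
            if not self.buf and not self.secondary:
                return
            pending = self.buf + self.secondary
            self.buf = []
            self.secondary = []
        self.sent.extend(pending)
```

If `flush` returned early whenever `buf` was empty, a line parked in `secondary` after a failed `submit` would never be sent.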

dbartenstein commented 3 years ago

Hi all! Any idea when this fix will be merged?

esatterwhite commented 3 years ago

I'm still seeing a situation where it will stop flushing after some time. I think there is an error being swallowed in a thread. Still digging.

esatterwhite commented 3 years ago

@matt-march @jakedipity I think I tracked down the deadlock. There was an error case that would result in a lock not being released, and eventually all threads would be blocked.

I was able to log successfully to the dev environment for several hours. Previously, it would lock up after a few minutes. Please take a good look.
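The kind of error case described above can be sketched as follows. This is an illustration of the general pattern only, not the code changed in this PR; `send_leaky`, `send_guarded`, and `still_locked_after_error` are hypothetical names.

```python
import threading


def send_leaky(lock, payload):
    # Manual acquire/release: an exception between the two leaves the lock held.
    lock.acquire()
    if payload is None:
        raise ValueError("nothing to send")
    lock.release()


def send_guarded(lock, payload):
    # The context manager releases the lock even when the body raises.
    with lock:
        if payload is None:
            raise ValueError("nothing to send")


def still_locked_after_error(send):
    """Run `send` with a failing payload and report whether the lock leaked."""
    lock = threading.Lock()
    try:
        send(lock, None)
    except ValueError:
        pass
    return lock.locked()
```

With the leaky variant, every later caller that tries to acquire the same lock blocks forever, which matches the "works for a few minutes, then stops" symptom.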

esatterwhite commented 3 years ago

Also, because the flush loop is really just a one-shot timeout that is manually restarted, I am pretty sure there was a situation where it would try to schedule some work on a thread, and enough time was spent in a context switch with an error that the timer would fire and go unnoticed. None of the code paths would know to restart it.

That would also result in "nothing happening".
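One way to keep a manually restarted one-shot timer alive is to re-arm it in a `finally` block, so an error during the flush cannot skip the restart. The sketch below assumes a hypothetical `FlushLoop` wrapper around `threading.Timer`; it is not how the library structures its flush loop.

```python
import threading


class FlushLoop:
    """Hypothetical periodic flusher built from one-shot timers."""

    def __init__(self, interval, flush):
        self.interval = interval
        self.flush = flush
        self.timer = None
        self.stopped = threading.Event()

    def start(self):
        if self.stopped.is_set():
            return
        # threading.Timer fires once, so every tick has to schedule the next.
        self.timer = threading.Timer(self.interval, self._tick)
        self.timer.daemon = True
        self.timer.start()

    def _tick(self):
        try:
            self.flush()
        except Exception:
            # A real handler would report this rather than swallow it.
            pass
        finally:
            # Re-arm on every path, including the error path.
            self.start()

    def stop(self):
        self.stopped.set()
        if self.timer is not None:
            self.timer.cancel()
```

Without the `finally`, a single exception in `flush` would end the chain of timers and nothing would be flushed again until something else restarted it.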