log4mongo / log4mongo-python

python logging handler for mongo database
http://log4mongo.org

Batch insert #21

Closed ronhanson closed 8 years ago

ronhanson commented 8 years ago

Hi, to avoid write-locking the MongoDB database with constant log writes, I'd like to implement some kind of "batch insert": something like buffering the logs and writing them every 10 lines or so, and maybe flushing the buffer periodically too. Has this already been suggested or tried by anyone? Just checking whether the idea is valid or if I'm being stupid. Cheers, Ronan

oz123 commented 8 years ago

The idea is valid, and I have implemented something like that. Unfortunately it is not open source (it was done at a previous workplace).

ronhanson commented 8 years ago

Great. I guess I'll try to implement something then. Since you have experience: is it better to use a thread that periodically triggers the buffer write (every half second or so), or to write only when the buffer is full? The second idea is valid, but it means that if the logging script stops emitting for a while, the last lines stay in the buffer and are not visible in the DB until the buffer fills up again and is written. The first idea is very easy to implement, but it comes at the cost of an extra thread.

Any recommendation?

oz123 commented 8 years ago

The way I did it last time was without a thread. The mechanism was as follows:

A logger class with a buffer (a number of stored messages), which could eat up memory if you had too many messages. When the buffer size was reached, it wrote all the messages. It was also possible to discard all messages if no critical message appeared: the messages were simply erased from the buffer if no critical message arrived after a while (this is kind of like buffering on top of the buffering that the logging library already gives you). The reason for this was that most of the time our application ran in debug mode. So we went with your second idea, except that an urgent (critical) message caused the buffer to flush and write all messages to the database.
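For readers following along, here is a minimal sketch of the "flush when full or on a critical message" idea, built on the standard library's `logging.handlers.BufferingHandler`. It is not the closed-source implementation described above, nor necessarily what was later merged into log4mongo. The class name `BatchingHandler` and the `write_batch` callable are hypothetical; in practice `write_batch` could be something like a pymongo collection's `insert_many`, which is an assumption here.

```python
import logging
import logging.handlers


class BatchingHandler(logging.handlers.BufferingHandler):
    """Buffer records and write them in one batch (hypothetical sketch)."""

    def __init__(self, capacity, write_batch, flush_level=logging.CRITICAL):
        super().__init__(capacity)
        # write_batch is an assumed sink: any callable taking a list of dicts.
        self.write_batch = write_batch
        self.flush_level = flush_level

    def shouldFlush(self, record):
        # Flush when the buffer is full or when an urgent record arrives.
        return (len(self.buffer) >= self.capacity
                or record.levelno >= self.flush_level)

    def flush(self):
        self.acquire()
        try:
            if self.buffer:
                docs = [{"level": r.levelname, "message": r.getMessage()}
                        for r in self.buffer]
                self.write_batch(docs)  # one write for the whole batch
                self.buffer.clear()
        finally:
            self.release()
```

With a capacity of 3, two debug messages stay in memory, the third triggers one batched write, and a later critical message flushes immediately together with whatever was buffered before it. The standard library's `logging.handlers.MemoryHandler` offers the same capacity and level triggers if forwarding records to another handler is enough.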

ronhanson commented 8 years ago

The idea to force flush the buffer on critical/urgent message is very bright indeed.

Although with that approach, if the app paused logging for a while, the buffer was not flushed at all? So by looking at the logs stored in the DB, you could not see what the latest messages were. That might be a problem for most common use cases... Hence the thread, which solves that problem.

I'll definitely draft something, possibly a mixed solution.

oz123 commented 8 years ago

Another way, which I didn't think of, is to use time-based flushing. We used an N-message flush: if the buffer reached N messages, flush. But you could also set a timer, so that even if N has not been reached, you force the buffer to be written after Y seconds (which could of course be minutes or any other time unit). This way, even if the application paused, you could flush the messages. Of course, up to N messages will be missing from the database for at most Y time units. But if there are no urgent messages, nobody cares.
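A rough sketch of the combined "N messages or Y seconds" flush might look like the following. It uses a daemon thread as the timer and the standard library's `logging.handlers.MemoryHandler` for the count-based and level-based triggers. The names `PeriodicFlusher` and `buffered` are made up for illustration, and the defaults (10 records, 0.5 seconds) are just the numbers floated earlier in this thread, not settings from the library.

```python
import logging
import logging.handlers
import threading


class PeriodicFlusher:
    """Call handler.flush() every `interval` seconds (hypothetical helper)."""

    def __init__(self, handler, interval):
        self.handler = handler
        self.interval = interval
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stopped.wait(self.interval):
            self.handler.flush()

    def stop(self):
        self._stopped.set()
        self._thread.join()
        self.handler.flush()  # write whatever is still buffered


def buffered(target, capacity=10, interval=0.5):
    """Flush to `target` at `capacity` records, on CRITICAL, or every `interval` s."""
    handler = logging.handlers.MemoryHandler(
        capacity, flushLevel=logging.CRITICAL, target=target
    )
    flusher = PeriodicFlusher(handler, interval)
    flusher.start()
    return handler, flusher
```

`MemoryHandler.flush()` takes the handler lock, so the timer thread and the logging thread do not write the same records twice. Calling `stop()` at shutdown drains the buffer one last time.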

ronhanson commented 8 years ago

Cool, that is exactly what I meant with the thread solution (I might not have been clear enough in my previous messages): the thread is used only as a timer to periodically call the flush (I don't know how we could do timers without threads). Anyway, that is easy enough to implement as an option of the handler.

oz123 commented 8 years ago

This was merged. Cool work!