logstash-plugins / logstash-codec-multiline

Apache License 2.0
7 stars 31 forks source link

Fix Memory Leak issue #29

Closed guyboertje closed 8 years ago

guyboertje commented 8 years ago

Fixes #28

This should also be the base for Colin's further buftok changes.

ph commented 8 years ago

I went through the code, this is starting to be a beast of code and I really like the coding style of it.

But I think we should add unit test covering the PeriodicRunner and the Retriggabletask there is some thread safety check that we do in both of theses part that can be tricky. I checked the logic of the code and the caller and I don't see any obvious issue at first sight. But the devil is in the details. I would feel more comfortable to have test on these classes.

If I understand correctly.

ph commented 8 years ago

Also I find the PeriodicPoller and the retriggabletask a bit similar in nature? I am not sure if this could be the same class?

ph commented 8 years ago

I run the code for a few hours and I don't see any memory pressure or leaks, I will do more testing tomorrow morning to make sure the auto flush code still work as expected.

guyboertje commented 8 years ago

If I understand correctly. We have a AutoFlush/RetriggableTask which is used by the multiline codec. We have 2 PeriodicPoller, one of the will unset the AutoFlush and use record_codec_usage to reset the timer.

The AutoFlush/RetriggableTask will run when the ML codec is used in an input that has not been modified for identity mapping. The PeriodicRunner kicks in when an input uses identity mapping - yes it will unset the auto_flush_runner and use record_codec_usage to set a future timestamp. The timer is not reset per se.

Also I find the PeriodicPoller and the retriggabletask a bit similar in nature?

They are quite different. PeriodicRunner calls a method on its listener periodically i.e. the thread runs continually, RetriggerableTask is a one off task that will run in the future, but the running of the task can be pushed further into the future if more retrigger 'events' occur before the task runs. When the task is complete the thread stops. If it is retriggered when the task is running it waits until the task is done then creates a new task. So in the case that no more 'events' occur the task does not run.

However this raises an interesting edge case for read file scenarios in the file input - when all the files are read and the the identities evicted, the runner for auto_flush continues running, as does the eviction runner. Maybe in a future PR (the one that moves IMC and all the supporting classes are moved into core :smile:) we can soft-stop the runners if the map is empty. Note that now in the file input the identity is evicted when the file closes via filewatch, i.e. if close_older is set. So the eviction runner may never actually do anything.

ph commented 8 years ago

@guyboertje Thanks for taking the time to respond to my concerns, I though my original comments where related to the lack of context I had when doing the first review.

guyboertje commented 8 years ago

Maybe I should travis-ify this repo.

guyboertje commented 8 years ago

@ph - you are most welcome

guyboertje commented 8 years ago

I will add tests for Retr..Task and Per..Runner.

guyboertje commented 8 years ago

Maybe in a future PR (the one that moves IMC and all the supporting classes are moved into core :smile:) we can soft-stop the runners if the map is empty.

I tried this and got ThreadError: thread 0x12924915 tried to join itself :smiley:

ph commented 8 years ago

@guy My other local tests are successful I think its a winner.

ph commented 8 years ago

Code LGTM, would like to see tests for the new classes.

ph commented 8 years ago

@suyograo @guyboertje we have concluded that we can release it and add test later for the 2 classes.

elasticsearch-bot commented 8 years ago

Guy Boertje merged this into the following branches!

Branch Commits
master 678da5d9fd97caa0f78912512fb049458d182910