logstash-plugins / logstash-codec-multiline

Apache License 2.0

identity map limit of 20k #25

Open colinsurprenant opened 8 years ago

colinsurprenant commented 8 years ago

What is the reasoning for hard-coding the identity map limit to 20k items? Isn't it possible to reach 20k identities within the cleanup timeout interval of 5 minutes? What if we process a directory that contains 20k+ files?

With the upcoming fixes to filewatch related to the open files leak, will the identity map also be affected? Will the new corrected file closing mechanism also evict identities?

This relates to a thread on the Elastic discussion forum: https://discuss.elastic.co/t/logstash-identitymapcodec-has-reached-100-capacity/40210

@guyboertje thoughts?

guyboertje commented 8 years ago

ATM, the gazillion file read use case is not supported in the code, so I guess I and the PR reviewers felt that 20000 tailed identities per input was enough.

> will the new corrected file closing mechanism also evict identities?

In fact, v2.1.2 of the file input, shipped with LS 2.1.1, already supports this. close_older defaults to 1 hour, i.e. a file is closeable if its mtime (for this version) is more than one hour ago.

But yes, the changes introduced in the file input and filewatch v0.8.0 will handle the 20k+ files case better, though not if a user changes the max_open_files config to > 19999. The new version is specifically designed to handle the many-files use case. However, perhaps more user guidance (blog?) is necessary on setting the configs better.

In filewatch, the way close_older is calculated has changed. Now, instead of using the file mtime to decide whether a file can be closed, it uses accessed_at, the time the last byte was read. Before, all files were opened (or an attempt was made to open them) and all were tested for closing; now only max_open_files files are opened and processed, and the rest are queued. Each file is represented by a state-tracking object, and only files in the active state are considered for closing.
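
To make the queueing behaviour described above easier to picture, here is a rough Ruby sketch. It is not the filewatch code: `FileState`, `OpenFileWindow`, and all method names are hypothetical, and time is passed in as plain seconds to keep the example deterministic. It only illustrates the idea that at most max_open_files files are active at once, that the rest wait in a queue, and that an active file becomes closeable once its accessed_at is older than close_older.

```ruby
# Hypothetical illustration only; none of these names come from filewatch.
FileState = Struct.new(:path, :state, :accessed_at)

class OpenFileWindow
  attr_reader :active, :queued

  def initialize(max_open_files:, close_older:)
    @max_open_files = max_open_files
    @close_older = close_older # seconds of read inactivity before closing
    @active = []
    @queued = []
  end

  # Newly discovered files are queued; nothing is opened yet.
  def discover(path)
    @queued << FileState.new(path, :queued, nil)
  end

  # Promote queued files to active until max_open_files is reached.
  def fill(now)
    while @active.size < @max_open_files && !@queued.empty?
      file = @queued.shift
      file.state = :active
      file.accessed_at = now
      @active << file
    end
  end

  # Record that bytes were read from an active file.
  def touch(path, now)
    file = @active.find { |f| f.path == path }
    file.accessed_at = now if file
  end

  # Close active files whose last read is older than close_older.
  # Queued files are never considered. Returns the closed files.
  def close_idle(now)
    closed, @active = @active.partition { |f| now - f.accessed_at > @close_older }
    closed.each { |f| f.state = :closed }
    closed
  end
end

window = OpenFileWindow.new(max_open_files: 2, close_older: 3600)
%w[a.log b.log c.log].each { |path| window.discover(path) }
window.fill(0)               # a.log and b.log become active, c.log stays queued
window.touch("b.log", 3000)  # b.log was read recently
window.close_idle(3601)      # only a.log has been idle for more than an hour
window.fill(3601)            # the freed slot goes to c.log
```

In this sketch, closing an idle file is what frees a slot for a queued one, which is why a directory with more files than max_open_files can still be worked through over time.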

robin13 commented 8 years ago

Could MAX_IDENTITIES be made configurable so that it can be increased if necessary?