logstash-plugins / logstash-filter-memcached

Memcached Filter plugin for Logstash
Apache License 2.0

Enhancement: Use multi_filter to better batch gets/sets #7

Open yaauie opened 5 years ago

yaauie commented 5 years ago

A user of my original skunkworks version of this plugin requested batching, which could be achieved using multi_filter:

Batching may help improve performance. If batching 100 log messages at a time, we can potentially do a multi-get on the unique fields, then enrich the messages with the results.

Reference: https://groups.google.com/forum/#!topic/memcached/XIYlDaiPv_c

-- https://github.com/yaauie/logstash-filter-memcached-archived/issues/1

cameronkerrnz commented 3 years ago

To gauge the possible performance gain, I noticed that Logstash was able to achieve a similar keys/second rate to Python using pymemcache (which is a pure-Python text-protocol client, IIRC).

Comparing pymemcache's get versus get_multi (with just 10 keys at a time), I see roughly the following difference:

| call | batching | throughput | per-key latency |
|------|----------|------------|-----------------|
| `get` | none | 11,006 keys/s | 87.5 µs/key |
| `get_multi` | batch of 10 | 69,618 keys/s | 13.75 µs/key |
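For intuition, the gap between those two rows is consistent with a simple cost model: a fixed per-request round trip amortised over the batch. A toy Ruby sketch (the overhead constants below are illustrative assumptions, not measurements from this benchmark):

```ruby
# Assumed costs (illustrative only): a fixed round-trip overhead per
# request, plus a per-key cost that scales with the batch size.
RTT_US     = 80.0  # network + context-switch overhead per request, µs
PER_KEY_US = 7.0   # cost per key, µs

def keys_per_second(batch_size)
  # One request fetches batch_size keys, so the round trip is amortised.
  us_per_key = (RTT_US / batch_size) + PER_KEY_US
  1_000_000.0 / us_per_key
end

single  = keys_per_second(1)   # ~11.5k keys/s at 87 µs/key
batched = keys_per_second(10)  # ~66.7k keys/s at 15 µs/key
puts "speedup with a batch of 10: %.1fx" % (batched / single)   # => 5.8x
```

The exact speedup depends on the assumed overheads, but the shape matches the measured numbers: once the round trip dominates, batching pays off roughly linearly until the per-key cost takes over.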

I also tried using a Unix domain socket (without batching), and the performance was very similar to TCP. It would be nice to have UDP support, perhaps.

I'm aiming to pump in excess of 20k keys/s for my use-case (tagging syslogs from network devices with information about those devices, as extracted from our network management system).

cameronkerrnz commented 3 years ago

Another potentially significant issue is that the various Logstash workers will be contending for access to the client. The client is thread-safe, so it will serialise access (we're not using any connection-pooling functionality). There is an example of how this has been solved in https://github.com/logstash-plugins/logstash-filter-fingerprint/commit/5fe864d6067f86024941fb0994e44c675d9ede0a (making an OpenSSL object thread-local).
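A minimal sketch of one-client-per-worker-thread, in the spirit of the thread-local OpenSSL object in the fingerprint commit linked above. `FakeClient` is a stand-in for the real memcached client (an assumption, not this plugin's actual class); the thread-local lookup is the point:

```ruby
# Stand-in for the real memcached client object.
class FakeClient; end

def client_for_current_thread
  # Each worker thread lazily creates and then reuses its own client,
  # so workers never serialise on a single shared connection.
  Thread.current[:memcached_client] ||= FakeClient.new
end

ids = 4.times.map do
  Thread.new { client_for_current_thread.object_id }
end.map(&:value)

puts ids.uniq.size   # => 4: four threads, four distinct clients
```

The trade-off is one TCP connection per worker instead of one shared connection, which is usually fine for memcached but worth noting for very high worker counts.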

cameronkerrnz commented 3 years ago

Can someone please review if this plan seems workable?

I'm about to embark on some exploratory performance improvements for logstash-filter-memcached, because I'm finding the performance (~5k eps) is rather below what I was hoping for (> 20k eps). This is when talking to a local memcached over TCP on 127.0.0.1 (performance is much the same over a Unix domain socket, in case anyone was wondering). Experiments comparing single versus batched requests with a Python client suggest I could expect a 3-4x performance boost from introducing batching.

As far as I can tell at present, there doesn't appear to be anything in the logstash API that would support such batching cleanly (I would love to be corrected on that). Thinking how I might implement this in the filter plugin API, I came up with the following ideas, inspired by how logstash-filter-split operates.

Line numbers below reference the plugins that inspired each step (logstash-filter-fingerprint and logstash-filter-split):

1. The filter would maintain a thread-local (avoiding lock contention) list of events in its current batch (Fingerprint L160-162).
2. If the batch is not full, add the event to the batch and cancel the event (Split L98).
3. Else, if the batch is full:
   1. Formulate (looking at the do_get method) memcache_keys from all events in the list, building a set (i.e. free of duplicates) of keys to query.
   2. Perform the query to memcached.
   3. For each event in the batch, add the relevant response data to the event (no need to event.clone, I think); then call filter_matched on the event and yield it.

Another (probably hugely significant in my environment) improvement would be a local in-thread cache holding the top-N most frequently returned results. Why cache a cache? Simple: to reduce network and context-switching latencies, and to account for the fact that in my case a very significant share of the lookups come from a relatively small number of keys. But I don't want to investigate this first, because by batching operations (and requesting the unique set of keys) we would already expect a huge reduction in these elephant-flow lookups.

Alternatively, perhaps memoizing the query would address much of this anyway, while keeping the code much(?) simpler and avoiding batching semantics altogether. There is a good exemplar of caching in a pure-Ruby plugin in logstash-filter-dns (it uses lru_redux): https://github.com/logstash-plugins/logstash-filter-dns/blob/master/lib/logstash/filters/dns.rb#L74-L106
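A sketch of the memoization idea. The dns filter uses the lru_redux gem; here a tiny size-bounded Hash (Ruby hashes preserve insertion order) stands in so the example is self-contained:

```ruby
# Minimal LRU cache: evicts the least-recently-used entry once full.
class TinyLru
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  # Returns the cached value, or computes it via the block on a miss.
  def fetch(key)
    if @store.key?(key)
      @store[key] = @store.delete(key)          # re-insert to refresh recency
    else
      @store[key] = yield(key)                  # miss: hit the backend
      @store.shift if @store.size > @max_size   # evict oldest entry
    end
    @store[key]
  end
end

lookups = 0
cache   = TinyLru.new(100)
backend = ->(k) { lookups += 1; "value-for-#{k}" }   # stands in for memcached

3.times { cache.fetch("hot-key") { |k| backend.call(k) } }
puts lookups   # => 1: two of the three lookups never reached the backend
```

For a skewed key distribution like the syslog case above, even a small cache like this would absorb most of the round trips without any change to batching semantics.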

... I think I've successfully talked myself into pursuing caching over batching.

Hopefully I'll have a pull request prepared in the next week or so.