logstash-plugins / logstash-filter-memcached

Memcached Filter plugin for Logstash
Apache License 2.0

v1.2.0 Enhancement: Introduce optional LRU cache for massive performance improvement (10x) #32

Open cameronkerrnz opened 3 years ago

cameronkerrnz commented 3 years ago

Release notes

Massive performance improvement (roughly 10x), adding an optional thread-local LRU cache.

What does this PR do?

Introduces an optional (off by default), thread-local least-recently-used (LRU) cache using the well-established lru_redux Ruby gem, as already used by the likes of logstash-filter-dns. Includes performance-testing collateral showing an improvement from about 5k events per second to 50-60k events per second when run in a simple Docker container running Logstash 7.13.1.
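
For illustration, here is a minimal sketch of the caching pattern this enables, assuming lru_redux's TTL cache; the names (local_cache, cached_get) are hypothetical, and this is not necessarily the PR's actual code:

    require 'lru_redux'

    # One TTL-bounded LRU cache per worker thread, stashed on the thread
    # itself, so lookups need no cross-thread locking.
    def local_cache(max_entries = 1000, ttl_seconds = 60)
      Thread.current[:memcached_filter_lru] ||=
        LruRedux::TTL::Cache.new(max_entries, ttl_seconds)
    end

    # Consult the local cache first; only go to memcached on a miss.
    def cached_get(memcached_client, key)
      # getset returns the cached value, or stores and returns the
      # block's result (a real memcached round-trip) on a miss.
      local_cache.getset(key) { memcached_client.get(key) }
    end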

Why is it important/What is the impact to the user?

I want to enrich syslog messages from our many network devices with information about those devices as recorded in our network management system, such as which building they are in, which area of the network they belong to (distribution, core, access, wireless, etc.), and more besides. This will make it much easier for our network engineers to find logs of interest, and to build useful aggregations and filters in Kibana/Elasticsearch. The data from the network management system is loaded into Memcached (Redis would have been preferable, but there is no logstash-filter-redis yet; perhaps I shall contribute one).

As part of monitoring the pipeline, I use logstash_exporter, Prometheus and Grafana, and when I put this into production I rapidly found that it was slower than the previously slowest part of my pipeline (output to Elasticsearch) and was unable to keep up (Kafka consumer-group lag continued to increase). Anecdotal experience suggests that response time to memcached was scaling super-linearly.

To retain some insight into performance I had to implement a throttle so that only some of the messages went through logstash-filter-memcached. That doesn't deliver the full value of the enrichment, but it did let me establish that I can safely handle at most 10,000 events per 10 seconds before the pipeline is at risk of falling unsustainably behind.

    throttle {
      id => "networking.date.35"
      before_count => 10000
      period => 10
      max_age => 20
      key => "any"
      add_tag => [ "pass_to_memcache" ]
    }

    if "pass_to_memcache" in [tags] {
      memcached {
        id => "networking.memcached.32"
        hosts => ["127.0.0.1"]
        namespace => "network_devices"
        get => {
          "%{syslog-host-ip}" => "[@metadata][network_devices_temp]"
        }
      }
      json {
        id => "networking.json.41"
        source => "[@metadata][network_devices_temp]"
        target => "source_device"
      }
    }
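
For illustration only (field names and values here are hypothetical), the value stored in memcached under a device's IP might be a JSON document like the following, which the json filter above then parses into [source_device]:

    {"name": "dist-sw-01", "building": "Science Block", "network_area": "distribution"}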

Given that a very large proportion of the logs come from a relative minority of devices (e.g. distribution switches and wireless controllers), an LRU cache is an obvious optimisation, and I have attempted to implement it in much the same way as logstash-filter-dns does, with the exception that each thread gets its own cache to reduce contention, which by itself roughly doubles throughput. A sketch of the difference follows.
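
To make the contention point concrete, here is an assumed-API sketch (lru_redux, hypothetical names, not the PR's exact code) contrasting a shared cache with the per-thread approach taken here:

    require 'lru_redux'

    # Shared across all workers: correct, but every lookup synchronises
    # on the cache's internal lock.
    SHARED_CACHE = LruRedux::TTL::ThreadSafeCache.new(1000, 60)

    # One cache per worker thread: lock-free lookups, at the cost of
    # each worker warming its own copy of the hot entries.
    def per_thread_cache
      Thread.current[:lru_cache] ||= LruRedux::TTL::Cache.new(1000, 60)
    end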

Checklist

I've only done some rudimentary rspec testing so far; I would appreciate guidance on how to approach adding appropriate tests for this PR. I have added performance-testing tools and results in the 'qa' folder.

Author's Checklist

How to test this PR locally

Easiest using Docker:

docker build qa -t logstash-filter-memcached:qa
docker run --rm -it logstash-filter-memcached:qa

Running this container will run the performance test, which stops by itself after 5 minutes. I suggest you ignore the per-event timings (the source data comes from the Logstash REST API, which measures in milliseconds) and use the events-per-second figures instead. Findings are in Performance.md.

For development, I use my devcontainer (Docker Hub: cameronkerrnz/logstash-plugin-dev:7.13). If you have VS Code installed, it will offer to relaunch the project inside this container for you.

Related issues

Use cases

GET requests will be able to use an in-process, thread-local cache. This enables much higher events-per-second throughput and reduces the number of accesses to memcached. Functionally, the plugin behaves the same, assuming the data in memcached does not change more frequently than the LRU TTL; a value that does change may be served stale for up to one TTL interval.

Screenshots

Logs

See qa/Performance.md for details, but essentially, before:

5168.78 μs per event, 5.1 keps
3451.08 μs per event, 4.9 keps
3579.53 μs per event, 5.4 keps
4080.44 μs per event, 2.2 keps
8738.50 μs per event, 2.0 keps

After:

... 59.9 keps
... 57.2 keps
... 57.8 keps

cameronkerrnz commented 3 years ago

I've now had this in production for much of a day, including at least half a day during the peak usage period. Here's what it looks like from a metrics-monitoring point of view (this data sourced via logstash_exporter):

[Grafana screenshot: pipeline throughput and per-filter execution time, before and after enabling the LRU cache]

The blue line marks when I activated the change to use the LRU cache. On the top graph, the green line shows the total time spent per second executing this instance of the memcached filter (so in theory it could reach a maximum equal to num_pipeline_workers). The yellow line (ignore the red dots; those relate to non-matches in other filters) shows the number of events flowing through the pipeline.

In terms of how this relates to the performance of everything else going through the pipeline, this filter has dropped out of the top 5 slowest plugins (my next target is improving the performance of a particular grok filter).