logstash-plugins / logstash-filter-memcached

Memcached Filter plugin for Logstash
Apache License 2.0

v1.2.0 Enhancement: Introduce optional LRU cache for massive performance improvement (10x) #32

Open cameronkerrnz opened 3 years ago

cameronkerrnz commented 3 years ago

Release notes

Massive performance improvement (roughly 10x), adding an optional thread-local LRU cache.

What does this PR do?

Introduces an optional (off by default), thread-local least-recently-used (LRU) cache using the well-established lru_redux Ruby gem, as already used by the likes of logstash-filter-dns. Includes performance-testing collateral showing an improvement from about 5k events per second to 50-60k events per second when run in a simple Docker container running Logstash 7.13.1.
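
For illustration, here is a minimal sketch of the caching pattern this enables, assuming lru_redux's TTL cache; the names (local_cache, cached_get) are hypothetical, and this is not necessarily the PR's actual code:

    require 'lru_redux'

    # One TTL-bounded LRU cache per worker thread, stashed on the thread
    # itself, so lookups need no cross-thread locking.
    def local_cache(max_entries = 1000, ttl_seconds = 60)
      Thread.current[:memcached_filter_lru] ||=
        LruRedux::TTL::Cache.new(max_entries, ttl_seconds)
    end

    # Consult the local cache first; only go to memcached on a miss.
    def cached_get(memcached_client, key)
      # getset returns the cached value, or stores and returns the
      # block's result (a real memcached round-trip) on a miss.
      local_cache.getset(key) { memcached_client.get(key) }
    end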

Why is it important/What is the impact to the user?

I want to enrich syslog messages from our many network devices with information about those devices as recorded in our network management system, such as which building they are in, which area of the network they belong to (distribution, core, access, wireless, etc.), and more besides. This will make it much easier for our network engineers to find logs of interest, and to build useful aggregations and filters in Kibana/Elasticsearch. The data from the network management system is loaded into Memcached (Redis would have been preferable, but there is no logstash-filter-redis yet; perhaps I shall contribute one).

As part of monitoring the pipeline, I use logstash_exporter, Prometheus and Grafana, and when I put this into production I rapidly found that it was slower than the previously slowest part of my pipeline (output to Elasticsearch) and was unable to keep up (Kafka consumer-group lag continued to increase). Anecdotal experience suggests that response time to memcached was scaling super-linearly.

To retain some insight into performance I had to implement a throttle so that only some of the messages went through logstash-filter-memcached. That doesn't deliver the full value of the enrichment, but it did let me establish that I can safely handle at most 10,000 events per 10 seconds before the pipeline is at risk of falling unsustainably behind.

    throttle {
      id => "networking.date.35"
      before_count => 10000
      period => 10
      max_age => 20
      key => "any"
      add_tag => [ "pass_to_memcache" ]
    }

    if "pass_to_memcache" in [tags] {
      memcached {
        id => "networking.memcached.32"
        hosts => ["127.0.0.1"]
        namespace => "network_devices"
        get => {
          "%{syslog-host-ip}" => "[@metadata][network_devices_temp]"
        }
      }
      json {
        id => "networking.json.41"
        source => "[@metadata][network_devices_temp]"
        target => "source_device"
      }
    }
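
For illustration only (field names and values here are hypothetical), the value stored in memcached under a device's IP might be a JSON document like the following, which the json filter above then parses into [source_device]:

    {"name": "dist-sw-01", "building": "Science Block", "network_area": "distribution"}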

Given that a very large proportion of the logs come from a relative minority of devices (e.g. distribution switches and wireless controllers), an LRU cache is an obvious optimisation, and I have attempted to implement it in much the same way as logstash-filter-dns does, with the exception that each thread gets its own cache to reduce contention, which by itself roughly doubles throughput. A sketch of the difference follows.
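
To make the contention point concrete, here is an assumed-API sketch (lru_redux, hypothetical names, not the PR's exact code) contrasting a shared cache with the per-thread approach taken here:

    require 'lru_redux'

    # Shared across all workers: correct, but every lookup synchronises
    # on the cache's internal lock.
    SHARED_CACHE = LruRedux::TTL::ThreadSafeCache.new(1000, 60)

    # One cache per worker thread: lock-free lookups, at the cost of
    # each worker warming its own copy of the hot entries.
    def per_thread_cache
      Thread.current[:lru_cache] ||= LruRedux::TTL::Cache.new(1000, 60)
    end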

Checklist

I've only done some rudimentary rspec testing so far; I would appreciate guidance on how to approach adding appropriate tests for this PR. I have added performance-testing tools and results in the 'qa' folder.

Author's Checklist

How to test this PR locally

Easiest using Docker:

docker build qa -t logstash-filter-memcached:qa
docker run --rm -it logstash-filter-memcached:qa

Running this container will run the performance test, which stops by itself after 5 minutes. I suggest you ignore the per-event timings (the source data comes from the Logstash REST API, which measures in milliseconds) and use the events-per-second figures instead. Findings are in Performance.md.

For development, I use my devcontainer (Docker Hub: cameronkerrnz/logstash-plugin-dev:7.13). If you have VS Code installed, it will offer to relaunch the project inside this container for you.

Related issues

Use cases

GET requests will be able to use an in-process, thread-local cache. This enables much higher events-per-second throughput and reduces the number of accesses to memcached. Functionally, the plugin behaves the same, assuming the data in memcached does not change more frequently than the LRU TTL; a value that does change may be served stale for up to one TTL interval.

Screenshots

Logs

See qa/Performance.md for details, but essentially, before:

5168.78 μs per event, 5.1 keps
3451.08 μs per event, 4.9 keps
3579.53 μs per event, 5.4 keps
4080.44 μs per event, 2.2 keps
8738.50 μs per event, 2.0 keps

After:

... 59.9 keps
... 57.2 keps
... 57.8 keps

cameronkerrnz commented 3 years ago

I've now had this in production for much of a day, including at least half a day during the peak usage period. Here's what it looks like from a metrics-monitoring point of view (this data sourced via logstash_exporter):

[Grafana screenshot: pipeline throughput and per-filter execution time, before and after enabling the LRU cache]

The blue line marks when I activated the change to use the LRU cache. On the top graph, the green line shows the total time spent per second executing this instance of the memcached filter (so in theory it could reach a maximum equal to num_pipeline_workers). The yellow line (ignore the red dots; those relate to non-matches in other filters) shows the number of events flowing through the pipeline.

In terms of how this relates to the performance of everything else going through the pipeline, this filter has dropped out of the top 5 slowest plugins (my next target is improving the performance of a particular grok filter).