Open acchen97 opened 5 years ago
I envision a two-part solution:
LogStash::Filters::Http#request_http(verb, url, options)
Caching within the plugin itself is also possible, if a little more complex, and would reduce the overhead of a user of this plugin configuring and running the above-mentioned caching proxy, at the cost of breaking some of the semantics (e.g., no upstream cache invalidation) and some unpredictability in the plugin's memory consumption.

I add my vote for this one; it would be ideal for our data-enrichment use case. We are currently using the jdbc_streaming filter, but it is a less-than-ideal choice. The perfect fit would be the http filter with caching capabilities, just like the aforementioned jdbc_streaming filter, only making HTTP calls instead of SQL queries.
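For comparison, the jdbc_streaming filter exposes `use_cache`, `cache_size`, and `cache_expiration` options. A hypothetical http filter with the same knobs might look like the sketch below; note that `url`, `verb`, and `target_body` are real http filter options, but the three caching options do not exist today and their names are illustrative only, as is the enrichment URL:

```
filter {
  http {
    url         => "http://enrichment-api.example.org/lookup?ip=%{clientip}"
    verb        => "GET"
    target_body => "[user_info]"
    # Hypothetical caching options, modeled on jdbc_streaming:
    use_cache        => true   # not a real option today
    cache_size       => 1000   # max cached responses held in memory
    cache_expiration => 60     # seconds before an entry is re-fetched
  }
}
```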
+1 Just came to add my interest in this. I haven't gotten anything to work other than hammering my REST source with the exact same request over and over.
-1
I don't think that LogStash should have a caching layer, as external software (nginx, memcached) already does this well and is easy to integrate with LogStash.
I have two use cases for which I am using external caches:
- An nginx instance listening on localhost that proxies my API service; I configured its disk cache and pointed Logstash at it (see [1] below).
- When an event has both `clientip` and `user` fields, I store the mapping in memcached. When an event has a `clientip` but no `user` field, I query memcached to enrich the log event (see [2] below).

That said, I find the following pluses in having the caching layer external:
Sorry for the verbosity, I hope this is useful also for your use cases.
# [1]
proxy_cache_path /srv/cache/foobar levels=1:2 keys_zone=foobar:40m inactive=24h max_size=1g;

server {
    listen localhost:8084;
    access_log off;

    location / {
        proxy_pass https://foobar;
        proxy_ignore_headers Cache-Control;
        proxy_set_header Host foobar.example.org;
        proxy_buffering on;
        proxy_cache foobar;
        proxy_cache_key $uri$is_args$args;
        proxy_cache_valid 200 404 1h;
        proxy_cache_valid any 5m;
        proxy_cache_lock on;
        proxy_cache_use_stale error timeout invalid_header updating http_500 http_502 http_503 http_504;
        add_header X-Cache-Status $upstream_cache_status;
    }
}

upstream foobar {
    server foobar.example.org:443;
}
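With the proxy above in place, the plugin needs no caching logic of its own; Logstash simply queries localhost:8084. A sketch using the http filter's existing `url`, `verb`, and `target_body` options (the `/lookup` path, query parameter, and target field are placeholders for your own API):

```
filter {
  http {
    # Go through the local nginx caching proxy instead of hitting
    # foobar.example.org directly; repeated identical lookups are
    # served from nginx's disk cache.
    url         => "http://localhost:8084/lookup?ip=%{clientip}"
    verb        => "GET"
    target_body => "[user_info]"
  }
}
```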
# [2]
# We have a mapping from the event, store it in the cache for usage by other future events.
if [clientip] and [user] and [user] !~ '(?:^(?:unauthenticated|_?system|anonymous|\[?unknown\]?)$)' {
  memcached {
    hosts     => ["cache-01"]
    namespace => "logstash-ip"
    set       => { "[user]" => "%{clientip}" }
    ttl       => 86400 # Avoid stale lookups
  }
}

# We don't have a mapping from the event, try to look it up from the cache.
if [clientip] and ! [user] {
  # Check the cache
  memcached {
    hosts     => ["cache-01"]
    namespace => "logstash-ip"
    get       => { "%{clientip}" => "[user]" }
    add_tag   => ["user_from_cache"]
  }
}
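If the memcached get above misses, the event still lacks a `[user]` field. A small follow-up conditional can mark such events for later review; this sketch uses the standard mutate filter, and the tag name is arbitrary:

```
# Neither the event nor the cache yielded a user: tag for review.
if [clientip] and ! [user] {
  mutate {
    add_tag => ["user_cache_miss"]
  }
}
```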
There has already been some demand for native caching of HTTP lookups in this plugin. It would enable higher throughput without requiring a third-party caching system such as Memcached alongside Logstash.
Please feel free to +1 if you are interested in this feature.