logstash-plugins / logstash-filter-grok

Grok plugin to parse unstructured (log) data into something structured.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Apache License 2.0

Implement Java Version of this Plugin's Logic and Benchmark against Current Version #112

Open original-brownbear opened 7 years ago

original-brownbear commented 7 years ago

It's in the title; also see https://github.com/logstash-plugins/logstash-filter-grok/pull/111#issuecomment-300284569 for the background.

suyograo commented 7 years ago

Just as an FYI for when we get to this: the original Ruby (and C!) implementation from @jordansissel is here: https://github.com/jordansissel/ruby-grok. We should port all the awesome tests from that library (even though they're in Ruby). Also, for reference, there is an ingest node implementation of grok from @talevy: https://github.com/elastic/elasticsearch/blob/master/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/Grok.java.

Ideally, we would have a Grok library we can reuse in both ES and LS, but for now we can have the Java implementation embedded here. I'm also open to separating it out from the get-go. We can discuss.

jordansissel commented 7 years ago

The grok implementation in Ingest won't work for Logstash because:

I'm open to reviewing any implementation details or sharing history on grok (originally written in 2004!), if you need it.

+1 on exploring moving this to Java.

My intuition is that moving the capture mapping to Java will yield a nice improvement, since we can keep all of that work (regexp + capture handling) within Java and never enter JRuby during execution.
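
For illustration only, here is a minimal sketch of what "capture mapping in Java" could look like: the regexp match and the copy from named captures into event fields both happen in one Java method. Everything in it is an assumption made for the example. The class name `CaptureMappingSketch`, its `match` method, and the use of `java.util.regex` are hypothetical stand-ins and not the plugin's actual code or regexp engine.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: regexp matching and capture handling done in one place in Java.
// java.util.regex is used as a stand-in engine for illustration.
final class CaptureMappingSketch {
    private final Pattern pattern;
    private final List<String> groupNames;

    // groupNames lists the named groups in the regex that should become fields.
    CaptureMappingSketch(String regex, List<String> groupNames) {
        this.pattern = Pattern.compile(regex);
        this.groupNames = groupNames;
    }

    // Returns the captured fields in declaration order, or empty if the line does not match.
    Optional<Map<String, String>> match(String line) {
        Matcher matcher = pattern.matcher(line);
        if (!matcher.find()) {
            return Optional.empty();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String name : groupNames) {
            String value = matcher.group(name);
            // Groups that did not participate in the match are skipped.
            if (value != null) {
                fields.put(name, value);
            }
        }
        return Optional.of(fields);
    }
}
```

A caller would build the object once per pattern, for example with the regex `(?<client>\d+\.\d+\.\d+\.\d+) (?<verb>[A-Z]+)` and the group names `client` and `verb`, and then call `match` for each incoming line. Only the finished map would need to be handed back to the Ruby side.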