logstash-plugins / logstash-filter-grok

Grok plugin to parse unstructured (log) data into something structured.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Apache License 2.0

Parse regex to a nested field #66

Open dadoonet opened 8 years ago

dadoonet commented 8 years ago

I'm trying to parse using grok filter with a regular expression. I can store the result in a flat field:

(?<permission.user.read>[r-])

It gives:

        "permission.user.read" => "r",

But I would like to store the result in a nested structure such as:

permission:
    user: 
        read:  "r"

So I tried the common convention:

(?<[permission][user][read]>[r-])

But grok failed in that case:

The error reported is: 
  invalid char in group name <[permission][user][read]>: /(?<type>[d-])(?<[permission][user][read]>[r-])(?<permission.user.write>[w-])(?<permission.user.execute>[x-])(?<permission.group.read>[r-])(?<permission.group.write>[w-])(?<permission.group.execute>[x-])(?<permission.other.read>[r-])(?<permission.other.write>[w-])(?<permission.other.execute>[x-]) (?<INT:links>(?:[+-]?(?:[0-9]+))) (?<USERNAME:user>[a-zA-Z0-9._-]+) (?<USERNAME:group>[a-zA-Z0-9._-]+) (?:\s*)(?<NUMBER:size>(?:(?:(?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))))) (?<TIMESTAMP_ISO8601:date>(?:(?>\d\d){1,2})-(?:(?:0?[1-9]|1[0-2]))-(?:(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))[T ](?:(?:2[0123]|[01]?[0-9])):?(?:(?:[0-5][0-9]))(?::?(?:(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)))?(?:(?:Z|[+-](?:(?:2[0123]|[01]?[0-9]))(?::?(?:(?:[0-5][0-9])))))?) (?<NOTSPACE:timezone>\S+)(?<GREEDYDATA:name>.*)/m

If I do the same with a preregistered grok pattern, it works fine:

%{NUMBER:[metadata][size]}

gives:

    "metadata" => {
        "size" => "11"
    },

We should fix it or document that using nested format is not possible or document how we can use nested fields.

berglh commented 8 years ago

I have also replicated this problem with Logstash 1.5.6 and Logstash 2.10. See Grok Filter and Nested Objects/Fields Ambiguity. I only just found this issue, and it seems like a more relevant location since it concerns the Grok filter plugin.

What I find strange is that if you launch Logstash with --debug, you can see semantic matches like %{PATTERN:[object][field]} being expanded in a similar way to the custom pattern matches.

berglh commented 8 years ago

So, it turns out that the grok filter uses Oniguruma syntax. Line 209 of the Oniguruma syntax guide (Ruby Regex Syntax) describes the named group form that the grok filter's custom pattern match relies on:

[(?<name>subexp)]    define named group
                     (All characters of the name must be a word character.)

There you have it: this explains the invalid char in group name error in more detail by defining which characters are legal in a group name, and square brackets are not among them. Fixing this would need an upstream change, and it would be hard to justify pushing it that far up. There may also be regex-related reasons, since square brackets usually denote a character class.
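
As an illustration of how a wrapper could accept bracketed names without changing the regex engine, here is a minimal Python sketch. It is not how grok or Joni work; `compile_nested`, `match_nested` and the `g0`, `g1` alias names are hypothetical, and Python's `re` module stands in for Oniguruma only because it has a similar restriction on group names. The idea is to rewrite each bracketed name to a safe alias before compiling, then map the aliases back to nested paths after matching.

```python
import re

# Matches a named group opener whose name is written in bracket notation,
# e.g. "(?<[permission][user][read]>".
BRACKET_GROUP = re.compile(r"\(\?<((?:\[[^\[\]]+\])+)>")


def compile_nested(pattern):
    """Hypothetical helper: swap bracketed group names for safe aliases."""
    paths = {}

    def alias(match):
        # Assumption: the pattern has no existing groups named g0, g1, ...
        name = "g%d" % len(paths)
        paths[name] = re.findall(r"\[([^\[\]]+)\]", match.group(1))
        return "(?P<%s>" % name

    return re.compile(BRACKET_GROUP.sub(alias, pattern)), paths


def match_nested(pattern, text):
    """Return captures as a nested dict, or None when there is no match."""
    regex, paths = compile_nested(pattern)
    found = regex.match(text)
    if found is None:
        return None
    event = {}
    for name, path in paths.items():
        node = event
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = found.group(name)
    return event
```

For example, `match_nested("(?<[permission][user][read]>[r-])", "r")` yields a `permission` object containing `user`, which in turn contains `read`. The engine only ever sees word-character group names, so the restriction quoted above is never hit.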

I think the only current option is to create a custom pattern file and reference it in the grok match:

Pattern File

PERMS [r-]

Grok Filter

grok {
    patterns_dir => "/etc/logstash/conf.d/patterns"
    match => [ "source_field", "%{PERMS:[permission][user][read]}" ]
}

Because Elasticsearch 2.0 no longer allows period-separated field names, which is how nested fields used to be referenced, I think it would be good to disambiguate the documentation and cover this scenario in detail. Additionally, the Logstash configuration test should probably pick this up and concisely inform you of the reason.

dnk8n commented 6 years ago

Any update on this issue? I am having difficulty with the same problem (logstash 5.6.4)

jordansissel commented 6 years ago

The syntax you are using (?<...>...) is a feature provided by the library Grok is using (Joni, a regular expression engine). The error is coming from Joni and is a report that the [ and ] characters are not allowed by Joni in a named group.

Fixing this will require a change in the Joni library (and because Joni is a Java implementation of Oniguruma, probably a feature request in Oniguruma also).

Sorry for the confusion this causes.

willemdh commented 6 years ago

Hey, just found this issue. I had some trouble doing a regex capture, see https://github.com/logstash-plugins/logstash-filter-grok/issues/66

synFK commented 5 years ago

Can't we just expand a field name that contains dots – e.g. "permission.user.read" – to a nested object or would this be breaking any conventions?
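
To make the suggestion concrete, here is a minimal Python sketch of such an expansion. It is only an illustration: `expand_dotted` is a hypothetical helper, not something grok or Logstash provides, and raising on a collision is an assumption about how conflicts might be handled.

```python
def expand_dotted(event, sep="."):
    """Return a copy of `event` with dotted keys expanded into nested dicts."""
    nested = {}
    for key, value in event.items():
        parts = key.split(sep)
        node = nested
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # e.g. both "a" and "a.b" are present: "a" cannot be a
                # plain value and an object at the same time.
                raise ValueError("%r collides with a non-object field" % key)
            node = child
        node[parts[-1]] = value
    return nested
```

Called on `{"permission.user.read": "r"}`, this returns a `permission` object containing `user`, which contains `read`. The collision case hints at why this could break conventions: an event that legitimately holds both a flat field `a` and a dotted field `a.b` has no unambiguous nested form.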