thekrakken / java-grok

Simple API that allows you to easily parse logs and other files
http://grok.nflabs.com/

Don't like match that start with a " #64

Closed fbacchella closed 7 years ago

fbacchella commented 7 years ago

The following code:

    String pattern = "(?<message>client id): (?<clientid>.*)";
    String input = "client id: \"name\" \"Mac OS X Mail\" \"version\" \"10.2 (3259)\" \"os\" \"Mac OS X\" \"os-version\" \"10.12.3 (16D32)\" \"vendor\" \"Apple Inc.\"";

    // Validate the search is good
    Pattern p = Pattern.compile("(?<message>client id): (?<clientid>.*)");
    Matcher m = p.matcher(input);
    if (m.matches()) {
        System.out.println(m.group("clientid"));
    }

    io.thekraken.grok.api.Grok grok = new io.thekraken.grok.api.Grok();
    grok.compile(pattern, false);

    Match gm = grok.match(input);
    gm.captures();
    System.out.println(gm.toMap().get("clientid"));
    System.out.println(gm.getMatch().group("clientid"));

Output:

"name" "Mac OS X Mail" "version" "10.2 (3259)" "os" "Mac OS X" "os-version" "10.12.3 (16D32)" "vendor" "Apple Inc."
name" "Mac OS X Mail" "version" "10.2 (3259)" "os" "Mac OS X" "os-version" "10.12.3 (16D32)" "vendor" "Apple Inc.
"name" "Mac OS X Mail" "version" "10.2 (3259)" "os" "Mac OS X" "os-version" "10.12.3 (16D32)" "vendor" "Apple Inc."

Notice how gm.toMap().get("clientid") eats the first ", although the plain Java matcher returns the value correctly.

anthonycorbacho commented 7 years ago

@fbacchella thank you for logging this issue. I will take a look as soon as I can, but these days I am a bit busy with my day job.

fbacchella commented 6 years ago

The problem is still here:

        String pattern = "(?<message>client id): (?<clientid>.*)";
        String input = "client id: \"name\" \"Mac OS X Mail\" \"version\" \"10.2 (3259)\" \"os\" \"Mac OS X\" \"os-version\" \"10.12.3 (16D32)\" \"vendor\" \"Apple Inc.\"";

        // Validate the search is good
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);
        if (m.matches()) {
            System.out.println(m.group("clientid"));
        }

        GrokCompiler grokCompiler = GrokCompiler.newInstance();
        grokCompiler.registerDefaultPatterns();

        io.krakens.grok.api.Grok grok = grokCompiler.compile(pattern, true);

        Match gm = grok.match(input);
        Map<String, Object> captures = gm.capture();
        System.out.println(captures.get("clientid"));
        System.out.println(gm.getMatch().group("clientid"));

Still output:

"name" "Mac OS X Mail" "version" "10.2 (3259)" "os" "Mac OS X" "os-version" "10.12.3 (16D32)" "vendor" "Apple Inc."
name" "Mac OS X Mail" "version" "10.2 (3259)" "os" "Mac OS X" "os-version" "10.12.3 (16D32)" "vendor" "Apple Inc.
"name" "Mac OS X Mail" "version" "10.2 (3259)" "os" "Mac OS X" "os-version" "10.12.3 (16D32)" "vendor" "Apple Inc."

The first " is missing from the map returned by gm.capture() (output line 2), but not from the values returned by Java's own matcher (output lines 1 and 3).

ottobackwards commented 6 years ago

The cleanString() function explicitly does exactly this: it strips the quotes from the captured value.
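For readers following along, here is a minimal sketch of the kind of outer-quote stripping that would produce the output reported above. It is not the library's cleanString() source. The class name QuoteStripSketch and the method stripOuterQuotes are hypothetical, and the rule it applies (drop the first and last character when both are the same quote character) is an assumption inferred from the reported output.

```java
public class QuoteStripSketch {

    // Hypothetical helper, NOT the library's code: drops one leading and one
    // trailing character when the value starts and ends with the same quote.
    static String stripOuterQuotes(String value) {
        if (value == null || value.length() < 2) {
            return value;
        }
        char first = value.charAt(0);
        char last = value.charAt(value.length() - 1);
        if (first == last && (first == '"' || first == '\'')) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        // Shortened version of the captured "clientid" value from the report.
        String captured = "\"name\" \"Mac OS X Mail\"";
        // The first and last quotes belong to different tokens,
        // yet both are removed because the rule only looks at the two ends.
        System.out.println(stripOuterQuotes(captured)); // name" "Mac OS X Mail
    }
}
```

Under that assumption, any capture that is a list of quoted tokens loses its opening and closing quote, which matches the second output line in the report. A value that starts with a quote but does not end with one is left untouched.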

ottobackwards commented 6 years ago

I have a PR.