thekrakken / java-grok

Simple API that allows you to easily parse logs and other files
http://grok.nflabs.com/

Changing Grok.compile() boolean namedOnly changes the matching pattern #61

Closed fbacchella closed 7 years ago

fbacchella commented 7 years ago

I tried the following code:

@Test
public void TestBug() throws ProcessorException, GrokException {
    Grok grok = new Grok();
    grok.addPattern("BOM", "\\xEF\\xBB\\xBF");
    grok.addPattern("GREEDYDATA", ".*");
    grok.addPattern("LINE", "(%{BOM}?%{GREEDYDATA:message})?");
    grok.compile("%{LINE}", false);
    Match gm = grok.match("themessage");
    gm.captures();
    System.out.println(gm.toMap());
    System.out.println("    " + gm.getMatch().pattern());
}

And got what I expected:

{BOM=null, LINE=themessage, message=themessage}
    (?<name0>((?<name1>\xEF\xBB\xBF)?(?<name2>.*))?)

But if I switch the line

    grok.compile("%{LINE}", false);

to

    grok.compile("%{LINE}", true);

the match fails, and I get:

{message=null}
   (\xEF\xBB\xBF?(?<name2>.*))?

The match result changed, even though I only changed which groups should be returned. Look at the generated regex: it goes from (?<name1>\xEF\xBB\xBF)? to \xEF\xBB\xBF?. In the first case, the whole sequence \xEF\xBB\xBF is optional. In the second case, the ? applies only to the last escape, \xBF. Changing the value of namedOnly in compile should not change the returned values. The pattern in the second case should be (?:((?:\xEF\xBB\xBF)?(?<name2>.*))?). Unwanted named groups should be replaced by non-capturing (?:...) groups, not dropped together with their grouping.
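The difference can be reproduced without Grok, using only java.util.regex. The following is a minimal sketch, not the library's code: the class name OptionalGroupDemo and the helper message() are made up for illustration, and the two patterns are copied from the report above (the one generated with namedOnly=true and the one proposed as the fix).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it does not use java-grok at all.
class OptionalGroupDemo {
    // Regex reported for namedOnly=true: the trailing "?" binds to \xBF only,
    // so \xEF and \xBB are still mandatory inside the outer optional group.
    static final Pattern UNGROUPED =
            Pattern.compile("(\\xEF\\xBB\\xBF?(?<name2>.*))?");

    // Regex proposed in the report: the BOM is wrapped in a non-capturing
    // group, so the whole sequence is optional.
    static final Pattern NON_CAPTURING =
            Pattern.compile("(?:((?:\\xEF\\xBB\\xBF)?(?<name2>.*))?)");

    // Returns the text captured by the named group "name2", or null if the
    // group did not take part in the match.
    static String message(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group("name2") : null;
    }

    public static void main(String[] args) {
        System.out.println(message(UNGROUPED, "themessage"));     // prints null
        System.out.println(message(NON_CAPTURING, "themessage")); // prints themessage
    }
}
```

With the ungrouped form, the outer optional group can only match the empty string on input without a BOM, which is why message comes back as null.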

retoo commented 7 years ago

Even simpler case is:

        grok.addPattern("WORD", "foo|bar");
        grok.addPattern("TEXT", "<< %{WORD}+ >>");

This works only for namedOnly=false.

On the other hand:

        grok.addPattern("WORD", "(?:foo|bar)");
        grok.addPattern("TEXT", "<< %{WORD}+ >>");

... works for both namedOnly=false and namedOnly=true.
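The same effect can be shown with plain java.util.regex. This is a sketch under the assumption that, with namedOnly=true, the WORD pattern is inlined into TEXT without any surrounding group, giving "<< foo|bar+ >>"; the class name AlternationDemo and the helper fullMatch() are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it does not use java-grok at all.
class AlternationDemo {
    // Assumed result of inlining "foo|bar" without a group: the alternation
    // now splits the whole expression into "<< foo" OR "bar+ >>".
    static final Pattern INLINED = Pattern.compile("<< foo|bar+ >>");

    // Same expression with the alternation kept inside a non-capturing group.
    static final Pattern GROUPED = Pattern.compile("<< (?:foo|bar)+ >>");

    // True only if the pattern matches the entire input.
    static boolean fullMatch(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(INLINED, "<< foobar >>")); // prints false
        System.out.println(fullMatch(GROUPED, "<< foobar >>")); // prints true
        System.out.println(fullMatch(INLINED, "<< foo"));       // prints true
    }
}
```

The last line shows what the inlined pattern really accepts: the left half of the alternation on its own.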