thekrakken / java-grok

Simple API that allows you to easily parse logs and other files
http://grok.nflabs.com/

NamedRegexCollection with namedOnly #100

Closed RickyHuo closed 6 years ago

RickyHuo commented 6 years ago

Is there a way to get NamedRegexCollection with namedOnly?

Would this code work if I modified it like this?

if (namedOnly && group.get("subname") == null) {
    replacement = String.format("(?:%s)", definitionOfPattern);
    namedRegexCollection.put("name" + index, group.get("name"));
}
namedRegex =
    StringUtils.replace(namedRegex, "%{" + group.get("name") + "}", replacement, 1);

ottobackwards commented 6 years ago

Can you provide some more context? What are you trying to accomplish?

RickyHuo commented 6 years ago

I have the following code:

public static void testgetNamedRegexCollection(String[] args) throws Exception {
    GrokCompiler compiler = GrokCompiler.newInstance();

    compiler.register("BOM", "\\xEF\\xBB\\xBF");
    compiler.register("GREEDYDATA", ".*");
    compiler.register("LINE", "(%{BOM:bom}?%{GREEDYDATA:message})?");
    // The second argument is namedOnly.
    Grok grok = compiler.compile("%{LINE:line}", true);
    Match gm = grok.match("themessage");
    System.out.println(grok.getNamedRegexCollection());
}

And the output is

{name2=message, name1=bom, name0=line}

What I expected is:

{name0=line}

That is, only line, without bom and message, when namedOnly is true.

ottobackwards commented 6 years ago

This output is correct, though. You may only be using LINE, but many grok patterns are composites, and there may be many arbitrary combinations of named regexes used in a system.

Those are named regexes.

What are you trying to do that you only want the LINE? As far as I can see this shouldn't be changed. @anthonycorbacho ??

anthonycorbacho commented 6 years ago

I am not sure I understand what is needed here.

ottobackwards commented 6 years ago

I think @RickyHuo differentiates between the 'top' level pattern he uses (LINE) and the patterns that are present in its composition, so he only expects to see LINE as a named regex.

In truth they are all named regexes, and the current behavior is correct.
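
To illustrate why every nested reference ends up in the collection, here is a minimal sketch of composite expansion written with plain java.util.regex. It is a simplified model, not java-grok's actual implementation: the class name CompositeExpansionSketch, the expand method, and the reference-matching regex are all assumptions made for illustration. Each reference that gets expanded, at any nesting depth, receives its own nameN entry.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a simplified model of composite pattern expansion,
// not the code java-grok uses internally.
class CompositeExpansionSketch {
    // Matches %{NAME} or %{NAME:alias}; a simplified stand-in for grok's reference syntax.
    private static final Pattern REF = Pattern.compile("%\\{(\\w+)(?::(\\w+))?\\}");

    /** Expands references until none remain, recording one entry per expanded reference. */
    static Map<String, String> expand(Map<String, String> definitions, String pattern) {
        Map<String, String> names = new LinkedHashMap<>();
        String current = pattern;
        int index = 0;
        Matcher m = REF.matcher(current);
        while (m.find()) {
            String name = m.group(1);
            String alias = m.group(2); // null when the reference has no ":alias" part
            String definition = definitions.get(name);
            if (definition == null) {
                throw new IllegalArgumentException("Unknown pattern: " + name);
            }
            String groupName = "name" + index++;
            names.put(groupName, alias != null ? alias : name);
            // Splice the definition in; it may itself contain further references.
            current = current.substring(0, m.start())
                    + "(?<" + groupName + ">" + definition + ")"
                    + current.substring(m.end());
            m = REF.matcher(current); // rescan from the start of the updated string
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> defs = new HashMap<>();
        defs.put("BOM", "\\xEF\\xBB\\xBF");
        defs.put("GREEDYDATA", ".*");
        defs.put("LINE", "(%{BOM:bom}?%{GREEDYDATA:message})?");
        // prints {name0=line, name1=bom, name2=message}
        System.out.println(expand(defs, "%{LINE:line}"));
    }
}
```

The sketch uses a LinkedHashMap, so entries print in insertion order; the output quoted earlier in this thread lists the same three entries in a different order.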

RickyHuo commented 6 years ago

@ottobackwards @anthonycorbacho Sorry, I made a mistake.

I have changed my code:


public static void testgetNamedRegexCollection(String[] args) throws Exception {
    GrokCompiler compiler = GrokCompiler.newInstance();

    compiler.register("BOM", "\\xEF\\xBB\\xBF");
    compiler.register("GREEDYDATA", ".*");
    compiler.register("LINE", "(%{BOM}?%{GREEDYDATA})?");
    // The second argument is namedOnly.
    Grok grok = compiler.compile("%{LINE:line}", true);
    Match gm = grok.match("themessage");
    System.out.println(grok.getNamedRegexCollection());
}

And the output is:

{name2=message, name1=bom, name0=line}

What I expected is:

{name0=line}

That is, only line, without bom and message, when namedOnly is true.

ottobackwards commented 6 years ago

getNamedRegexCollection returns the named regexes that the Grok instance knows about. Again, the output is correct, because the instance knows about all of those patterns. This is working as intended.

What are you trying to accomplish? You are not going about it the right way.

RickyHuo commented 6 years ago

@param namedOnly : Whether to capture named expressions only or not (i.e. %{IP:ip} but not %{IP})

I thought that if I set namedOnly to true, it would only capture "line":

Grok grok = compiler.compile("%{LINE:line}", true);

The output of getNamedRegexCollection is correct, I know.

But I think {name0=line} would be better.

ottobackwards commented 6 years ago

No. "Capture" refers to the result of calling the .capture() method. getNamedRegexCollection returns the internals of the Grok instance. It is not going to change; it is correct.

  @Test
  public void test006_captureOnlyNamed() throws GrokException {
    compiler.register("abcdef", "[a-zA-Z]+");
    compiler.register("ghijk", "\\d+");
    Grok grok = compiler.compile("%{abcdef:abcdef}%{ghijk}", true);
    Match match = grok.match("abcdef12345");
    Map<String, Object> map = match.capture();
    assertEquals(map.size(), 1);
    assertNull(map.get("ghijk"));
    assertEquals(map.get("abcdef"), "abcdef");
  }

That is from the tests.
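
The same distinction can be seen without any grok library at all. The sketch below uses only java.util.regex and assumes that an aliased reference such as %{abcdef:abcdef} ends up as a named capturing group, while an un-aliased %{ghijk} ends up as a non-capturing group when namedOnly is true. That mapping is an assumption for illustration, and NamedOnlySketch is a hypothetical class, not part of java-grok.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: models "capture named expressions only" with plain
// java.util.regex. It does not call java-grok.
class NamedOnlySketch {
    static Map<String, String> capture(String input) {
        // The named group yields a capture. The non-capturing group (?:...)
        // still has to match, but contributes nothing to the result.
        Pattern pattern = Pattern.compile("(?<abcdef>[a-zA-Z]+)(?:\\d+)");
        Matcher matcher = pattern.matcher(input);
        Map<String, String> result = new LinkedHashMap<>();
        if (matcher.find()) {
            result.put("abcdef", matcher.group("abcdef"));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {abcdef=abcdef}
        System.out.println(capture("abcdef12345"));
    }
}
```

The digits are consumed by the match but never appear in the returned map, which mirrors what the test above asserts about the result of capture().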

RickyHuo commented 6 years ago

@ottobackwards Thanks for your reply.

ottobackwards commented 6 years ago

@RickyHuo don't forget https://groups.google.com/forum/#!forum/java-grok, you can ask questions there too if you need help!