thekrakken / java-grok

Simple API that allows you to easily parse logs and other files
http://grok.nflabs.com/

cannot parse this log #5

Closed loachli closed 10 years ago

loachli commented 11 years ago

1 Grok cannot parse the following log line:

10.192.1.47 - - [23/May/2013:10:47:40] "GET /flower1_store/category1.screen?category_id1=FLOWERS HTTP/1.1" 200 10577 "http://mystore.abc.com/flower1_store/main.screen&JSESSIONID=SD1SL10FF3ADFF3" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070223 CentOS/1.5.0.10-0.1.el4.centos Firefox/1.5.0.10" 3823 404

2 I added a new pattern to base:

    HTTPDATE1 %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME}

The error is as follows:

    java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence near index 371
    (((?:(\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))(.?|\b))|((?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})...)(?![0-9]))))(?::(\b(?:[1-9][0-9])\b))?) - - (((?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]))/(\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b)/((?>\d\d){1,2}):((?!<[0-9])((?:2[0123]|[01][0-9])):((?:[0-5][0-9]))(?::((?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?)))(?![0-9]))) 200 10577 "(([A-Za-z]+(+[A-Za-z+]+)?)://(?:(([a-zA-Z0-9-]+))(?::[^@])?@)?(?:(((?:(\b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))(.?|\b))|((?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})...)(?![0-9]))))(?::(\b(?:[1-9][0-9])\b))?))?(?:(._?\S+))?) "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070223 CentOS/1.5.0.10-0.1.el4.centos Firefox/1.5.0.10" 3823 404
    ^
    at java.util.regex.Pattern.error(Pattern.java:1713)
    at java.util.regex.Pattern.escape(Pattern.java:2177)
    at java.util.regex.Pattern.range(Pattern.java:2338)
    at java.util.regex.Pattern.clazz(Pattern.java:2268)
    at java.util.regex.Pattern.sequence(Pattern.java:1818)
    at java.util.regex.Pattern.expr(Pattern.java:1752)
    at java.util.regex.Pattern.compile(Pattern.java:1460)
    at java.util.regex.Pattern.<init>(Pattern.java:1133)
    at java.util.regex.Pattern.compile(Pattern.java:847)
    at com.google.code.regexp.Pattern.buildStandardPattern(Unknown Source)
    at com.google.code.regexp.Pattern.<init>(Unknown Source)
    at com.google.code.regexp.Pattern.compile(Unknown Source)
    at com.nflabs.Grok.Grok.compile(Grok.java:203)
    at com.nflabs.Grok.LzbGrokeTest.testGrok(LzbGrokeTest.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
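
For anyone who hits the same exception: the trace ends in `java.util.regex.Pattern.escape`, which rejects a backslash followed by a letter that is not a defined escape. The post does not show which fragment triggered it, so the snippet below is only a minimal sketch of how to locate such a fragment using plain `java.util.regex`. The class name `EscapeErrorDemo`, the helper `diagnose`, and the sample regex `[\y]` are made up for illustration and are not part of grok.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeErrorDemo {
    /**
     * Tries to compile the regex. Returns null when it compiles, otherwise
     * a short diagnostic showing the text around the reported error index.
     */
    static String diagnose(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getIndex() may be -1 when the position is unknown, so clamp it.
            int idx = Math.max(0, Math.min(e.getIndex(), regex.length()));
            int from = Math.max(0, idx - 10);
            int to = Math.min(regex.length(), idx + 10);
            return e.getDescription() + " near index " + e.getIndex()
                    + ": ..." + regex.substring(from, to) + "...";
        }
    }

    public static void main(String[] args) {
        // "\y" is not a defined escape, so compilation fails inside the character class.
        System.out.println(diagnose("[\\y]"));
        // A valid regex yields null.
        System.out.println(diagnose("[a-z]+"));
    }
}
```

Running `diagnose` on the long expanded regex from the error message would show the ten characters on either side of index 371, which is usually enough to see which sub-pattern or literal introduced the bad escape.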

anthonycorbacho commented 11 years ago

Hi, that's weird, I cannot reproduce your issue. Can you show me how you are using grok?

I tried this:

    Grok g = new Grok();
    g.addPatternFromFile("patterns/base");
    g.compile("%{APACHE}");
    Match gm = g.match("10.192.1.47 - - [23/May/2013:10:47:40] \"GET /flower1_store/category1.screen?category_id1=FLOWERS HTTP/1.1\" 200 10577 \"http://mystore.abc.com/flower1_store/main.screen&JSESSIONID=SD1SL10FF3ADFF3\" \"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070223 CentOS/1.5.0.10-0.1.el4.centos Firefox/1.5.0.10\" 3823 404");
    gm.captures();
    //See the result
    System.out.println(gm.toJson());

I also changed patterns/base, adding:

    HTTPDATE1 %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME}
    APACHE_2 %{IPORHOST} %{USER:hyphen} %{USER:user} \[%{HTTPDATE1:timestamp}\] %{QUOTEDSTRING:query} %{NUMBER:response} (?:%{NUMBER:bytes}|-)(?: %{QUOTEDSTRING:Referer} %{QUOTEDSTRING:agent})?

And I get:

    {
    "APACHE": "10.192.1.47 - - [23/May/2013:10:47:40] \"GET /flower1_store/category1.screen?category_id1\u003dFLOWERS HTTP/1.1\" 200 10577 \"http://mystore.abc.com/flower1_store/main.screen\u0026JSESSIONID\u003dSD1SL10FF3ADFF3\" \"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070223 CentOS/1.5.0.10-0.1.el4.centos Firefox/1.5.0.10\"",
    "BASE10NUM": 10577,
    "HOSTNAME": "10.192.1.47",
    "HOUR": 10,
    "IPORHOST": "10.192.1.47",
    "MINUTE": 47,
    "MONTH": "May",
    "MONTHDAY": 23,
    "Referer": "http://mystore.abc.com/flower1_store/main.screen\u0026JSESSIONID\u003dSD1SL10FF3ADFF3",
    "SECOND": 40,
    "TIME": "10:47:40",
    "USERNAME": "-",
    "YEAR": 2013,
    "agent": "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070223 CentOS/1.5.0.10-0.1.el4.centos Firefox/1.5.0.10",
    "bytes": 10577,
    "hyphen": "-",
    "query": "GET /flower1_store/category1.screen?category_id1\u003dFLOWERS HTTP/1.1",
    "response": 200,
    "timestamp": "23/May/2013:10:47:40",
    "user": "-"
    }
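
The MONTHDAY, MONTH, YEAR and TIME keys in this output come from the sub-patterns that HTTPDATE1 is built from. As far as I know, the HTTPDATE pattern in the commonly distributed grok pattern files also expects a time-zone offset after the time, which this log line lacks; that is presumably why a separate HTTPDATE1 is needed here. The sketch below shows the same composition idea with plain Java named groups. The four sub-expressions are simplified stand-ins, not the definitions from patterns/base, and the class name `HttpDateSketch` is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HttpDateSketch {
    // Simplified stand-ins for MONTHDAY, MONTH, YEAR and TIME.
    // These are NOT the real grok definitions, only an illustration.
    static final Pattern HTTPDATE1 = Pattern.compile(
            "(?<monthday>0?[1-9]|[12][0-9]|3[01])"
            + "/(?<month>[A-Z][a-z]{2})"
            + "/(?<year>\\d{4})"
            + ":(?<time>\\d{2}:\\d{2}:\\d{2})");

    /** Returns {monthday, month, year, time}, or null when no timestamp is found. */
    static String[] parse(String text) {
        Matcher m = HTTPDATE1.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            m.group("monthday"), m.group("month"), m.group("year"), m.group("time")
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("[23/May/2013:10:47:40]");
        System.out.println(String.join(" ", parts)); // prints "23 May 2013 10:47:40"
    }
}
```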

Also, what version of Java are you using?

loachli commented 11 years ago

1 I followed your steps and it works now, thanks. The grok version is "Grok-0.0.3.1-SNAPSHOT.jar".

2 I know the reason: I had not added this pattern:

    APACHE_2 %{IPORHOST} %{USER:hyphen} %{USER:user} \[%{HTTPDATE1:timestamp}\] %{QUOTEDSTRING:query} %{NUMBER:response} (?:%{NUMBER:bytes}|-)(?: %{QUOTEDSTRING:Referer} %{QUOTEDSTRING:agent})?

3 If I parse a new type of log, I have to add a new pattern, otherwise errors may occur. In my case, I need to parse various types of logs, and some of them are not known before my program runs. So I want to ask whether I can parse a log when only parts of it fit some atom patterns. For example, the log is "55.3.244.1 GET /index.html 15824 0.043". I call g.discover(log) and get this pattern for the log:

    %{URIHOST} GET %{URIPATHPARAM} 15824 0.043

If I then call g.match(log), I get this error:

    java.lang.IndexOutOfBoundsException: No group 11
    at java.util.regex.Matcher.group(Matcher.java:470)
    at com.google.code.regexp.Matcher.namedGroups(Unknown Source)
    at com.nflabs.Grok.Match.captures(Match.java:75)
    at com.nflabs.Grok.LzbGrokeTest.testGrok(LzbGrokeTest.java:40)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at junit.framework.TestCase.runTest(TestCase.java:154)
    at junit.framework.TestCase.runBare(TestCase.java:127)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

Can I get some fields of this log based on the discovered pattern "%{URIHOST} GET %{URIPATHPARAM} 15824 0.043"?
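
To make the question concrete, here is a rough sketch of the behaviour being asked for, written against plain `java.util.regex` rather than grok itself: the known placeholders become named groups and everything else in the discovered pattern is treated as literal text. The class `PartialTemplateSketch`, its methods, and the `\S+` stand-ins for URIHOST and URIPATHPARAM are all hypothetical; this is not how grok implements `discover` or `match`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartialTemplateSketch {
    // Placeholders such as %{URIHOST}. Java group names allow only letters and
    // digits, so names containing '_' (for example APACHE_2) are not handled here.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("%\\{([A-Za-z][A-Za-z0-9]*)\\}");

    /** Turns a template into a regex: placeholders become named groups, the rest is quoted. */
    static Pattern toRegex(String template, Map<String, String> atoms) {
        StringBuilder sb = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(template);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                // Quote literal text so characters like '.' or '(' are not read as regex syntax.
                sb.append(Pattern.quote(template.substring(last, m.start())));
            }
            String name = m.group(1);
            String atom = atoms.get(name);
            if (atom == null) {
                throw new IllegalArgumentException("unknown pattern: " + name);
            }
            sb.append("(?<").append(name).append('>').append(atom).append(')');
            last = m.end();
        }
        if (last < template.length()) {
            sb.append(Pattern.quote(template.substring(last)));
        }
        // Limitation: a placeholder used twice would produce a duplicate group name.
        return Pattern.compile(sb.toString());
    }

    /** Returns the captured fields in template order, or an empty map when the log does not match. */
    static Map<String, String> extract(String template, Map<String, String> atoms, String log) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = toRegex(template, atoms).matcher(log);
        if (!m.matches()) {
            return out;
        }
        Matcher names = PLACEHOLDER.matcher(template);
        while (names.find()) {
            out.put(names.group(1), m.group(names.group(1)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> atoms = new LinkedHashMap<>();
        atoms.put("URIHOST", "\\S+");      // stand-in, not grok's URIHOST
        atoms.put("URIPATHPARAM", "\\S+"); // stand-in, not grok's URIPATHPARAM
        System.out.println(extract("%{URIHOST} GET %{URIPATHPARAM} 15824 0.043",
                atoms, "55.3.244.1 GET /index.html 15824 0.043"));
        // prints {URIHOST=55.3.244.1, URIPATHPARAM=/index.html}
    }
}
```

Quoting the literal parts with `Pattern.quote` is the key design choice: it keeps arbitrary log text from being interpreted as regex syntax when it is mixed with pattern placeholders.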

anthonycorbacho commented 11 years ago

Okay.

For your point 3, let me fix it.