rsyslog / liblognorm

a fast samples-based log normalization library
http://www.liblognorm.com
GNU Lesser General Public License v2.1

Question Apache log parser #306

Open greg-FR13 opened 6 years ago

greg-FR13 commented 6 years ago

Hi All,

I am a little bit lost. I am using the following rule:

rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number% "%referrer:char-to:"%" "%agent:char-to:"%"%blob:rest%

Apache's Log 1 :
...
XX.XX.XX.XX - "" [29/Jul/2018:06:15:47 +0000] "GET / HTTP/1.1" 200 4050
XX.XX.XX.XX - "" [29/Jul/2018:07:09:05 +0000] "GET /robots.txt HTTP/1.1" 404 985
XX.XX.XX.XX - "" [29/Jul/2018:08:20:39 +0000] "GET / HTTP/1.1" 200 4050

head -1 /var/log/httpd/my.access_log | /usr/bin/lognormalizer -r apache_access_log.rule -e json

{ "originalmsg": "XX.XXX.XXX.XXX - \"\" [29\/Jul\/2018:03:53:53 +0000] \"GET \/robots.txt HTTP\/1.1\" 404 985", "unparsed-data": "" }

The rule works for my other Apache logs; the problem only appears when the log contains "".

How can I deal with %auth:word% and "" ?

Thank you for your help and support,

Regards,

manios commented 5 years ago

Hello @greg-FR13 ,

Your rule does not match the logs you posted, because those log messages contain no referrer or user agent part.

For your logs:

192.168.1.1 - "Tester" [29/Jul/2018:05:15:47 +0000] "GET / HTTP/1.1" 200 4050
192.168.1.1 - "" [29/Jul/2018:06:15:47 +0000] "GET / HTTP/1.1" 200 4050
192.168.1.1 - "" [29/Jul/2018:07:09:05 +0000] "GET /robots.txt HTTP/1.1" 404 985
192.168.1.1 - "" [29/Jul/2018:08:20:39 +0000] "GET / HTTP/1.1" 200 4050

this rule matches :

rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to{"extradata":"]"}%] "%verb:word% %request:word% HTTP/%httpversion:float{"format":"number"}%" %response:number{"format":"number"}% %blob:rest%

and when you run:

lognormalizer  -H -p -r apache.rule  < apache.log

it produces the following results:

{ "blob": "4050", "response": 200, "httpversion": 1.1, "request": "\/", "verb": "GET", "timestamp": "29\/Jul\/2018:05:15:47 +0000", "auth": "\"Tester\"", "ident": "-", "clientip": "192.168.1.1" }
{ "blob": "4050", "response": 200, "httpversion": 1.1, "request": "\/", "verb": "GET", "timestamp": "29\/Jul\/2018:06:15:47 +0000", "auth": "\"\"", "ident": "-", "clientip": "192.168.1.1" }
{ "blob": "985", "response": 404, "httpversion": 1.1, "request": "\/robots.txt", "verb": "GET", "timestamp": "29\/Jul\/2018:07:09:05 +0000", "auth": "\"\"", "ident": "-", "clientip": "192.168.1.1" }
{ "blob": "405", "response": 200, "httpversion": 1.1, "request": "\/", "verb": "GET", "timestamp": "29\/Jul\/2018:08:20:39 +0000", "auth": "\"\"", "ident": "-", "clientip": "192.168.1.1" }

To also handle log lines that include the referrer and user agent parts, you have 2 options:

  1. Provide another rule that takes priority over the one above from the %response% field onward.
  2. Enhance the existing rule with an alternative parser.
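
A rough sketch of option 1, untested and based on several assumptions: the temporary directory, the file names, the `bytes`, `referrer` and `agent` fields of the longer rule, and the second sample log line are made up for illustration. It also assumes that the `{...}` parameter syntax needs `version=2` on the first line of the rulebase, and that an escaped quote is accepted as `extradata`. The script only calls `lognormalizer` if it is installed.

```shell
#!/bin/sh
# Sketch only: paths, the longer rule, and the second log line are assumptions.
workdir=$(mktemp -d)

# Rulebase with two rules that share a prefix up to the response field.
# Rule 1 expects bytes, referrer and user agent; rule 2 is the shorter rule
# from this comment and puts whatever follows the response code into "blob".
cat > "$workdir/apache.rule" <<'EOF'
version=2
rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to{"extradata":"]"}%] "%verb:word% %request:word% HTTP/%httpversion:float{"format":"number"}%" %response:number{"format":"number"}% %bytes:number{"format":"number"}% "%referrer:char-to{"extradata":"\""}%" "%agent:char-to{"extradata":"\""}%"
rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to{"extradata":"]"}%] "%verb:word% %request:word% HTTP/%httpversion:float{"format":"number"}%" %response:number{"format":"number"}% %blob:rest%
EOF

# Two sample lines: the first is from this thread, the second is a made-up
# line that carries a referrer and a user agent.
cat > "$workdir/apache.log" <<'EOF'
192.168.1.1 - "" [29/Jul/2018:06:15:47 +0000] "GET / HTTP/1.1" 200 4050
192.168.1.1 - "" [29/Jul/2018:06:15:47 +0000] "GET / HTTP/1.1" 200 4050 "-" "curl/7.61.0"
EOF

# Run the normalizer only when it is available on this machine.
if command -v lognormalizer >/dev/null 2>&1; then
    lognormalizer -r "$workdir/apache.rule" -e json < "$workdir/apache.log"
else
    echo "lognormalizer not found; rulebase written to $workdir/apache.rule"
fi
```

If the assumptions hold, the first line should fall through to the shorter rule and the second should be picked up by the longer one. Any line that still comes back with "originalmsg" and "unparsed-data" did not match either rule.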

Keep in mind that liblognorm rules are not regular expressions. They are compiled into a directed acyclic graph (DAG), so the parser may handle them differently than you expect. For more information, please refer to the official documentation.

Best regards,
Christos

greg-FR13 commented 5 years ago

Hi @manios, thank you for your complete answer; I will have a look.

Best,