Open tfrdidi opened 8 years ago
Try using more specific patterns. The .* used for your date pattern matches anything because it is not limited. Maybe try something more specific, like (?P<date>[0-9-]+ [0-9:]+)
Thanks for your fast reply. I tried it with
--log-format-regex="(?P<date>\d+[-\d+]+ [\d+:]+) \S+ \S+ \S+ \S+ \S+ (?P<path>/\S*) (?P<query_string>\S*) \S+ \S+ (?P<ip>[\w*.:-]*) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \S+ \S+ (?P<length>\S+) \S+ \S+"
and several variations, but the result did not change. Do you have any idea how to get more information about what is going on under the hood? --debug is not very helpful.
You can try using the regex to search the log file on the command line. If there are results, it should work. If not, you need to adjust the regex until it matches.
How could I search on the command line with this regex, which is specific to this Python script?
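One way to run the same regex outside the importer is a few lines of Python, since the pattern uses Python's named-group syntax. The sketch below assumes the import script applies the pattern anchored at the start of each line (as re.match does); the thread does not confirm this. The helper names try_regex and longest_matching_prefix are made up for illustration. It tests the pattern from the earlier comment against the sample line from this thread and, when nothing matches, drops trailing space-separated pieces of the pattern until a prefix matches, which shows roughly where matching stops.

```python
import re

# Sample log line copied from this thread.
LINE = (
    "2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx - 80 "
    "testuser 9.9.9.9 HTTP/1.1 "
    "Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.1;+WOW64;+Trident/6.0) "
    "- http://testsite.de/ testsite.de 200 0 0 30647 5673 2851"
)

# Pattern from the earlier comment (the one that gave no result).
PATTERN = (
    r'(?P<date>\d+[-\d+]+ [\d+:]+) \S+ \S+ \S+ \S+ \S+ '
    r'(?P<path>/\S*) (?P<query_string>\S*) \S+ \S+ '
    r'(?P<ip>[\w*.:-]*) \S+ (?P<user_agent>".*?"|\S*) \S+ '
    r'(?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \S+ \S+ '
    r'(?P<length>\S+) \S+ \S+'
)


def try_regex(pattern, line):
    """Return the named groups if pattern matches at the start of line, else None.

    Assumption: anchoring at the line start mirrors what the importer does.
    """
    match = re.match(pattern, line)
    return match.groupdict() if match else None


def longest_matching_prefix(pattern, line):
    """Drop trailing space-separated pieces of pattern until a prefix matches.

    Returns (prefix_pattern, matched_text), or None if no prefix matches.
    Heuristic only: it assumes fields are separated by single literal spaces.
    """
    pieces = pattern.split(" ")
    for count in range(len(pieces), 0, -1):
        candidate = " ".join(pieces[:count])
        try:
            match = re.match(candidate, line)
        except re.error:
            continue  # the cut fell inside a group, so the prefix is not valid
        if match:
            return candidate, match.group(0)
    return None


result = try_regex(PATTERN, LINE)
print("full match:", result)
if result is None:
    found = longest_matching_prefix(PATTERN, LINE)
    if found:
        prefix, text = found
        print("longest matching prefix:", prefix)
        print("it consumed:", text)
```

For this pattern the full match fails, and the longest matching prefix ends with the fifth \S+ having consumed /Content/index.aspx, so (?P<path>/\S*) finds nothing left that starts with a slash. Splitting on spaces is only a heuristic; it works here because the pattern separates fields with single literal spaces.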
Thanks for the help! I have solved it using the following regex:
--log-format-regex="(?P<date>\S+ \S+) \S+ \S+ \S+ \S+ (?P<path>\S+) (?P<query_string>\S*) \S+ \S+ (?P<ip>\S+) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) (?P<status>\S+) \S+ \S+ (?P<length>\S+) \S+ (?P<generation_time_milli>[.\d]+)"
This ticket could be closed now ;-)
Thanks for posting. We'll leave this ticket open, as it would be nice to:
It will surely help many people trying to write custom log imports.
Any updates on this? I would like to write my own custom log import but there is no proper documentation for this.
I have log files with the following format:
date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
They are standard IIS logs, but I have to import them without certain fields. Therefore I used
--log-format-regex="(?P<date>.*?) \S+ \S+ \S+ \S+ \S+ (?P<path>/\S*) (?P<query_string>\S*) \S+ \S+ (?P<ip>[\w*.:-]*) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \S+ \S+ (?P<length>\S+) \S+ (?P<generation_time_milli>[.\d]+)"
But not a single line matched.
Here is one example log line:
2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx - 80 testuser 9.9.9.9 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.1;+WOW64;+Trident/6.0) - http://testsite.de/ testsite.de 200 0 0 30647 5673 2851
Any idea what I am doing wrong?
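To see what the pattern above does with the example line, both can be run through Python's re module. This is only a sketch: it assumes the importer applies the pattern the way re.match does, which the thread does not confirm, and the variable names are made up for illustration. In plain Python the pattern does match, but the lazy .*? stops at the first space, so the date group holds only the date and the time is swallowed by the first of the five \S+ tokens. The field list shows only four fields (s-sitename, s-computername, s-ip, cs-method) between time and cs-uri-stem. Whether the missing time is what makes the importer reject the line is an assumption.

```python
import re

# Example log line copied from this post.
LINE = (
    "2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx - 80 "
    "testuser 9.9.9.9 HTTP/1.1 "
    "Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.1;+WOW64;+Trident/6.0) "
    "- http://testsite.de/ testsite.de 200 0 0 30647 5673 2851"
)

# Pattern copied from this post.
PATTERN = (
    r'(?P<date>.*?) \S+ \S+ \S+ \S+ \S+ '
    r'(?P<path>/\S*) (?P<query_string>\S*) \S+ \S+ '
    r'(?P<ip>[\w*.:-]*) \S+ (?P<user_agent>".*?"|\S*) \S+ '
    r'(?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \S+ \S+ '
    r'(?P<length>\S+) \S+ (?P<generation_time_milli>[.\d]+)'
)

# Assumption: the importer matches from the start of the line, like re.match.
match = re.match(PATTERN, LINE)
captured = match.groupdict() if match else None

if captured is None:
    print("no match")
else:
    # Print each named group so a short or shifted field is easy to spot.
    for name, value in captured.items():
        print(f"{name:>22}: {value}")
```

The date group comes out as 2015-01-01 without the time. Capturing both tokens, for example with (?P<date>\S+ \S+) followed by four \S+, is what the regex posted as the solution in this thread does.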