matomo-org / matomo-log-analytics

Import any kind of server logs in Matomo for powerful log analytics. Universal log file parsing and reporting.
https://matomo.org/log-analytics/
GNU General Public License v3.0

Use custom log file format: example (to be added as unit test) #146

Open tfrdidi opened 8 years ago

tfrdidi commented 8 years ago

I have log files with the following format:

date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken

They are standard IIS logs, but I have to import them without certain fields, so I used:

--log-format-regex="(?P<date>.*?) \S+ \S+ \S+ \S+ \S+ (?P<path>/\S*) (?P<query_string>\S*) \S+ \S+ (?P<ip>[\w*.:-]*) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \S+ \S+ (?P<length>\S+) \S+ (?P<generation_time_milli>[.\d]+)"

But it did not match a single line.

Here is one example log line:

2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx - 80 testuser 9.9.9.9 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.1;+WOW64;+Trident/6.0) - http://testsite.de/ testsite.de 200 0 0 30647 5673 2851

Any idea what I am doing wrong?

sgiehl commented 8 years ago

Try using more specific patterns. The .*? used for your date pattern can match anything, as it is not constrained. Maybe try something more specific, like (?P<date>[0-9-]+ [0-9:]+)
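A small sketch of the difference, using only Python's `re` module and the start of the sample line from this thread. It isolates the date group rather than testing the full import pattern, and the variable names are illustrative only.

```python
import re

# Start of the sample log line posted in this thread.
line = "2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx"

# A lazy ".*?" followed by a space stops at the first space it can.
loose = re.match(r"(?P<date>.*?) ", line)
# The more specific pattern captures the date and the time together.
strict = re.match(r"(?P<date>[0-9-]+ [0-9:]+)", line)

print(loose.group("date"))   # 2015-01-01
print(strict.group("date"))  # 2015-01-01 21:51:58
```

With the loose pattern the time ends up outside the date group, so the number of `\S+` placeholders that follow has to account for it.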

tfrdidi commented 8 years ago

Thanks for your fast reply. I tried it with

--log-format-regex="(?P<date>\d+[-\d+]+ [\d+:]+) \S+ \S+ \S+ \S+ \S+ (?P<path>/\S*) (?P<query_string>\S*) \S+ \S+ (?P<ip>[\w*.:-]*) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \S+ \S+ (?P<length>\S+) \S+ \S+"

and several variations, but the result did not change. Do you have any idea how to get more information about what is going on under the hood? --debug is not very helpful.

sgiehl commented 8 years ago

You can try using the regex to search the log file on the command line. If there are results, it should work. If not, you need to adjust the regex until it matches.

tfrdidi commented 8 years ago

How could I search on the command line with this regex, which is specific to this Python script?
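One possible way, sketched here with nothing but Python's `re` module (the pattern uses Python's named-group syntax), is to grow the regex one space-separated piece at a time and report the first piece at which it stops matching the sample line. The helper name `first_failing_piece` is hypothetical, and the sketch assumes the fields are separated by single spaces. It only checks the regex itself, not what the import script does with the captured values afterwards.

```python
import re

# Second regex attempt and sample line, copied from this thread.
REGEX = (
    r'(?P<date>\d+[-\d+]+ [\d+:]+) \S+ \S+ \S+ \S+ \S+ '
    r'(?P<path>/\S*) (?P<query_string>\S*) \S+ \S+ '
    r'(?P<ip>[\w*.:-]*) \S+ (?P<user_agent>".*?"|\S*) \S+ '
    r'(?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \S+ \S+ '
    r'(?P<length>\S+) \S+ \S+'
)
LINE = (
    "2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx - 80 "
    "testuser 9.9.9.9 HTTP/1.1 "
    "Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.1;+WOW64;+Trident/6.0) "
    "- http://testsite.de/ testsite.de 200 0 0 30647 5673 2851"
)


def first_failing_piece(regex, line):
    """Return the first space-separated piece of `regex` at which
    re.match stops matching `line`, or None if the whole regex matches."""
    re.compile(regex)  # fail early if the full regex is invalid
    pieces = regex.split(" ")
    for count in range(1, len(pieces) + 1):
        prefix = " ".join(pieces[:count])
        try:
            compiled = re.compile(prefix)
        except re.error:
            # The cut went through a group, e.g. "(?P<date>\d+[-\d+]+";
            # skip it and try the next, longer prefix.
            continue
        if compiled.match(line) is None:
            return pieces[count - 1]
    return None


print(first_failing_piece(REGEX, LINE))  # (?P<path>/\S*)
```

For this attempt the date group already consumes the time, so the five `\S+` placeholders run one field too far and the path group lands on the `-` of cs-uri-query instead of the URI stem.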

tfrdidi commented 7 years ago

Thanks for the help! I have solved it using the following regex:

--log-format-regex="(?P<date>\S+ \S+) \S+ \S+ \S+ \S+ (?P<path>\S+) (?P<query_string>\S*) \S+ \S+ (?P<ip>\S+) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) (?P<status>\S+) \S+ \S+ (?P<length>\S+) \S+ (?P<generation_time_milli>[.\d]+)"

This ticket could be closed now ;-)

mattab commented 7 years ago

Thanks for posting. We'll leave this ticket open, as it would be nice to:

  1. Add a unit test with your example log + command
  2. Maybe add a link on the doc to the test (or also repeat this example in the doc).

It will certainly help many people who are trying to write custom log imports.
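A minimal sketch of what such a test could check, assuming only Python's `re` and `unittest` modules. It runs the regex from this thread against the sample line directly and does not call the import script; the class and constant names are hypothetical and not taken from the repository's test suite.

```python
import re
import unittest

# Working regex and sample line, copied from this thread.
CUSTOM_REGEX = re.compile(
    r'(?P<date>\S+ \S+) \S+ \S+ \S+ \S+ (?P<path>\S+) '
    r'(?P<query_string>\S*) \S+ \S+ (?P<ip>\S+) \S+ '
    r'(?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) '
    r'(?P<status>\S+) \S+ \S+ (?P<length>\S+) \S+ '
    r'(?P<generation_time_milli>[.\d]+)'
)
SAMPLE_LINE = (
    "2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx - 80 "
    "testuser 9.9.9.9 HTTP/1.1 "
    "Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.1;+WOW64;+Trident/6.0) "
    "- http://testsite.de/ testsite.de 200 0 0 30647 5673 2851"
)


class CustomIisFormatTest(unittest.TestCase):
    def test_sample_line_is_parsed(self):
        match = CUSTOM_REGEX.match(SAMPLE_LINE)
        self.assertIsNotNone(match)
        # Spot-check the fields the import relies on.
        self.assertEqual(match.group("date"), "2015-01-01 21:51:58")
        self.assertEqual(match.group("path"), "/Content/index.aspx")
        self.assertEqual(match.group("ip"), "9.9.9.9")
        self.assertEqual(match.group("referrer"), "http://testsite.de/")
        self.assertEqual(match.group("host"), "testsite.de")
        self.assertEqual(match.group("status"), "200")
        self.assertEqual(match.group("length"), "30647")
        self.assertEqual(match.group("generation_time_milli"), "2851")
```

Saved to a file, this can be run with `python -m unittest <file>`.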

gpanagiotidis commented 5 years ago

Any updates on this? I would like to write my own custom log import but there is no proper documentation for this.