matomo-org / matomo

https://matomo.org/
GNU General Public License v3.0

Problem importing w3c logs #11824

Closed magnus-84 closed 7 years ago

magnus-84 commented 7 years ago

Hello

I am having problems importing W3C logs from Incapsula services into Piwik. Below is the command line I use to import the logfile. IP and domain information has been changed for privacy.

/usr/bin/python /var/www/html/piwik/misc/log-analytics/import_logs.py --url=http://10.1.2.3 --idsite=8 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --log-format-name=w3c_extended --w3c-fields='#Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName' /root/web.log --debug --debug

Debug output below

2017-06-28 11:21:17,172: [DEBUG] Accepted hostnames: all
2017-06-28 11:21:17,172: [DEBUG] Piwik Tracker API URL is: http://10.1.2.3
2017-06-28 11:21:17,172: [DEBUG] Piwik Analytics API URL is: http://10.1.2.3
2017-06-28 11:21:17,172: [DEBUG] No token-auth specified
2017-06-28 11:21:17,172: [DEBUG] No credentials specified, reading them from "/var/www/html/piwik/config/config.ini.php"
2017-06-28 11:21:17,240: [DEBUG] Authentication token token_auth is: 90871c8584ddf2265f54553a305b6ae1
2017-06-28 11:21:17,240: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2017-06-28 11:21:17,343: [DEBUG] Launched recorder
2017-06-28 11:21:17,343: [DEBUG] Launched recorder
2017-06-28 11:21:17,344: [DEBUG] Launched recorder
2017-06-28 11:21:17,344: [DEBUG] Launched recorder
Parsing log /root/web.log...
2017-06-28 11:21:17,345: [DEBUG] Based on 'Fields:' line, computed regex to be (?P\d+[-\d+]+\s+[\d+:]+)[.\d]?\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+"?(?P[\w.:-])"?\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?P".?"|\S)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?P\S)\s+(?P\d+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".?"|\S+)\s+(?:".*?"|\S+)
2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Software: Incapsula LOGS API

2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Version: 1.1

2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Date: 28/Jun/2017 07:28:59

2017-06-28 11:21:17,350: [DEBUG] Invalid line detected (line did not match): #Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName

2017-06-28 11:21:17,351: [DEBUG] Invalid line detected (line did not match): "2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" "Chrome" "Browser" "false" "true" "123.123.123.123" "" "62a660e57ba257275cf7ccf699919eae18e07e84cb11c1075e99b1be98456059d3064ec14d3932ba6e89f5393a158b8b8c2572ad7ad7dadb0fe02a34ae4c3d504c035017bf9a6a7802bb898226378938" "NA" "774502" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "452000660051880893" "44850949" "SE" "LS" "Stockholm" "www.example.com" "32.0000" "32.0000" "Customer" "www.example.com/artiklar/x/y/z/" "" "HTTP" "REQ_PASSED" "118866685985031205" "" "124.124.124.124" "80" "GET" "" "200" "123.123.123.123" "10117" "1498634795555" "" "" "" "" ""

Logs import summary

0 requests imported successfully
0 requests were downloads
5 requests ignored:
    0 HTTP errors
    0 HTTP redirects
    5 invalid log lines
    0 requests did not match any known site
    0 requests did not match any --hostname
    0 requests done by bots, search engines...
    0 requests to static resources (css, js, images, ico, ttf...)
    0 requests to file downloads did not match any --download-extensions

Website import summary

0 requests imported to 1 sites
    1 sites already existed
    0 sites were created:

0 distinct hostnames did not match any existing site:

Performance summary

Total time: 0 seconds
Requests imported per second: 0.0 requests per second

Original logfile example below.

#Software: Incapsula LOGS API
#Version: 1.1
#Date: 28/Jun/2017 07:28:59
#Fields: date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip s-caip cs-clappsig s-capsupport s-suid cs(User-Agent) cs-sessionid s-siteid cs-countrycode s-tag cs-cicode s-computername cs-lat cs-long s-accountname cs-uri cs-postbody cs-version sc-action s-externalid cs(Referrer) s-ip s-port cs-method cs-uri-query sc-status s-xff cs-bytes cs-start cs-rule cs-severity cs-attacktype cs-attackid s-ruleName

"2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" "Chrome" "Browser" "false" "true" "123.123.123.123" "" "62a660e57ba257275cf7ccf699919eae18e07e84cb11c1075e99b1be98456059d3064ec14d3932ba6e89f5393a158b8b8c2572ad7ad7dadb0fe02a34ae4c3d504c035017bf9a6a7802bb898226378938" "NA" "774502" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" "452000660051880893" "44850949" "SE" "LS" "Stockholm" "www.example.com" "32.0000" "32.0000" "Customer" "www.example.com/artiklar/x/y/z/" "" "HTTP" "REQ_PASSED" "118866685985031205" "" "124.124.124.124" "80" "GET" "" "200" "123.123.123.123" "10117" "1498634795555" "" "" "" "" ""
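For reference, every value in the record above is wrapped in double quotes and separated by a single space, so the values can be read positionally against the `#Fields:` header. The following is a minimal Python sketch of that structure, not code from `import_logs.py`; the helper name `parse_record` is hypothetical, and the sample is shortened to the first eight fields to keep it readable.

```python
import csv

# First eight field names from the '#Fields:' header (shortened for the example).
FIELDS = "date time cs-vid cs-clapp cs-browsertype cs-js-support cs-co-support c-ip".split()

# Matching prefix of the sample record; every value is double-quoted.
LINE = ('"2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6" '
        '"Chrome" "Browser" "false" "true" "123.123.123.123"')


def parse_record(line, fields):
    """Hypothetical helper: split a space-separated, double-quoted record
    and pair each value with its field name."""
    values = next(csv.reader([line], delimiter=" ", quotechar='"'))
    if len(values) != len(fields):
        raise ValueError("expected %d values, got %d" % (len(fields), len(values)))
    return dict(zip(fields, values))


record = parse_record(LINE, FIELDS)
print(record["date"], record["time"], record["c-ip"])
```

The quotes are stripped by the CSV reader, so `record["date"]` holds the bare date string. The full record has 41 values, one per name in the header.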

I guess the problem is something in the regex? Any help would be appreciated. I have no knowledge of regex myself.

Regards, Magnus

sgiehl commented 7 years ago

@magnus-84: I've recreated the issue in the log importer repo: https://github.com/piwik/piwik-log-analytics/issues/179
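On the regex question raised above, one plausible cause (a hypothesis, not a confirmed diagnosis) is that the date and time at the start of each record are double-quoted, while the computed regex begins with a pattern that expects bare digits. The Python sketch below tests only the date/time prefix of the regex shown in the debug output. The group name `date` is an assumption, since the group names were lost when the output was pasted, and the sketch also assumes the importer matches from the start of the line.

```python
import re

# Date/time prefix of the regex from the debug output. The group name "date"
# is an assumption: the names were stripped from the pasted output.
DATE_TIME = re.compile(r'(?P<date>\d+[-\d+]+\s+[\d+:]+)')

# Start of the sample record as logged (quoted) and a hand-edited unquoted variant.
quoted = '"2017-06-28" "07:26:35" "a1f36498-c34a-45b9-b3a5-ee0bd00f91b6"'
unquoted = '2017-06-28 07:26:35 a1f36498-c34a-45b9-b3a5-ee0bd00f91b6'

# match() anchors at the start of the string, so the leading quote makes it fail.
quoted_match = DATE_TIME.match(quoted)
unquoted_match = DATE_TIME.match(unquoted)

print("quoted line matches:", quoted_match is not None)
print("unquoted date group:", unquoted_match.group("date"))
```

If this hypothesis holds, the remaining fields may need similar attention, so the sketch should be read as a starting point for the linked issue rather than a fix.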