matomo-org / matomo-log-analytics

Import any kind of server logs in Matomo for powerful log analytics. Universal log file parsing and reporting.
https://matomo.org/log-analytics/
GNU General Public License v3.0

Import fails when Referrer field is not present using Regex format #335

Closed: kuzi-moto closed 2 years ago

kuzi-moto commented 2 years ago

Hello, I am running into an issue where the import fails when processing a log entry that is missing the referrer field. The server writes its logs as JSON and, when there is no referrer, simply omits the referrer field from the output. I have tried to work around this by writing my regex so that the field is optional.

However, it appears that the script still tries to process the field and then exits because it is missing. Here is the command I ran and its output. Note that I'm running an image built from the python image, with the Matomo data and the server logs mounted inside the container.

sudo docker exec matomo-import-logs python /tmp/matomo/misc/log-analytics/import_logs.py --url=https://matomo.example.com --token-auth=<token> --add-sites-new-hosts --enable-bots --enable-http-errors --log-format-regex="{\"ClientHost\":\"(?P<ip>\d+\.\d+\.\d+.\d+)\",\"\w+\":\"(?P<userid>.+?)\",\"\w+\":(?P<length>\d+),\"\w+\":(?P<status>\d+),\"\w+\":\d+,\"\w+\":\"(?P<host>.+?)\",\"\w+\":\"(?P<method>\w+)\",\"\w+\":\"(?P<path>.+?)\",\"StartLocal\":\"(?P<date>\d+-\d+-\d+T\d+:\d+:\d+)\.\d+(?P<timezone>-\d+:\d+)\",\"\w+\":\"\w+\",\"\w+\":\"[^\"]*\",(?:\"request_Referer\":\"(?P<referrer>[^\"]+)?\",)?\"\w+-\w+\":\"(?P<user_agent>[^\"]+)\",\"\w+\":\"[^\"]+\"}"
 --log-date-format="%Y-%m-%dT%H:%M:%S" -dd /var/log/traefik/access.log
[sudo] password for user:
2022-05-15 04:01:43,814: [DEBUG] Accepted hostnames: all
2022-05-15 04:01:43,816: [DEBUG] Matomo Tracker API URL is: https://matomo.example.com
2022-05-15 04:01:43,816: [DEBUG] Matomo Analytics API URL is: https://matomo.example.com
2022-05-15 04:01:43,816: [DEBUG] Authentication token token_auth is: <token>
2022-05-15 04:01:43,816: [DEBUG] Resolver: dynamic
2022-05-15 04:01:43,817: [DEBUG] Launched recorder
Traceback (most recent call last):
  File "/tmp/matomo/misc/log-analytics/import_logs.py", line 2688, in <module>
    main()
  File "/tmp/matomo/misc/log-analytics/import_logs.py", line 2654, in main
    parser.parse(filename)
  File "/tmp/matomo/misc/log-analytics/import_logs.py", line 2487, in parse
    if hit.referrer.startswith('"'):
AttributeError: 'NoneType' object has no attribute 'startswith'
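
For readers wondering why the traceback mentions `NoneType`: in Python's `re` module, a named group that does not take part in a match is reported as `None`, not as an empty string. The sketch below illustrates this with a much shorter pattern than the one in the command above. The `ip` and `agent` keys and the sample lines are made up for the illustration; only the optional `request_Referer` group mirrors the original regex.

```python
import re

# Simplified, hypothetical pattern: only the optional referrer group is
# copied from the regex in the command above.
pattern = re.compile(
    r'\{"ip":"(?P<ip>[^"]+)",'
    r'(?:"request_Referer":"(?P<referrer>[^"]+)?",)?'
    r'"agent":"(?P<user_agent>[^"]+)"\}'
)

# Made-up sample lines, one with and one without the referrer key.
line_with = '{"ip":"10.0.0.1","request_Referer":"https://example.com/","agent":"curl"}'
line_without = '{"ip":"10.0.0.1","agent":"curl"}'

with_referrer = pattern.match(line_with)
without_referrer = pattern.match(line_without)

print(with_referrer.group("referrer"))     # https://example.com/
print(without_referrer.group("referrer"))  # None

# Calling a str method on that None reproduces the error type seen above.
try:
    without_referrer.group("referrer").startswith('"')
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError
```

So the regex itself matches the line without a referrer; the failure comes from later code assuming the group always holds a string.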

It seems this could be resolved easily enough by adding a check that skips the referrer when it is not present. I might be able to submit a pull request in the next few days if I figure out enough Python to write it myself, but maybe someone smarter will be able to do it before then. Thanks for any assistance!
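
A minimal sketch of the kind of check proposed here, assuming the code around the failing line only strips surrounding quotes from the referrer. The helper name `normalize_referrer` is hypothetical and is not taken from `import_logs.py` or from any pull request; it only shows the shape of a `None` guard.

```python
from typing import Optional


def normalize_referrer(referrer: Optional[str]) -> str:
    """Hypothetical helper: return a referrer that is safe to call str methods on."""
    # A regex group that did not match yields None, so treat that as "no referrer".
    if referrer is None:
        return ""
    # Assumption: the original code strips a pair of wrapping double quotes.
    if referrer.startswith('"'):
        return referrer[1:-1]
    return referrer


print(normalize_referrer(None))                       # prints an empty line
print(normalize_referrer('"https://example.com/"'))  # https://example.com/
print(normalize_referrer("https://example.com/"))    # https://example.com/
```

Whether an empty string or a skipped field is the better default depends on how the rest of the importer uses the referrer, which is not shown in this thread.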

sgiehl commented 2 years ago

Guess this should be fixed with https://github.com/matomo-org/matomo-log-analytics/pull/336 then.