matomo-org / matomo-log-analytics

Import any kind of server logs in Matomo for powerful log analytics. Universal log file parsing and reporting.
https://matomo.org/log-analytics/
GNU General Public License v3.0

import_logs.py: Silent failure if log file cannot be accessed #360

Open jlebonzec opened 10 months ago

jlebonzec commented 10 months ago

Hello,

If the Apache log file cannot be accessed, the import_logs.py script fails silently: it behaves as if the file had been read but was empty.

Our Matomo files are owned by www-data. We saved our Apache log file to /home/ubuntu/apache_access_replay_one.log, a location that www-data cannot access.

We tried running the script this way:

sudo -u www-data python3.8 /var/www/html/misc/log-analytics/import_logs.py --url="https://<our_site>" --dry-run --show-progress --debug --replay-tracking /home/ubuntu/apache_access_replay_one.log

It gave us this output:

2023-09-07 08:04:32,834: [DEBUG] Accepted hostnames: all
2023-09-07 08:04:32,834: [DEBUG] Matomo Tracker API URL is: https://<our_site>
2023-09-07 08:04:32,834: [DEBUG] Matomo Analytics API URL is: https://<our_site>
2023-09-07 08:04:32,834: [DEBUG] No token-auth specified
2023-09-07 08:04:32,834: [DEBUG] No credentials specified, reading them from "/var/www/html/config/config.ini.php"
2023-09-07 08:04:32,889: [DEBUG] Authentication token token_auth is: REDACTED
2023-09-07 08:04:32,889: [DEBUG] Resolver: dynamic
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2023-09-07 08:04:32,923: [DEBUG] Launched recorder

Logs import summary
-------------------

    0 requests imported successfully
    0 requests were downloads
    0 requests ignored:
        0 HTTP errors
        0 HTTP redirects
        0 invalid log lines
        0 filtered log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        0 requests done by bots, search engines...
        0 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    0 requests imported to 0 sites
        0 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:

Performance summary
-------------------

    Total time: 0 seconds
    Requests imported per second: 0.0 requests per second

Processing your log data
------------------------

    In order for your logs to be processed by Matomo, you may need to run the following command:
     ./console core:archive --force-all-websites --url='https://<our_site>'

As you can see, nothing in the output explains why no requests were imported. I believe a simple check with a clear error message would help many users.

When moving the file to a place that www-data can access, it worked fine.
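
To illustrate the kind of detection suggested above, here is a minimal sketch of a check that could run before any parsing starts. It is not taken from import_logs.py; the helper names `check_log_readable` and `find_unreadable` are hypothetical, and how the script would actually report the error is an assumption.

```python
def check_log_readable(path):
    """Return None if `path` can be opened for reading, else an error message.

    Hypothetical helper for illustration only; not part of import_logs.py.
    """
    try:
        # Opening the file is the most reliable test: it fails for a missing
        # file, for missing read permission, and for a parent directory that
        # the current user is not allowed to traverse.
        with open(path, "rb"):
            pass
    except OSError as exc:
        return "Cannot read log file '%s': %s" % (path, exc.strerror or exc)
    return None


def find_unreadable(paths):
    """Return the error messages for every path that cannot be read."""
    messages = (check_log_readable(path) for path in paths)
    return [message for message in messages if message is not None]
```

A caller could print each returned message to stderr and exit with a non-zero status when the list is not empty, so that a run like the one above stops with an explicit error instead of reporting 0 lines parsed.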

In addition, I would like to point out a possible error in the documentation. For the script to work, we had to pass the full URL including the https:// scheme, not just the domain, whereas the documentation reads: --url=piwik.example.net.
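
As a sketch of how a value like the documented one could be caught early, the following check tests whether a `--url` value carries an explicit http or https scheme. The helper name `url_has_scheme` is hypothetical, and this is not how import_logs.py validates its arguments; it only reflects the behavior reported here, where the bare domain did not work.

```python
from urllib.parse import urlsplit


def url_has_scheme(url):
    """Return True if `url` has an http(s) scheme and a host part.

    Hypothetical helper for illustration only; not part of import_logs.py.
    """
    parts = urlsplit(url)
    # Without a scheme, urlsplit() puts a bare domain into `path` and leaves
    # both `scheme` and `netloc` empty.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

With this check, `url_has_scheme("piwik.example.net")` is false, while `url_has_scheme("https://piwik.example.net")` is true.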

Kind regards,