matomo-org / matomo-log-analytics

Import any kind of server logs in Matomo for powerful log analytics. Universal log file parsing and reporting.
https://matomo.org/log-analytics/
GNU General Public License v3.0

import_logs.py does not work with secure auth token in Matomo 5 #369

Open lkroll opened 7 months ago

lkroll commented 7 months ago

When creating a new auth token with the default and recommended "secure only" option, the import_logs.py script does not work correctly.

It appears to work at first, but after a couple of seconds it reports "500 Internal Server Error", because Matomo blocks the IP after too many failed login attempts.

Whitelisting the IP in Matomo "fixes" the issue and the script finishes with no errors, but the logs are not imported.

Testing the same import with a new auth token, leaving the "secure only" option unchecked, results in a working import.

Here is the output with the "secure only" token (not working):

root@server:/var/www/***.de/private# /usr/bin/python3 /var/www/***.de/private/import_logs.py --url=https://stats.***.de --idsite=27 --recorders=4 --token-auth=*** `date --date=yesterday +/var/www/***.de/log/\%Y\%m\%d-access.log`
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /var/www/***.de/log/20240206-access.log...
2024-02-07 11:43:10,511: [INFO] The Matomo tracker identified 69 invalid requests.
2024-02-07 11:43:10,524: [INFO] The Matomo tracker identified 39 invalid requests.
2024-02-07 11:43:10,531: [INFO] The Matomo tracker identified 206 invalid requests.
2024-02-07 11:43:10,587: [INFO] The Matomo tracker identified 486 invalid requests.
2024-02-07 11:43:10,680: [INFO] The Matomo tracker identified 109 invalid requests.
2024-02-07 11:43:10,742: [INFO] The Matomo tracker identified 49 invalid requests.
2024-02-07 11:43:10,790: [INFO] The Matomo tracker identified 184 invalid requests.
2024-02-07 11:43:10,909: [INFO] The Matomo tracker identified 458 invalid requests.
2024-02-07 11:43:10,983: [INFO] The Matomo tracker identified 130 invalid requests.
2024-02-07 11:43:10,986: [INFO] The Matomo tracker identified 80 invalid requests.
2024-02-07 11:43:11,030: [INFO] The Matomo tracker identified 161 invalid requests.
17102 lines parsed, 1971 lines recorded, 1968 records/sec (avg), 1971 records/sec (current)
2024-02-07 11:43:11,180: [INFO] The Matomo tracker identified 74 invalid requests.
2024-02-07 11:43:11,267: [INFO] The Matomo tracker identified 99 invalid requests.
2024-02-07 11:43:11,287: [INFO] The Matomo tracker identified 136 invalid requests.
2024-02-07 11:43:11,431: [INFO] The Matomo tracker identified 120 invalid requests.
2024-02-07 11:43:11,432: [INFO] The Matomo tracker identified 429 invalid requests.
2024-02-07 11:43:11,474: [INFO] The Matomo tracker identified 73 invalid requests.
2024-02-07 11:43:11,486: [INFO] The Matomo tracker identified 95 invalid requests.
2024-02-07 11:43:11,598: [INFO] The Matomo tracker identified 9 invalid requests.
2024-02-07 11:43:11,598: [INFO] The Matomo tracker identified 14 invalid requests.
2024-02-07 11:43:11,616: [INFO] The Matomo tracker identified 30 invalid requests.
17102 lines parsed, 3050 lines recorded, 1523 records/sec (avg), 1079 records/sec (current)
2024-02-07 11:43:12,180: [INFO] The Matomo tracker identified 491 invalid requests.
2024-02-07 11:43:12,888: [INFO] The Matomo tracker identified 512 invalid requests.
17102 lines parsed, 4053 lines recorded, 1349 records/sec (avg), 1003 records/sec (current)
2024-02-07 11:43:13,140: [INFO] The Matomo tracker identified 83 invalid requests.

Logs import summary
-------------------

    4136 requests imported successfully
    1279 requests were downloads
    12966 requests ignored:
        1119 HTTP errors
        4533 HTTP redirects
        0 invalid log lines
        0 filtered log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        1140 requests done by bots, search engines...
        6174 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    4136 requests imported to 1 sites
        1 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:

Performance summary
-------------------

    Total time: 3 seconds
    Requests imported per second: 1217.12 requests per second

And without this option (working):

root@server:/var/www/***.de/private# /usr/bin/python3 /var/www/***.de/private/import_logs.py --url=https://stats.***.de --idsite=27 --recorders=4 --token-auth=*** `date --date=yesterday +/var/www/***.de/log/\%Y\%m\%d-access.log`
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
Parsing log /var/www/***.de/log/20240206-access.log...
13486 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
13486 lines parsed, 560 lines recorded, 279 records/sec (avg), 560 records/sec (current)
16634 lines parsed, 861 lines recorded, 286 records/sec (avg), 301 records/sec (current)
17102 lines parsed, 1699 lines recorded, 424 records/sec (avg), 838 records/sec (current)
17102 lines parsed, 2251 lines recorded, 449 records/sec (avg), 552 records/sec (current)
17102 lines parsed, 2580 lines recorded, 429 records/sec (avg), 329 records/sec (current)
17102 lines parsed, 3099 lines recorded, 442 records/sec (avg), 519 records/sec (current)
17102 lines parsed, 3488 lines recorded, 435 records/sec (avg), 389 records/sec (current)
17102 lines parsed, 4136 lines recorded, 459 records/sec (avg), 648 records/sec (current)

Logs import summary
-------------------

    4136 requests imported successfully
    1279 requests were downloads
    12966 requests ignored:
        1119 HTTP errors
        4533 HTTP redirects
        0 invalid log lines
        0 filtered log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        1140 requests done by bots, search engines...
        6174 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------

    4136 requests imported to 1 sites
        1 sites already existed
        0 sites were created:

    0 distinct hostnames did not match any existing site:

Performance summary
-------------------

    Total time: 9 seconds
    Requests imported per second: 449.82 requests per second

The script should return an error and should not display that requests were successfully imported.
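As a rough illustration of the behaviour requested above, the sketch below turns tracker-reported invalid requests into a non-zero exit code. It assumes the tracker replies with a JSON object containing an `invalid` count, which matches the "identified N invalid requests" log lines but is not verified against import_logs.py itself. The function names `count_invalid` and `exit_code_for` are hypothetical.

```python
import json
import sys


def count_invalid(response_body):
    """Return the number of requests the tracker reported as invalid.

    Assumes a JSON object with an "invalid" field (an assumption about
    the response shape). Bodies that are not a JSON object count as 0.
    """
    try:
        payload = json.loads(response_body)
    except ValueError:
        return 0
    if not isinstance(payload, dict):
        return 0
    return int(payload.get("invalid", 0))


def exit_code_for(responses):
    """Return 1 if any response reported invalid requests, else 0."""
    total = sum(count_invalid(body) for body in responses)
    return 1 if total > 0 else 0


if __name__ == "__main__":
    # Illustrative bodies only; real responses would come from the tracker.
    sample = ['{"status": "success", "tracked": 0, "invalid": 69}']
    sys.exit(exit_code_for(sample))
```

With a check like this, a run in which every batch is rejected would end with an error status instead of a summary that reads as a success.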

sgiehl commented 7 months ago

Hi @lkroll. Thanks for letting us know. I guess the problem is that the log importer sends the auth token in a GET request. If you are familiar with Python and have some time to work on the required changes, feel free to contribute a pull request. I can't promise that we will be able to work on this soon. In the meantime, using a token that is not configured as secure-only should still work as expected.
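To make the GET versus POST distinction concrete, here is a minimal sketch of a request that carries `token_auth` in the POST body rather than in the URL. It uses only the Python standard library and is not taken from import_logs.py. The helper name `build_tracking_request`, the example host, and the parameter values are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tracking_request(matomo_url, token_auth, params):
    """Build a POST request with token_auth in the body, not the URL."""
    body = dict(params)
    body["token_auth"] = token_auth
    data = urlencode(body).encode("utf-8")
    # The URL carries no query string, so the token does not end up in
    # the request line or in web server access logs.
    return Request(
        matomo_url.rstrip("/") + "/matomo.php",
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and token; nothing is sent over the network here.
    req = build_tracking_request(
        "https://stats.example.org",
        "PLACEHOLDER_TOKEN",
        {"idsite": 27, "rec": 1, "url": "https://example.org/"},
    )
    print(req.get_method(), req.full_url)
```

Whether the importer can switch to this form without other changes is an open question; the sketch only shows where the token would move.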

schube commented 7 months ago

@lkroll Thank you, that was exactly the problem I was facing, and I wasted hours on it. Creating a new token that is not secure-only solved the problem.

9joshua commented 6 months ago

Maybe we just need to add a note in the documentation to use only tokens created without the "Secure token" option?