sandrocantagallo closed this issue 1 year ago
Hi @sandrocantagallo, the statistic displayed by the log importer is unrelated to Device Detector. The log importer already tries to filter away some common bots; see https://github.com/matomo-org/matomo-log-analytics/blob/5.x-dev/import_logs.py#L84-L114. Anything that is not detected there might later be detected by Device Detector in Matomo itself. Depending on the settings, Matomo might then drop those tracking requests.
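To illustrate the kind of check described above, here is a minimal sketch of substring-based bot filtering. The names `EXCLUDED_USER_AGENT_SUBSTRINGS` and `is_known_bot` and the three-entry list are hypothetical and only for illustration; the importer's actual list and logic are in the linked lines of import_logs.py.

```python
# Hypothetical, shortened list of lowercase substrings (an assumption for this
# sketch); the importer's real list is longer and lives in import_logs.py.
EXCLUDED_USER_AGENT_SUBSTRINGS = ("googlebot", "bingbot", "slurp")


def is_known_bot(user_agent: str) -> bool:
    """Return True if the user agent contains any excluded substring."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in EXCLUDED_USER_AGENT_SUBSTRINGS)


# User agent taken from the access log line quoted in this thread.
googlebot_ua = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 "
    "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)

print(is_known_bot(googlebot_ua))  # True
print(is_known_bot(""))            # False
```

A check like this can only work if the user agent was parsed from the log line in the first place: an empty user agent never matches any entry.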
> The log importer already tries to filter away some common bots
The problem is that in my case this feature didn't work.
The cause was the regular expression used by the import script: with my log format, the user agent was not recognized.
To fix the problem I had to modify the import script:
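# Common log format followed by an unquoted user agent that runs to the end of the line
_TEST_EXTENDED_LOG_FORMAT = (_COMMON_LOG_FORMAT +
    r'\s+(?P<user_agent>.+)'
)
FORMATS = {
    'common': RegexFormat('common', _COMMON_LOG_FORMAT),
    'test': RegexFormat('test', _TEST_EXTENDED_LOG_FORMAT),
    # ... the other existing formats stay unchanged ...
}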
Then I force the use of this format when I launch the import command:
python3 import_logs.py --url=http://localhost/y-analytics access_log.txt --idsite=8 --log-format-name="test"
At this point the script knows how to read my log and recognizes the user agent:
79 requests imported successfully
10 requests were downloads
780 requests ignored:
3 HTTP errors
2 HTTP redirects
21 invalid log lines
0 filtered log lines
0 requests did not match any known site
0 requests did not match any --hostname
**_39 requests done by bots, search engines..._**
715 requests to static resources (css, js, images, ico, ttf...)
0 requests to file downloads did not match any --download-extensions
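The effect of the extra `user_agent` group can be checked outside the importer. This is a minimal sketch, assuming the script's common format looks roughly like `COMMON` below (an approximation of `_COMMON_LOG_FORMAT`; compare it with your copy of import_logs.py). The sample line is modelled on a Googlebot request with an unquoted user agent and no referrer field, and `192.0.2.1` is a placeholder IP.

```python
import re

# Assumption: approximation of the importer's common log format.
COMMON = (
    r'(?P<ip>[\w*.:-]+)\s+\S+\s+(?P<userid>\S+)\s+'
    r'\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+'
    r'(?P<status>\d+)\s+(?P<length>\S+)'
)
# The extension from this thread: everything after the length is the user agent.
EXTENDED = COMMON + r'\s+(?P<user_agent>.+)'

line = (
    '192.0.2.1 - - [22/Oct/2023:03:28:20 +0200] '
    '"GET /templates/jsn_solid_pro/js/jsn_link_profession_selected.js'
    '?ver=1697760000 HTTP/1.1" 200 3440 '
    'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 '
    'Mobile Safari/537.36 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)'
)

common_match = re.match(COMMON, line)
extended_match = re.match(EXTENDED, line)

# The plain common format matches the start of the line but has no
# user_agent group, so there is nothing to compare against a bot list.
print(common_match is not None)                   # True
print("user_agent" in common_match.groupdict())   # False

# The extended format captures the trailing user agent.
print(extended_match.group("user_agent"))
```

If the plain common format is the one that ends up being used for a log like this, the missing user agent would be consistent with a count of 0 bot requests.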
I also saw that there is a parameter to force the import with a specific regular expression:
--log-format-regex
but I couldn't get it to work. The documentation on this topic is too sparse to help when something goes wrong.
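One thing that may be worth ruling out with `--log-format-regex` is shell quoting, since the pattern contains quotes, backslashes and parentheses. This is only a guess at a possible cause, not a confirmed diagnosis. The sketch below builds the command with `shlex.join` so the regex reaches the script unchanged; the pattern is the same approximation of the common format plus a trailing user agent, and the URL, site ID and file name are the ones from the command above.

```python
import shlex

# Assumption: approximation of the common log format plus an unquoted
# trailing user agent; adjust it to your actual log lines.
LOG_FORMAT_REGEX = (
    r'(?P<ip>[\w*.:-]+)\s+\S+\s+(?P<userid>\S+)\s+'
    r'\[(?P<date>.*?)\s+(?P<timezone>.*?)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>.*?)\s+\S+"\s+'
    r'(?P<status>\d+)\s+(?P<length>\S+)'
    r'\s+(?P<user_agent>.+)'
)

argv = [
    "python3", "import_logs.py",
    "--url=http://localhost/y-analytics",
    "--idsite=8",
    "--log-format-regex=" + LOG_FORMAT_REGEX,
    "access_log.txt",
]

# shlex.join quotes each argument for a POSIX shell (Python 3.8+).
command = shlex.join(argv)
print(command)
```

The printed line can be pasted into a POSIX shell. Alternatively, passing `argv` directly to `subprocess.run` avoids the shell altogether.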
Crontab:
0 22 * * * python3 /var/www/html/matomo/misc/log-analytics/import_logs.py --url=http://19...1/matomo/ --idsite=2 /var/log/httpd/443-access_log > /home/* /***logs/matomo_import.log
Access Logs:
192.***.*** - - [22/Oct/2023:03:28:20 +0200] "GET /templates/jsn_solid_pro/js/jsn_link_profession_selected.js?ver=1697760000 HTTP/1.1" 200 3440 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.5993.70 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
But in the import summary, the problem is: 0 requests done by bots, search engines...
Do you have any suggestions?