matomo-org / docker

Official Docker project for Matomo Analytics
https://matomo.org

Log import script results in all client IPs being 0.0.0.0 #319

Open hanscees opened 1 year ago

hanscees commented 1 year ago

Hi, I am using the docker version of matomo obviously. I am loading data into it using the log script like so:

for i in *; do
  # Only upload files whose permission string contains "w" (writable)
  if [[ $(stat -c "%A" "$i") =~ "w" ]]; then
    echo "$i"
    /usr/bin/python3 /var/lib/docker/volumes/matmoto_matomo/_data/misc/log-analytics/import_logs.py --url=http://192.168.0.61:8080 --debug  --login hanscees@hanscees.con --password "secletvelly" --idsite=1 --recorders=4   "$i"

    echo "just uploaded this file to webstats: "
    echo "$i"
  fi
done

This all works fine and the dashboard shows all kinds of data. However, the dashboard shows no data about where visitors come from.

The default docker matomo image as far as I can see does have geoip2 on board. I have even updated it with my maxmind license. I have done things like:

docker exec -it matmoto-app-1 php ./console usercountry:attribute 2023-05-13,2023-05-14

and 
docker exec -it matmoto-app-1 php ./console core:invalidate-report-data --dates=2023-04-01,2023-05-14 --sites=1

This gives no errors, but there is still no GeoIP data in the dashboard.

The system settings say:

Geolocation geoip2php (continent_code, continent_name, country_code, country_name, region_code, region_name, city_name, postal_code, lat, long)

Any help would be greatly appreciated.

hanscees commented 1 year ago

So after a few hours of digging, it turns out the ../import_logs.py script does not understand my log files, so all client IPs are recorded as 0.0.0.0.

This page shows how to write your own regex for the script: https://github.com/matomo-org/matomo-log-analytics/#readme

It would be an improvement if --debug actually showed you some client IPs.

So now I have to figure out how to empty the whole database, because all records are wrong.

My loglines are like this: 51.159.154.15 www.bomengids.nl - - [22/May/2023:00:00:18 +0200] "GET /winter/Hollandse_iep__Ulmus_hollandica__Dutch_Elm@1@img_91 98knop_th.jpg HTTP/1.1" 200 9067 "https://www.bomengids.nl/knop.html" "newspaper/0.2.8"

The regex is --log-format-regex='((?P<ip>\S+) (?P<host>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] "GET (?P<path>.*?) HTTP/\S+" (?P<status>\S+) (?P<length>\S+) "(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
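A quick way to check the regex outside of import_logs.py is to run it against one log line with Python's `re` module and print the named groups. This is only a sketch: the names `LOG_FORMAT`, `SAMPLE_LINE` and `fields` are mine, and matching here does not prove the importer will send the IP to Matomo correctly, only that the pattern captures what I expect.

```python
import re

# Regex copied from the --log-format-regex value in this post.
LOG_FORMAT = (
    r'((?P<ip>\S+) (?P<host>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'"GET (?P<path>.*?) HTTP/\S+" (?P<status>\S+) (?P<length>\S+) '
    r'"(?P<referrer>.*?)" "(?P<user_agent>.*?)").*'
)

# Log line copied from this post.
SAMPLE_LINE = (
    r'51.159.154.15 www.bomengids.nl - - [22/May/2023:00:00:18 +0200] '
    r'"GET /winter/Hollandse_iep__Ulmus_hollandica__Dutch_Elm@1@img_91 98knop_th.jpg HTTP/1.1" '
    r'200 9067 "https://www.bomengids.nl/knop.html" "newspaper/0.2.8"'
)

match = re.match(LOG_FORMAT, SAMPLE_LINE)
# groupdict() maps each named group to the text it captured.
fields = match.groupdict() if match else {}

for name in ("ip", "host", "date", "timezone", "path", "status"):
    print(f"{name:>9}: {fields.get(name)}")
```

If `ip` prints the client address and `host` prints the site name, the pattern itself is splitting the line the way it should, and the 0.0.0.0 problem has to come from somewhere else.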

I figured this out because the visits log shows wrong URLs, where it thinks client IPs are part of my website: 16[78.46.70.104/2022/species/Zomereik__Quercus_robur__English_oak__Sommer-Eiche--Stiel-Eiche__Chene_commun--Chene_dAngleterre.html](https://78.46.70.104/2022/species/Zomereik__Quercus_robur__English_oak__Sommer-Eiche--Stiel-Eiche__Chene_commun--Chene_dAngleterre.html)

hanscees commented 1 year ago

If the import script in debug mode could be fed some log lines and would log what it "thinks" is the client IP, host and URL, that would make it much easier to understand. It might even suggest reading the documentation page linked above.

I spent a lot of time debugging a very long, confusing script. Ah well, I did learn that Python has named regex groups, which is very handy.
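Something like the following is what I have in mind. It is a hypothetical helper, not part of import_logs.py: `preview_parse`, `DEMO_PATTERN` and `limit` are names I made up, and the demo pattern is a shortened one that only captures ip, host and path.

```python
import re

def preview_parse(lines, pattern, limit=5):
    """Yield (line_number, fields) for the first `limit` lines.

    `fields` is a dict of named groups, or None if the line did not match.
    """
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if number > limit:
            break
        match = regex.match(line.rstrip("\n"))
        yield number, (match.groupdict() if match else None)

# Shortened demo pattern (an assumption for illustration, not the importer's own).
DEMO_PATTERN = r'(?P<ip>\S+) (?P<host>\S+) \S+ \S+ \[.*?\] "\S+ (?P<path>.*?) HTTP/\S+"'

demo_lines = [
    # Log line copied from this thread.
    '51.159.154.15 www.bomengids.nl - - [22/May/2023:00:00:18 +0200] '
    '"GET /winter/Hollandse_iep__Ulmus_hollandica__Dutch_Elm@1@img_91 98knop_th.jpg HTTP/1.1" '
    '200 9067 "https://www.bomengids.nl/knop.html" "newspaper/0.2.8"',
    # A line that cannot match, to show how a failure is reported.
    'this is not a log line',
]

results = list(preview_parse(demo_lines, DEMO_PATTERN))
for number, fields in results:
    if fields is None:
        print(f"line {number}: NO MATCH, check the log format regex")
    else:
        print(f"line {number}: ip={fields['ip']} host={fields['host']} path={fields['path']}")
```

To try it on a real file, pass an open file object instead of `demo_lines`; only the first `limit` lines are read.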

A script much more readable is here: https://github.com/gilbN/geoip2influx

cheers

hanscees commented 1 year ago

And even with the new regex it does not work. All visits are from 0.0.0.0 according to the visits log.

what a drag.

hanscees commented 1 year ago

See this analysis: https://github.com/matomo-org/matomo-log-analytics/issues/354

Using the log import script with the Docker containers results in all client IPs being 0.0.0.0. If my analysis is correct, this is of course what causes the GeoIP map to be empty.

Please help me find the bug, or show me where I am doing things wrong.