We recently upgraded to Matomo 4.0 and then 4.1.0, and we also upgraded to Python 3.9.1 to be able to run import_logs.py.
It seems some of our server logs (recent ones, by the looks of it) contain a Unicode character that prevents our nightly import job from working correctly - see the error below.
The problem seems to surface in parse.py and, as far as we can tell, the problematic character is … (dot-dot-dot). If we remove or replace this character in the log files, the import works.
Is there a setting/option we can invoke in Matomo to "bypass" these errors, or should we resort to stripping the character from the logs before starting the nightly job?
Just to note, we didn't have this issue with Matomo 3 + Python 2.
Error:
Exception in thread Thread-2:
Traceback (most recent call last):
  File "c:\Python391\lib\threading.py", line 954, in _bootstrap_inner
    self.run()
  File "c:\Python391\lib\threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "e:\www\matomo.domain\misc\log-analytics\import_logs.py", line 1849, in _run_bulk
    self._record_hits(hits)
  File "e:\www\matomo.domain\misc\log-analytics\import_logs.py", line 1995, in _record_hits
    'requests': [self._get_hit_args(hit) for hit in hits]
  File "e:\www\matomo.domain\misc\log-analytics\import_logs.py", line 1995, in <listcomp>
    'requests': [self._get_hit_args(hit) for hit in hits]
  File "e:\www\matomo.domain\misc\log-analytics\import_logs.py", line 1953, in _get_hit_args
    urllib.parse.quote(args['url'], ''),
  File "c:\Python391\lib\urllib\parse.py", line 847, in quote
    string = string.encode(encoding, errors)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc85' in position 58: surrogates not allowed
39199 lines parsed, 10200 lines recorded, 139 records/sec (avg), 200 records/sec (current)
39199 lines parsed, 10200 lines recorded, 137 records/sec (avg), 0 records/sec (current)
39199 lines parsed, 10200 lines recorded, 135 records/sec (avg), 0 records/sec (current)
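For what it's worth, the last frame of the traceback can be reproduced outside Matomo. The sketch below rests on two assumptions we have not confirmed against import_logs.py: that the offending byte in the log is 0x85 (which is "…" in Windows-1252), and that the line was decoded as UTF-8 with the `surrogateescape` error handler, which would explain the `'\udc85'` in the message. The sample URL and the `try_quote` helper are made up for illustration.

```python
import urllib.parse

# Assumption: the raw log bytes contain 0x85, which is "…" in Windows-1252.
raw = b"/page?title=more\x85"

# Decoding as UTF-8 with surrogateescape maps the undecodable byte to U+DC85.
url = raw.decode("utf-8", errors="surrogateescape")


def try_quote(value):
    """Return the quoted value, or a short failure message if quoting fails."""
    try:
        return urllib.parse.quote(value, "")
    except UnicodeEncodeError as exc:
        return f"failed: {exc.reason}"


# Default (strict) encoding fails the same way as in the traceback.
strict_result = try_quote(url)

# Passing errors="surrogateescape" lets quote() emit the original byte as %85.
lenient_result = urllib.parse.quote(url, "", errors="surrogateescape")
```

Note that the lenient variant only avoids the exception: the resulting `%85` is still not valid UTF-8, so whether the receiving side handles it sensibly is a separate question.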
Related perhaps:
https://github.com/matomo-org/matomo-log-analytics/issues/278
https://github.com/matomo-org/matomo/pull/15618
parse.py - https://github.com/python/cpython/blob/3.9/Lib/urllib/parse.py#L847
Edit: we use w3c format server logs
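If stripping or fixing the character before the nightly job turns out to be the way to go, a pre-processing step along these lines might do it. This is only a rough sketch: it assumes the stray bytes are Windows-1252 (cp1252), which we have not verified, and the names `to_utf8` and `sanitize_log` are our own, not part of Matomo.

```python
def to_utf8(line: bytes, fallback: str = "cp1252") -> bytes:
    """Return the line as valid UTF-8 bytes.

    Lines that are already valid UTF-8 pass through untouched. Anything else
    is decoded with the assumed fallback encoding (bytes undefined in that
    encoding become U+FFFD) and then re-encoded as UTF-8.
    """
    try:
        line.decode("utf-8")
        return line
    except UnicodeDecodeError:
        return line.decode(fallback, errors="replace").encode("utf-8")


def sanitize_log(src: str, dst: str) -> None:
    """Copy src to dst line by line, converting non-UTF-8 lines."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            fout.write(to_utf8(line))
```

The idea would be to run `sanitize_log` on each log file and point import_logs.py at the cleaned copy, so the original logs stay untouched.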