matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.9k stars 2.65k forks source link

don't try to insert unrealistically high values for time_transfer and other performance metrics #17035

Closed Findus23 closed 3 years ago

Findus23 commented 3 years ago

see https://forum.matomo.org/t/512m-of-memory-limit-is-it-too-little/40047/5

No page takes 50 years to load, so I guess Matomo should not try to insert invalid values.

[28-Dec-2020 07:09:09 UTC] Error in Matomo (tracker): Error query: SQLSTATE[22003]: Numeric value out of range: 1264 Out of range value for column 'time_transfer' at row 1 In query: INSERT INTO matomo2_log_link_visit_action (idvisit, idsite, idvisitor, idaction_url, idaction_url_ref, idaction_name_ref, server_time, idpageview, pageview_position, time_spent_ref_action, time_dom_processing, time_network, time_server, time_transfer, idaction_name) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) Parameters: array ( 0 => '620729', 1 => 23, 2 => '&� ��V� ', 3 => '122626', 4 => '122626', 5 => '122625', 6 => '2020-12-28 07:09:09', 7 => 'qDEDCT', 8 => 3, 9 => 185, 10 => 874, 11 => 2180, 12 => 283, 13 => 1608938301263, 14 => '122625', )

Then again it would be interesting to know where they come from.

poetaster commented 3 years ago

I'm getting a similar error on field visitor_seconds_since_order. Error query: SQLSTATE[22003]: Numeric value out of range: 1264 Out of range value for column 'visitor_seconds_since_order' at row 1 In query: INSERT INTO piwik_log_visit (idvisitor, config_id, location_ip, idsite, profilable, visit_first_action_time, visit_goal_buyer, visit_goal_converted, visit_last_action_time, visitor_returning, visitor_seconds_since_first, visitor_seconds_since_order, visitor_count_visits, visit_entry_idaction_name, visit_entry_idaction_url, visit_exit_idaction_name, visit_exit_idaction_url, visit_total_actions, visit_total_interactions, visit_total_searches, referer_keyword, referer_name, referer_type, referer_url, location_browser_lang, config_browser_engine, config_browser_name, config_browser_version, config_client_type, config_device_brand, config_device_model, config_device_type, config_os, config_os_version, visit_total_events, visitor_localtime, visitor_seconds_since_last, config_resolution, config_cookie, config_flash, config_ja...'

That'd be more than 65 years? Since I'm importing log files I don't quiet understand where the value would come from? No campaigns as such. Hmmm.

Perhaps this should be a report at : https://github.com/matomo-org/matomo-log-analytics/ ?

tsteur commented 3 years ago

@poetaster I reckon it be best to report this indeed in the log analytics issue tracker as there might be a problem. Be great to mention there which parameters you use in log analytics import script call.

poetaster commented 3 years ago

Ahoi. I'll do so. Although it appears as haproxy log, it's a clf log file (postproced)

python3 /var/www/.../htdocs/misc/log-analytics/import_logs.py --auth-user=piwik --auth-password=.... --url=https://....org --idsite=1 --recorders=1 --log-hostname=https://.....org --log-format-name=ncsa_extended --log-hostname=https://.....org /path/piwik-logs/${DATE}.haproxy.log.done

tsteur commented 3 years ago

To try and reproduce this need to send an HTTP Matomo Tracking API request see https://developer.matomo.org/api-reference/tracking-api with unrealistic page performance info. Seems it is trying to use 1608938301263 for time transfer.

Tracking code that generates API request can also be seen in https://developer.matomo.org/guides/tracking-javascript-guide#finding-the-piwik-tracking-code but seems it's not possible to easily reproduce this particular issue with unrealistic numbers so we would just ignore any unrealistic value (any value > 16000000 which is 16 million)

basos9 commented 3 years ago

Hello, I encountered this after upgrading to matomo 4.2 and during the process of log import with replaying. But it may be that this is irrelevant of the import. By searching our logs for yesterday there are 0,0025% of visits that produce an error 400 and contain a pf_tfr parameter without a dot,

example request

&pf_net=0&pf_srv=122&pf_tfr=1621207625627&pf_dm1=2465 HTTP/1.1" 400 465 "" "Mozilla/5.0 (iPad; CPU OS 9_3_5 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13G36 Safari/601.1"

NOTES

This produces the error log in matomo.log

ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] Error query: SQLSTATE[22003]: Numeric value out of range: 1264 Out of range value for column 'time_transfer' at row 1 In query: INSERT INTO piwik_log_link_visit_action (idvisit, idsite, idvisitor, idaction_url, idaction_url_ref, idaction_name_ref, server_time, idpageview, pageview_position, time_spent_ref_action, time_dom_processing, time_network, time_server, time_transfer, idaction_name) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) Parameters: array (
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 0 => '188179972',
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 1 => 2,
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 2 => 'xxx',
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 3 => '9721430',
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 4 => 0,
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 5 => 0,
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 6 => '2021-05-18 04:04:43',
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 7 => 'xxx',
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 8 => 1,
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 9 => 0,
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 10 => 1925,
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 11 => 313,
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 12 => 605,
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 13 => 1621206923594,
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] 14 => '9721429',
ERROR Piwik\Tracker\Handler[2021-05-18 04:04:43 UTC] [0562e] )
basos9 commented 3 years ago

Finally i managed to import (replay) the tracker access logs by filtering out the outliers

grep -E 'pf_tfr=[0-9]{5,}(&| )'

Another convenience for the import script would be a switch to skip pass an erroneous batch rather than abort. Since one ourlier can make a big import to fail. Also during my tests the import tool does not always report the failed position in logs.

tsteur commented 3 years ago

@sgiehl is this one fixed already maybe?

sgiehl commented 3 years ago

Those values should be discarded with #17574. Not sure if there was another issue to identify why those high numbers occur in the first place.

Findus23 commented 3 years ago

I think we still don't know where those invalid values come from, Matomo just doesn't fail anymore with https://github.com/matomo-org/matomo/pull/17574 when it encounters them.

tsteur commented 3 years ago

@Findus23 that would be ideally a different issue ideally maybe? It seems to only happen with an old iOS version Mozilla/5.0 (iPad; CPU OS 9_3_5. I noticed this in our data too. I reckon not too many would use this old version?

tsteur commented 3 years ago

It be good for someone to have a quick look if we can reproduce this issue using iOS 9.X (seems browserstack has only a iPhone with that iOS version not a tablet). As there are very few devices running this OS we could otherwise ignore it. However, if we can reproduce it there we may want to add a check in tracker (matomo.js) to exclude way too high values.

sgiehl commented 3 years ago

@tsteur Tried to reproduce that with various ios emulations and simulations on browserstack, but wasn't able to. Some older versions of Safari do not support the performance timing api, but they simply don't track it. For all others the transmitted values looked correct.

tsteur commented 3 years ago

I'll close this for now. We can create a new issue if this still is an issue (maybe it was indirectly fixed meanwhile)

MatomoForumNotifications commented 10 months ago

This issue has been mentioned on Matomo forums. There might be relevant details there:

https://forum.matomo.org/t/error-on-archiving-track-sqlstate-22003/54806/1