matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.91k stars 2.65k forks source link

[Bug] Tons of FORCE INDEX (index_idsite_idvisitor_time) SQL queries #21908

Open L3on1d opened 9 months ago

L3on1d commented 9 months ago

What happened?

Hi!

Occasionally Matomo DB fails due to tons of similar SQL queries, an example:

State: Statistics Info: SELECT visit_last_action_time, visit_first_action_time, idvisitor, idvisit, user_id, visit_exit_idaction_url, visit_exit_idaction_name, visitor_returning, visitor_seconds_since_first, visitor_seconds_since_order, visitor_count_visits, visit_goal_buyer, location_country, location_region, location_city, location_latitude, location_longitude, referer_name, referer_keyword, referer_type, idsite, profilable, visit_entry_idaction_url, visit_total_actions, visit_total_interactions, visit_total_searches, referer_url, config_browser_name, config_client_type, config_device_brand, config_device_model, config_device_type, visit_total_events, visit_total_time, location_ip, location_browser_lang, last_idlink_va, custom_dimension_1, custom_dimension_2, custom_dimension_3, custom_dimension_4, custom_dimension_5, custom_var_k1, custom_var_v1, custom_var_k2, custom_var_v2, custom_var_k3, custom_var_v3, custom_var_k4, custom_var_v4, custom_var _k5, custom_var_v5 FROM mtm_log_visit FORCE INDEX (index_idsite_idvisitor_time) WHERE idsite = '1' AND visit_last_action_time <= '2024-02-09 16:21:53' AND i dvisitor = 'ߐ??T???' ORDER BY visit_last_action_time DESC LIMIT 1 Progress: 0.000

I noticed that before these similar queries filling DB connection limit this type of query is always executed:

State: Updating Info: UPDATE mtm_log_visit SET profilable = '1', visit_last_action_time = '2024-02-09 16:21:52', visitor_seconds_since_order = NULL, visit_total_time = '2509', custom_dimension_1 = 'profile', custom_dimension_2 = 'en', custom_dimension_3 = 'legacy' WHERE idsite = '1' AND idvisit = '8064126654' Progress: 0.000

Matomo version: 5.0.1 DB: MariaDB 10.11.2 PHP: 8.2.5

Any suggestions?

Thank you!

What should happen?

Matomo DB fails due to tons of similar SQL queries.

How can this be reproduced?

Randomly.

Matomo version

5.0.1

PHP version

8.2.5

Server operating system

Ubuntu 20.04.6 LTS

What browsers are you seeing the problem on?

No response

Computer operating system

No response

Relevant log output

No response

Validations

michalkleiner commented 9 months ago

Hi @L3on1d. Sorry to hear you have random issues with you Matomo instance. When you say Matomo db fails, what exactly do you mean by that? Does the connection die, the query die and then it gets reconnected or does it completely kill the db server in an unrecoverable way or maybe even something else? It's hard to see what the issue might be, one thing that looks odd is the visitor ID at the end of the query (unless you intentionally scrambled it). Could you perhaps try to increase your number of db connections if it's exhausting them?

L3on1d commented 9 months ago

Hi @michalkleiner

I mean these similar sql queries quickly hits connection limit, no matter which limit is set (e.g 10000).

Regarding odd visitor id, yes looks strange to me too, is there any related configuration option on Matomo side which controls visitor id format?

Thank you.

michalkleiner commented 9 months ago

@matomo-org/core-reviewers any ideas here?

bx80 commented 9 months ago

idvisit is a binary column so it can look a bit weird when shown in the logs, that's probably nothing to worry about.

When this happens is there any corresponding spike in site traffic?

L3on1d commented 9 months ago

No traffic spikes during this issue. Always random time and days.

bx80 commented 8 months ago

It looks like that query is being generated in the Tracker/Model::updateVisit() method. Other than handling a new tracking request, it appears that this method is only called in two other places:

Since this isn't occurring during a traffic spike of new requests then I'm guessing the anonymize visit information scheduled task is causing the queries.

The scheduled task doesn't include any throttling option and will find all visit records that need to be anonymized, then it will issue sequential update queries for each visit. Normally these queries would be synchonous, so even if there was a large number of rows to process they would all be processed within a single connection one at a time. So possibly something is causing each query to start a new database connection or maybe the scheduled task is being trigger many times.

Some things to check:

L3on1d commented 8 months ago

Hi! @bx80

* Does disabling the  "Settings" - "Anonymize Tracking Data" options prevent this issue from occurring?

I just disabled this option, we had "Anonymize Visitors IP addresses".

  • Is there anything unusual about the PHP or Matomo database configuration options? Checked, we don't have any specific PHP or DB related settings.
  • How often is the Matomo cronjob being called? What --concurrent-archivers setting is being used? Once per hour with default settings.
  • Does the issue occur if you manually run the scheduled tasks with ./console core:run-scheduled-tasks --force ? Let me try. I will update you about results.

Thank you!

L3on1d commented 7 months ago

@bx80 an update, after disabling "Anonymize Tracking Data" this problem occurred again after some time.

If I run this command "./console core:run-scheduled-tasks --force" manually issue didn't occur.

Forgot to say, apart of js tracking we use log import feature, If I use more aggressive import settings e.g with more recorders and bigger "recorder-max-payload-size" this problem occurs more often.

Thank you!