matomo-org / matomo

Empowering People Ethically with the leading open source alternative to Google Analytics that gives you full control over your data. Matomo lets you easily collect data from websites & apps and visualise this data and extract insights. Privacy is built-in. Liberating Web Analytics. Star us on Github? +1. And we love Pull Requests!
https://matomo.org/
GNU General Public License v3.0
19.92k stars 2.66k forks source link

Some SQL queries generated by Live plugin API are heavy SQL queries #9499

Open mattab opened 8 years ago

mattab commented 8 years ago

We have experienced a case where Live SQL queries can be slow. For example the queries run for two hours or more. Here is an example of such query running slow, and the explain extended:


SELECT sub.* FROM ( SELECT log_inner.* FROM ( SELECT log_visit.* FROM piwik_log_visit AS log_visit LEFT JOIN piwik_log_link_visit_action AS log_link_visit_action ON log_link_visit_action.idvisit = log_visit.idvisit WHERE ( log_visit.idsite in ('1') AND log_visit.visit_last_action_time >= '2015-10-20 07:00:00' AND  log_visit.visit_last_action_time <= '2015-12-20 08:00:00' ) AND ( ( log_link_visit_action.idaction_url IS NOT NULL AND (log_link_visit_action.idaction_url <> '' OR log_link_visit_action.idaction_url = 0) ) ) ORDER BY idsite, visit_last_action_time DESC LIMIT 400) AS log_inner ORDER BY idsite, visit_last_action_time DESC LIMIT 400) AS sub GROUP BY sub.idvisit ORDER BY sub.visit_last_action_time DESC;

query explain extended

+------+-------------+-----------------------+------+-----------------------------------------------------------------------------------+------------------------------+---------+-----------------------------+----------+----------+----------------------------------------------------+

| id   | select_type | table                 | type | possible_keys                                                                     | key                          | key_len | ref                         | rows     | filtered | Extra                                              |

+------+-------------+-----------------------+------+-----------------------------------------------------------------------------------+------------------------------+---------+-----------------------------+----------+----------+----------------------------------------------------+

|    1 | PRIMARY     | <derived2>            | ALL  | NULL                                                                              | NULL                         | NULL    | NULL                        |      400 |   100.00 | Using temporary; Using filesort                    |

|    2 | DERIVED     | <derived3>            | ALL  | NULL                                                                              | NULL                         | NULL    | NULL                        |      400 |   100.00 | Using filesort                                     |

|    3 | DERIVED     | log_visit             | ref  | PRIMARY,index_idsite_config_datetime,index_idsite_datetime,index_idsite_idvisitor | index_idsite_config_datetime | 4       | const                       | 17030952 |   100.00 | Using index condition; Using where; Using filesort |

|    3 | DERIVED     | log_link_visit_action | ref  | index_idvisit,transitions_url                                                     | index_idvisit                | 4       | piwik3293.log_visit.idvisit |        6 |   100.00 | Using where                                        |

+------+-------------+-----------------------+------+-----------------------------------------------------------------------------------+------------------------------+---------+-----------------------------+----------+----------+----------------------------------------------------+

This occurs especially when date ranges include many days such as:

( log_visit.idsite in ('1') 

                AND log_visit.visit_last_action_time >= '2015-10-20 07:00:00'

                AND  log_visit.visit_last_action_time <= '2015-12-20 08:00:00' )

For now we cannot easily improve this so we simply document, via this issue, that there may be performance issues when Live plugin (eg. Visitor log) is used with very wide date ranges.

tfrdidi commented 8 years ago

What about using pagination for the UI in combination with limited SQL querries for much smaller date ranges? I encountered often problems when accidently clicking on visitor-log while having selected month or week. Then I get a timeout after 60 seconds, while piwik does not react on other user input.