Open mzhang77 opened 3 weeks ago
Only return data for the same date
MySQL root@127.0.0.1:test> select time,query from information_schema.cluster_slow_query where time > '2024-09-06 00:00:01.000000' limit 1;
+----------------------------+-------------------------------------+
| time | query |
+----------------------------+-------------------------------------+
| 2024-09-06 14:54:33.064918 | CREATE DATABASE IF NOT EXISTS test; |
+----------------------------+-------------------------------------+
1 row in set
Time: 0.102s
MySQL root@127.0.0.1:test> select time,query from information_schema.cluster_slow_query where time > '2024-09-15 00:00:01.000000' limit 1;
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| time | query |
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-09-15 00:00:12.485027 | select job_meta, processing from mysql.tidb_ddl_job where job_id in (select min(job_id) from mysql.tidb_ddl_job group by schema_ids, table_ids, processing) and not reorg order by processing desc, job_id; |
+----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
Time: 0.017s
MySQL root@127.0.0.1:test> select time,query from information_schema.cluster_slow_query where time > '2024-09-16 00:00:01.000000' limit 1;
+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| time | query |
+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2024-09-16 00:04:56.937909 | INSERT HIGH_PRIORITY INTO mysql.tidb VALUES ('tikv_gc_leader_lease', '20240916-00:06:51.931 +0800', 'Current GC worker leader lease. (DO NOT EDIT)') ON DUPLICATE KEY UPDATE variable_value = '20240916-00:06:51.931 +0800', comment = 'Current GC worker leader lease. (DO NOT EDIT)'; |
+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
/assign
/assign
The root cause of this issue may lie in the unreasonable implementation of https://github.com/pingcap/tidb/pull/14840. At that time, I was worried that unreasonable queries would cause a lot of slow log file parsing. which would consume a lot of TIDB CPU resources.
Specifically, when querying the slow_query
or cluster_slow_query
system table:
If no query conditions are specified, only the currently active slow log file will be queried. Note that the slow log file is automatically rotated, with a default size of 300MB.
If only start_time is specified, i.e., the condition time > 'x'
, only the slow logs within the range of ('x', 'x' + 24h) will be queried.
If only end_time is specified, i.e., the condition time < 'x'
, only the slow logs within the range of ('x' - 24h, 'x') will be queried.
@songrijie PTAL, do you think the above (current) behavior is reasonable? And if not, What is reasonable behavior?
@crazycs520 I guess it should be up to the caller, what range they want to select. Even with the current limitation they could set any range they want and cause lots of parsing (if they specify both lower and upper bounds), so current logic doesn't really prevent this situation, but makes the behavior very confusing (also it's not documented anywhere as far as I understand).
tidb query is limited by max_execution_time and tidb_mem_quota_query. Query on information_schema.cluster_slow_query should follow same rule, so it's easy for user to understand and manage.
I prefer to force the user to specify the time range instead of any implicit conditions behavior. That is, if a user does not specify a start or end time, an error should returned.
The option with explicit error (if that behavior is documented) could be better, but what about cases when a user specified both start and end times, but the range is still too broad to handle efficiently? It's not different from not specifying start or end time by end result.
I think a better option would be to rely on quotas / time limits as @mzhang77 pointed out above, without any SQL-level restrictions.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
The second query should return result if the first query returns result
3. What did you see instead (Required)
4. What is your TiDB version? (Required)