shatteredsilicon / ssm-submodules

GNU Affero General Public License v3.0
0 stars 2 forks source link

Improve Query Time Resolution in QAN #245

Open gordan-bobic opened 2 months ago

gordan-bobic commented 2 months ago

Currently, the queries get logged as having executed at the time when they were harvested from the slow log, not when they executed. In most, probably all cases, slow log contains a SET TIMESTAMP statement with the timestamp of when the query was executed.

CREATE TABLE `ssm`.`query_class_metrics` (
  `query_class_id` int(10) unsigned NOT NULL,
  `instance_id` int(10) unsigned NOT NULL,
  `start_ts` timestamp NOT NULL DEFAULT current_timestamp(),
  ...
  PRIMARY KEY (`query_class_id`,`instance_id`,`start_ts`),
  KEY `start_ts` (`start_ts`),
  KEY `instance_start` (`instance_id`,`start_ts`)
);

CREATE TABLE `ssm`.`query_global_metrics` (
  `instance_id` int(10) unsigned NOT NULL,
  `start_ts` timestamp NOT NULL DEFAULT current_timestamp(),
  ...
  PRIMARY KEY (`instance_id`,`start_ts`),
  KEY `start_ts` (`start_ts`)
);

That means our time resolution for queries is the length of time it takes QAN to process a batch of queries. We could improve this to 1 second by making use of the timestamp information.

To maintain behaviour similar to what happens currently, the query should be logged with start_ts of (SET TIMESTAMP + Query_time) listed in the slow log, rounded to nearest second (to evade the need to change start_ts column type. Since the timestamp in the slow log only has resolution of 1 second, switching to timestamp(6) would only improve real resolution by approximately half a second (because we would only get the fraction part of the Query_time), and that small improvement doesn't seem to justify the cost of converting potentially tens or hundreds of gigabytes of QAN tables.

Column name start_ts is misleading since this logs when the query was loaded, which is after it finished rather than when it started. But we can ignore this for now.