Closed gordan-bobic closed 5 months ago
Bumping this to v4 release: 1) It doesn't look like an enormously complicated task 2) We are currently blocked by the build-split re-pipeline 3) I'm getting a lot of requests for this feature from multiple clients
The user source panel is added like this, at the bottom of the QAN page, one line per user@ip item.
And the table definition of the table for storing user@ip data is
CREATE TABLE IF NOT EXISTS query_user_sources (
query_class_id INT UNSIGNED NOT NULL,
instance_id INT UNSIGNED NOT NULL,
ts TIMESTAMP(6) NOT NULL,
user VARCHAR(128) CHARSET 'utf8' NOT NULL,
host VARCHAR(255) CHARSET 'utf8' NOT NULL,
PRIMARY KEY (query_class_id, instance_id, ts, user, host)
);
I'm thinking whether we should do some aggregation to the user@ip data parsed from slow query log, if we store user@ip of every query logged in the slow query log file, would that be too much?
Great, thanks. :-)
This list could get way too long to sensibly manage / load / show for a query that runs 100K times per second when zooming out to 7 days. . Can you do it by summarising using something like:
SELECT user, host, min(ts), max(ts), count(1), instance_id, query_class_id
FROM query_user_sources
WHERE ts >= $start and ts <= $end AND
instance_id IN (...)
GROUP BY query_class_id, instance_id;, user, host;
I see, that's a good way to show them, will change it that way.
OK, is this idea?
Yes, that's great!
Is similar information available when harvesting query data from performance_schema?
I don't think we can get such infomation from performance_schema. With performance_schema harvesting, data are mainly fetched from events_statements_summary_by_digest
table, and there is no user@ip info in it.
There is a events_transactions_summary_by_account_by_event_name
table which includes user@ip info, and the rows look like this
USER: root
HOST: localhost
EVENT_NAME: statement/sql/select
COUNT_STAR: 121
SUM_TIMER_WAIT: 36497717000
MIN_TIMER_WAIT: 79940000
AVG_TIMER_WAIT: 301634000
MAX_TIMER_WAIT: 4525424000
SUM_LOCK_TIME: 248000000
SUM_ERRORS: 0
SUM_WARNINGS: 0
SUM_ROWS_AFFECTED: 0
SUM_ROWS_SENT: 8
SUM_ROWS_EXAMINED: 8
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 0
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_FULL_RANGE_JOIN: 0
SUM_SELECT_RANGE: 2
SUM_SELECT_RANGE_CHECK: 0
SUM_SELECT_SCAN: 2
SUM_SORT_MERGE_PASSES: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 0
SUM_SORT_SCAN: 0
SUM_NO_INDEX_USED: 2
SUM_NO_GOOD_INDEX_USED: 0
SUM_CPU_TIME: 0
MAX_CONTROLLED_MEMORY: 50320
MAX_TOTAL_MEMORY: 69459
COUNT_SECONDARY: 0
It's already aggregated and there seems no way to connect it with events_statements_summary_by_digest
table.
Fair enough. I think this is already harvested for the user statistics dashboard.
I am increasingly getting requests to capture what user/ip particular queries come from. This is in the slow log, so we could capture it. This should be captured and aggregated so that for any selected time period we can establish what set of user@ip sources a particular query has been coming from and show it as a list.