shatteredsilicon / ssm-submodules

GNU Affero General Public License v3.0
0 stars 2 forks source link

Capture user@source information from slow query log #187

Closed gordan-bobic closed 5 months ago

gordan-bobic commented 10 months ago

I am increasingly getting requests to capture what user/ip particular queries come from. This is in the slow log, so we could capture it. This should be captured and aggregated so that for any selected time period we can establish what set of user@ip sources a particular query has been coming from and show it as a list.

gordan-bobic commented 6 months ago

Bumping this to v4 release: 1) It doesn't look like an enormously complicated task 2) We are currently blocked by the build-split re-pipeline 3) I'm getting a lot of requests for this feature from multiple clients

oblitorum commented 5 months ago

image

The user source panel is added like this, at the bottom of the QAN page, one line per user@ip item.

And the table definition of the table for storing user@ip data is

CREATE TABLE IF NOT EXISTS query_user_sources (
  query_class_id  INT UNSIGNED NOT NULL,
  instance_id     INT UNSIGNED NOT NULL,
  ts              TIMESTAMP(6) NOT NULL,
  user            VARCHAR(128) CHARSET 'utf8' NOT NULL,
  host            VARCHAR(255) CHARSET 'utf8' NOT NULL,
  PRIMARY KEY (query_class_id, instance_id, ts, user, host)
);

I'm thinking whether we should do some aggregation to the user@ip data parsed from slow query log, if we store user@ip of every query logged in the slow query log file, would that be too much?

gordan-bobic commented 5 months ago

Great, thanks. :-)

This list could get way too long to sensibly manage / load / show for a query that runs 100K times per second when zooming out to 7 days. . Can you do it by summarising using something like:

SELECT   user, host, min(ts), max(ts), count(1), instance_id, query_class_id
FROM     query_user_sources
WHERE    ts >= $start and ts <= $end AND
         instance_id IN (...)
GROUP BY query_class_id, instance_id;, user, host;
oblitorum commented 5 months ago

I see, that's a good way to show them, will change it that way.

oblitorum commented 5 months ago

image

OK, is this idea?

gordan-bobic commented 5 months ago

Yes, that's great!

gordan-bobic commented 5 months ago

Is similar information available when harvesting query data from performance_schema?

oblitorum commented 5 months ago

I don't think we can get such infomation from performance_schema. With performance_schema harvesting, data are mainly fetched from events_statements_summary_by_digest table, and there is no user@ip info in it. There is a events_transactions_summary_by_account_by_event_name table which includes user@ip info, and the rows look like this

                       USER: root
                       HOST: localhost
                 EVENT_NAME: statement/sql/select
                 COUNT_STAR: 121
             SUM_TIMER_WAIT: 36497717000
             MIN_TIMER_WAIT: 79940000
             AVG_TIMER_WAIT: 301634000
             MAX_TIMER_WAIT: 4525424000
              SUM_LOCK_TIME: 248000000
                 SUM_ERRORS: 0
               SUM_WARNINGS: 0
          SUM_ROWS_AFFECTED: 0
              SUM_ROWS_SENT: 8
          SUM_ROWS_EXAMINED: 8
SUM_CREATED_TMP_DISK_TABLES: 0
     SUM_CREATED_TMP_TABLES: 0
       SUM_SELECT_FULL_JOIN: 0
 SUM_SELECT_FULL_RANGE_JOIN: 0
           SUM_SELECT_RANGE: 2
     SUM_SELECT_RANGE_CHECK: 0
            SUM_SELECT_SCAN: 2
      SUM_SORT_MERGE_PASSES: 0
             SUM_SORT_RANGE: 0
              SUM_SORT_ROWS: 0
              SUM_SORT_SCAN: 0
          SUM_NO_INDEX_USED: 2
     SUM_NO_GOOD_INDEX_USED: 0
               SUM_CPU_TIME: 0
      MAX_CONTROLLED_MEMORY: 50320
           MAX_TOTAL_MEMORY: 69459
            COUNT_SECONDARY: 0

It's already aggregated and there seems no way to connect it with events_statements_summary_by_digest table.

gordan-bobic commented 5 months ago

Fair enough. I think this is already harvested for the user statistics dashboard.