hugowan / maatkit

Automatically exported from code.google.com/p/maatkit
0 stars 0 forks source link

Add query review functionality #130

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
mk-query-review

Basic functionality:  Do a mysqlsla-type analysis of general or binary logs
(5.1, file or table) or the running live server (again, log table, or
intercept queries...)....for each query you see, check a storage location
-- if the query fingerprint does not already exist in the storage location
(file?  database?), add it.

The first version should have the functionality of "approving" queries. 
The idea is to have a list of all the queries that run against the database
system, and new queries can be looked at and approved or left in the "needs
approval" state.

Future versions:
auditing capabilities -- who approved the query and when, how often queries
happen (also can be used to see how often a table's query cache is
invalidated; which tables are used most heavily; if the queries against a
particular table would be optimized if the table were a different storage
engine; whether or not an index is used).

output examples and EXPLAIN, as mysqlsla can.

it can be used to track timings -- ie if a query is used a lot, then not
used at all (ie, a bad query was optimized), then used again (ie, change
reverted).  It can also be used to graph table usage, frequency of table
writes vs. reads...

(also all sorts of other monitoring such as GRANT statements, DDL, etc)

The first problem I would like for this to solve is when a client has a
"new release" and wants us to give it the thumbs-up or thumbs-down when we
don't have access to the code, the code repository, etc.  It would be great
to say "what new queries are there?"

Ideas:  wrapper around mysqlsla...mysqlproxy...(probably wrapper around
mysqlsla so it can be used w/out proxy)

I am OK with it being for 5.1 only.

(Feel free to contact me for clarification; Daniel said I should make this
an issue).

Original issue reported on code.google.com by awf...@gmail.com on 16 Nov 2008 at 9:53

GoogleCodeExporter commented 9 years ago
I am ok with this being for the slow-log only because that works fine with our
patches, which will suit our customers' needs very well.

This is an idea we have also been kicking around for a while.  It may be part 
of a
much larger system that will be used for figuring out all kinds of things, like
whether some systems are different from others, or whether a query suddenly 
gets run
more or takes a greater % of resources than before -- any kinds of outliers.

We also have a TODO on the list -- check whether the slow log reports syntax 
errors
or not.  Funny how such a simple thing is unknown.

Stuff like outputting EXPLAIN is a related issue but probably belongs 
elsewhere. 
EXPLAIN is actually unsafe to run automatically because of subqueries, though 
that
can be mitigated.  We also need a way to show the differences in EXPLAIN, so 
even if
the query doesn't change, maybe its EXPLAIN does.  How to diff this is an 
unsolved
problem.

I have also been thinking about some kind of fingerprint-of-a-fingerprint to
characterize a whole system.  What kind of metrics go into this?  Well, the
distribution of queries, read/write ratio, etc.  If this can be consolidated 
into
some kind of dimensionless value then it becomes easy to see differences from 
one
system to another, or one day to another.  Slowly-changing metrics are another
problem; it's easy to find outliers when they bust a trend, but if the trend 
towards
slow growth is steady, you might not notice.

Original comment by baron.schwartz on 18 Nov 2008 at 1:47

GoogleCodeExporter commented 9 years ago
Actually I think this is best done as part of a bigger tool.  Please share your 
ideas
about this at
http://code.google.com/p/maatkit/wiki/IntegratedMonitoringAndAnalysisSolution 
Note
that although this is "public", please don't blog about this.

Original comment by baron.schwartz on 24 Nov 2008 at 12:06

GoogleCodeExporter commented 9 years ago
On second thought I think Sheeri is right, we need to do this as a separate 
tool.  It
can be in the uber-system too, but slow-log and general-log are also important 
to get
solved now.  I would also like to see it possible to consume an arbitrary input
stream, like stdout from Proxy, or stdout from a packet sniffer or processlist.

Original comment by baron.schwartz on 24 Nov 2008 at 6:27

GoogleCodeExporter commented 9 years ago
I have committed r2545, which includes functionality that should permit you to 
do
some of what you requested.  Basically this lets you see what queries the system
runs.  Do it like this:

CREATE TABLE `review` (
  `checksum` bigint(20) unsigned NOT NULL,
  `fingerprint` text,
  `sample` text,
  `first_seen` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`checksum`)
);

mk-log-parser /path/to/logfile --review h=127.0.0.1,D=test,t=review

This inserts the query into the table the first time it's seen.

After reviewing the feature list again, I do not agree that all of this stuff 
should
be in there -- everything after "Future versions:" in the initial description 
of this
issue is beyond the scope of a single tool.  But you can easily add columns to 
the
table to do whatever you want.  For example, the customer who is sponsoring this
change is going to add a JIRA_ticket column, and an is_reviewed column.

This is initial stab at the problem and there are known TODOs, such as getting 
the
query's timestamp.

Original comment by baron.schwartz on 28 Nov 2008 at 4:17

GoogleCodeExporter commented 9 years ago
This is not yet well documented, but mk-log-parser trunk SVN version has good
functionality for this now.  Once this is documented in the POD, this issue is 
ready
to close.

Original comment by baron.schwartz on 12 Dec 2008 at 8:21

GoogleCodeExporter commented 9 years ago
Features targeted for query review improvements

Original comment by baron.schwartz on 16 Dec 2008 at 9:27

GoogleCodeExporter commented 9 years ago
Sweet!  Yay!

Original comment by awf...@gmail.com on 16 Dec 2008 at 10:05

GoogleCodeExporter commented 9 years ago
As of mk-log-parser r2655 this is mostly done and documented in the POD. Look at
issue 162 and issue 167 for more related info.

Original comment by dan...@percona.com on 24 Dec 2008 at 6:08