hugowan / maatkit

Automatically exported from code.google.com/p/maatkit

Implement a once-through, regex-free, simple state machine query fingerprinter #133

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
And then do some performance tests to see if/how much faster such a
fingerprinter is compared to a current regex-based solution.
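
As a rough illustration of the idea, here is a minimal sketch of a once-through,
regex-free fingerprinter. It is an assumption-laden toy, not maatkit code: the
names fingerprint_sm, is_digit, is_space and is_word are hypothetical, and it
only abstracts quoted strings and plain decimal numbers, collapses whitespace,
and lowercases everything else. It does not try to reproduce the behavior of
QueryRewriter::fingerprint.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of maatkit): plain character comparisons
# stand in for regex character classes.
sub is_digit { my ($c) = @_; return $c ge '0' && $c le '9' }
sub is_space { my ($c) = @_; return $c eq ' ' || $c eq "\t" || $c eq "\n" || $c eq "\r" }
sub is_word  {
    my ($c) = @_;
    return ($c ge 'a' && $c le 'z') || ($c ge 'A' && $c le 'Z')
        || is_digit($c) || $c eq '_';
}

# Hypothetical sketch: one left-to-right pass over the query.
sub fingerprint_sm {
    my ($query) = @_;
    my $out       = '';
    my $len       = length($query);
    my $i         = 0;
    my $prev_word = 0;   # true while inside an identifier or keyword

    while ( $i < $len ) {
        my $c = substr($query, $i, 1);

        if ( $c eq q{'} || $c eq q{"} ) {
            # STRING state: skip to the closing quote, honoring backslash
            # escapes and doubled quotes.
            my $quote = $c;
            $i++;
            while ( $i < $len ) {
                my $d = substr($query, $i, 1);
                if ( $d eq '\\' ) { $i += 2; next; }
                if ( $d eq $quote ) {
                    if ( $i + 1 < $len && substr($query, $i + 1, 1) eq $quote ) {
                        $i += 2;   # doubled quote, still inside the string
                        next;
                    }
                    $i++;          # closing quote
                    last;
                }
                $i++;
            }
            $out .= '?';
            $prev_word = 0;
        }
        elsif ( is_digit($c) && !$prev_word ) {
            # NUMBER state (simplified): digits and dots only.
            $i++ while $i < $len
                && ( is_digit(substr($query, $i, 1)) || substr($query, $i, 1) eq '.' );
            $out .= '?';
            $prev_word = 0;
        }
        elsif ( is_space($c) ) {
            # WHITESPACE state: collapse a run into one space, drop leading runs.
            $i++ while $i < $len && is_space(substr($query, $i, 1));
            $out .= ' ' if length($out);
            $prev_word = 0;
        }
        else {
            # DEFAULT state: copy the character, lowercased.
            $out .= lc $c;
            $prev_word = is_word($c);
            $i++;
        }
    }

    # Drop the single space that trailing whitespace would leave behind.
    chop $out if length($out) && substr($out, -1) eq ' ';
    return $out;
}
```

Calling fingerprint_sm() on a query returns the abstracted string. Whether a
character-at-a-time loop like this actually beats the regex engine in Perl is
exactly what the performance tests would need to show.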

Original issue reported on code.google.com by dan...@percona.com on 21 Nov 2008 at 8:48

GoogleCodeExporter commented 9 years ago
Note that I profiled mk-log-parser once upon a time and found:

# %Time ExclSec CumulS #Calls sec/call Csec/c  Name
#  49.5   34.17 34.177 203249   0.0002 0.0002  QueryRewriter::fingerprint
#  46.7   32.26 68.797 203250   0.0002 0.0003  LogParser::parse_event
#  3.42   2.357 36.534 406498   0.0000 0.0001  main::__ANON__

Pretty much anything we can do to improve fingerprint() is worth trying.  The
most expensive lines in fingerprint() are ones that catch cases other tools
don't catch -- uncommon floating point formats, nested quotes with backslashes,
etc.  So correctness is expensive, and we may debate how much correctness and
performance can be traded off.

Original comment by baron.schwartz on 24 Nov 2008 at 12:10

GoogleCodeExporter commented 9 years ago
It might also be a good idea to do it a token at a time:

foreach my $tok ( $query =~ m/..../g ) {

Each token would be a quoted string or a space-delimited word.  This might be
faster.  Each word could then be transformed.
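
A minimal sketch of the token-at-a-time approach, under stated assumptions:
the token pattern below is my own guess and is NOT the regex elided in the
loop above, and fingerprint_tokens and $token_re are hypothetical names. It
treats a token as either a backslash-escaped quoted string or a run of
characters containing no whitespace or quotes, then transforms each token
independently.

```perl
use strict;
use warnings;

# Hypothetical token pattern (an assumption): a single- or double-quoted
# string with backslash escapes, or a run of non-space, non-quote characters.
my $token_re = qr/'(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|[^\s'"]+/s;

# Simplified numeric test: optional sign, decimal digits, optional exponent.
my $number_re = qr/\A[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\z/;

# Hypothetical sketch: fingerprint a query one token at a time.
sub fingerprint_tokens {
    my ($query) = @_;
    my @out;
    foreach my $tok ( $query =~ m/($token_re)/g ) {
        my $first = substr($tok, 0, 1);
        if ( $first eq q{'} || $first eq q{"} ) {
            push @out, '?';        # quoted string literal
        }
        elsif ( $tok =~ $number_re ) {
            push @out, '?';        # numeric literal
        }
        else {
            push @out, lc $tok;    # keyword, identifier, or operator
        }
    }
    # Joining with single spaces also normalizes the original spacing.
    return join ' ', @out;
}
```

Because the numeric test runs against one short token anchored at both ends,
rather than being searched for across the whole query, the float check may
become cheaper; that is a hypothesis to measure, not a result.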

Right now the regex to recognize floats is by far the slowest part of the code.

Original comment by baron.schwartz on 14 Dec 2008 at 10:47

GoogleCodeExporter commented 9 years ago
I think the changes made to resolve issue 137 are enough at this point.  The
fingerprinter profiles out to about the same speed as mysqldumpslow's regexes,
and catches a hell of a lot more cases, so I'm not too bothered about it
anymore.

Original comment by baron.schwartz on 25 Dec 2008 at 11:21
