hugowan / maatkit

Automatically exported from code.google.com/p/maatkit
0 stars 0 forks source link

mk-table-checksum should be able to re-checksum things that differ #69

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
When there are differences that are resolved, it would be useful to be able
to re-checksum them.  For example,

mk-table-checksum --replicate=foo.checksum --replcheck 1 --recheck

should find chunks that differed on slaves and re-checksum them.

Original issue reported on code.google.com by baron.schwartz on 17 Sep 2008 at 1:12

GoogleCodeExporter commented 9 years ago

Original comment by baron.schwartz on 1 Oct 2008 at 3:11

GoogleCodeExporter commented 9 years ago

Original comment by baron.schwartz on 3 Oct 2008 at 12:45

GoogleCodeExporter commented 9 years ago
I'm working on this.

Original comment by dan...@percona.com on 8 Oct 2008 at 1:22

GoogleCodeExporter commented 9 years ago
It has to do like --replcheck, only instead of printing out what it finds, it 
then
goes back to the master and re-runs the checksum on the master with the same 
WHERE,
which it should pull out of the 'boundaries' column.  So --recheck requires
--replcheck so it knows how deep to recurse.

And I know on IRC I said "I think it shouldn't do this unless the slave is 
caught up
in replication" but scratch that -- let the user worry about it.

However, it DOES need to keep track and not re-run the same chunk more than 
once per
invocation.

Original comment by baron.schwartz on 8 Oct 2008 at 1:47

GoogleCodeExporter commented 9 years ago
r2408 has the beginnings of this feature. The trick now is doing
rechecksum_inconsistent_table(). When it is just --replicate, important stuff 
is done
in a few places in order to get all the necessary params to do_tbl_replicate(). 
For
example, the checksum query is made at line 3330. And that relies on having 
gotten
earlier the table struct at line 3204. Furthermore, do_tbl_replicate() wants a 
$hdr
param and currently that $hdr is hiding at line 3255, quite out of scope for us 
when
we're in check_repl_slaves() which calls rechecksum_inconsistent_table(). So, 
some
work is needed yet to rearrange the script internals to accomplish --recheck 
gracefully.

Original comment by dan...@percona.com on 8 Oct 2008 at 3:47

GoogleCodeExporter commented 9 years ago
I think when --recheck runs, it should build that list like you're doing.  But, 
then
instead of the call stack being

main -> check_repl_slaves -> rechecksum_inconsistent_table -> do_tbl_replicate

I think it might be better as

main -> check_repl_slaves

then do NOT exit on line 3065 as it currently does, instead, skip the whole 
$finder
thing on line 3149 and build a list of databases and tables to do, in the same 
sort
of way that mk-parallel-dump does.  Then, iterate over this instead of 
iterating over
$finder->find_databases() and $finder->find_tables().

If --recheck isn't given, then of course $finder is used to build a list of dbs 
and
tables.

There are surely other approaches, I'm just thinking that the thing to abstract 
here
is "find tables to do".  $finder is one way, $ms->recurse_to_slaves is another 
way.

Either way it's going to involve moving some code around, but not necessarily 
making
it more complex -- just a bit of refactoring.

Original comment by baron.schwartz on 8 Oct 2008 at 4:55

GoogleCodeExporter commented 9 years ago
r2410 has the code largely refactored towards this end. I think it needs further
refactoring; perhaps a new module/class/obj is in order as I have identified the
distinct tasks in the overall goal of checksumming the tables:

1) For each db (on the master), get the tables we want to checksum, gathering 
lots of
info about each table

2) For each table, figure out the best checksum strategy, in particular: how to 
chunk
the table

3) For each chunk, fork a child to do the actual checksumming work for the given
db-tbl-strategy-chunk

4) Report the checksum results

I'll make more headway on this tomorrow.

Original comment by dan...@percona.com on 9 Oct 2008 at 2:14

GoogleCodeExporter commented 9 years ago
This feature is in r2414. It's functional, but those subs with a lot of args 
might do
better to take named params. Or perhaps a module will become of all this, now 
that
it's more fully refactored.

Contrary to the comment in r2410, I did not make 2 calls to 
recurse_to_slaves(). I
felt that was an unnatural hack: any time a block of code is copy/pasted I'm
suspicious. I think the approach in r2414 is pretty elegant.

Original comment by dan...@percona.com on 10 Oct 2008 at 2:42

GoogleCodeExporter commented 9 years ago
I'm going to close this issue (mark it fixed at least). Baron: re-open it if you
don't agree with the code changes before release. This feature has been 
implemented
though, and it works, but as I noted above (comment 8) we'll probably want to 
cleanup
the sub calls later--make them use named params.

Original comment by dan...@percona.com on 16 Oct 2008 at 1:48