Closed mcrauwel closed 9 months ago
@mcrauwel we would only care about vtctldclient
for new features going forward. So it would be about making this a variable in the create command. This info is then stored in the json report in the vdiff_table sidecardb table.
Hey @mattlord, would you just override that maxVDiffReportSampleRows
value and store potentially the entire diff in the sidecar database (can be huge), or would we rather add an option like --store-mismatching-primary-keys
and create a new column in that table in the sidecar database that just stores the PK's?
I would be storing something like:
{
"key": {
"name": "<keyname/PRIMARY>",
"fields": [
<index-fields>
]
},
"values": [
{
"source": [...],
"target": [...]
},
{ ... }
]
}
missing rows on source or target would be null
in the relevant values...
in most cases though source and target would be the same in values
A JSON column in MySQL can be up to 1GiB. And you can already specify that you only want the PK values in the diff report using the --only-pks
flag: https://vitess.io/docs/18.0/reference/programs/vtctldclient/vtctldclient_vdiff/vtctldclient_vdiff_create/
So I would start with simply making that max sample rows variable.
makes sense! and in the docs we can recommend using --only-pks
flag when increasing the value for max_sample_rows
should I add an option all
or treat max_sample_rows = 0
as no limit, being all rows?
Call the flag something more descriptive please. --max-vdiff-report-rows
? I think 0 is fine to specify all
, and the current value (10) can be the default.
Sorry for the delay. vdiff is unnecessary as that's the command the flag is for. But something like --max-report-sample-rows
is good. And as @deepthi said, the 0 for unlimited is good (we use that pattern in various places).
Feature Description
Currently we use VDiff to find differences in 2 version of a table being migrated through a MoveTables workflow. The tool reports if there are differences and if there are how many. The tool also provides a sample of the mismatched rows but this is limited to 10 rows. So if there are more than 10 rows difference we'd need to use other tools (such as
pt-table-checksum
) to re-diff the table chunks that are different. Afterwards we can re-run thept-table-checksum
with--chunk-size=1
to find the individual rows that are different, whileVDiff
could have already told us...I would like to have an option for VDiff to print/save the PK's (or the index used for diffing the rows) with mismatching rows, so these could be used for reconciling the differences between the 2 tables...
I am ok with contributing code for implementing this, however it would be good to get the Vitess team's insights to identify the correct path forward for implementing this, such as where would you store/print this info? And what format you'd store it in...
Use Case(s)
This would allow the
VDiff
option to be used like a combination ofpt-table-checksum
andpt-table-sync