vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.47k stars 2.09k forks source link

fix: Avoid creating `MoveTables` in case of non-empty target tables #16826

Open beingnoble03 opened 4 days ago

beingnoble03 commented 4 days ago

Description

This PR adds a validation check for empty tables in the target keyspace while creating MoveTables workflow.

Related Issue(s)

Screenshot

Screenshot 2024-09-24 at 12 41 44 AM

Checklist

Deployment Notes

vitess-bot[bot] commented 4 days ago

Review Checklist

Hello reviewers! :wave: Please follow this checklist when reviewing this Pull Request.

General

Tests

Documentation

New flags

If a workflow is added or modified:

Backward compatibility

codecov[bot] commented 1 day ago

Codecov Report

Attention: Patch coverage is 87.30159% with 8 lines in your changes missing coverage. Please review.

Project coverage is 69.45%. Comparing base (95f2e3e) to head (8393dea). Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
go/vt/vtctl/workflow/utils.go 85.71% 6 Missing :warning:
go/vt/vtctl/workflow/materializer.go 90.47% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #16826 +/- ## ========================================== - Coverage 69.51% 69.45% -0.06% ========================================== Files 1569 1571 +2 Lines 202517 203121 +604 ========================================== + Hits 140780 141083 +303 - Misses 61737 62038 +301 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

beingnoble03 commented 1 day ago

I am working on the failing TestVtctlMigrate e2e test.

rohit-nayak-ps commented 1 day ago

There is one issue already in the existing code: We get the list of tables just for one shard in getTablesInKeyspace() However each target shard might have a different set of tables which exist. If the selected shard has the table, but other shards not, then we will end up not creating the target tables in some shards.

This extends to the new check we are adding. If a table is present in the first code then we will also check for data in the table in other shards. If the other shards don't have that table then the check for data will fail with a "table not found"

In addition, the getSchema() call, made while trying to getTablesInKeyspace() that we make is a heavy duty query, just to get the list of tables.

We may want to consider a different approach that fixes all the issues mentioned above:

For a workflow with 1000s of tables and 1000s of shards this will imply a large number of queries. We may want to add a --no-validations flag to bypass these checks in such cases. But since this is run during the Create step and with a weak isolation level, we may be fine without the flag for now. The flag might be a fail-safe though in case of some bug in this code, as pointed out above.

@noble, I suggest you wait before coding further on this until we reach an agreement on the final approach.

This approach avoids the issue of size of query/number of joined tables in a query that we have with the union approach. So that will also have to be batched and a query with several tables being joined even with a union may have performance impacts which might reduce the effect of the order of magnitude increase in queries.