vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.63k stars 2.1k forks source link

Support Multi-Table Update and Delete syntax #2439

Closed bbeaudreault closed 7 years ago

bbeaudreault commented 7 years ago

I'm looking to support multi-table delete syntax. This is defined per the mysql spec here: http://dev.mysql.com/doc/refman/5.7/en/delete.html

Here's an example query:

DELETE sr, ais 
FROM suppressionRules sr 
LEFT JOIN active_incident_suppressions ais ON ais.suppression_rule_id = sr.id 
WHERE sr.id=498

So in this case we are always deleting from suppressionRules, but also deleting from active_incident_suppressions when there is a matching record from the join.

I've got an initial implementation of parsing the syntax in an experimental branch here: https://github.com/youtube/vitess/compare/master...HubSpot:delete_multi_support?expand=1. That diff creates a new DeleteMulti struct to hold these queries.

I began looking at building this into the plan building package, but wanted to verify with @sougou before going further. My thought is we could re-use much of the planbuilder logic, which is currently limited to Select statements, by making it generically applicable to Delete as well.

For the vttablet, I think we should be able to easily turn the above delete syntax into a select, for pre-identifying what rows and tables to delete from. Depending on the join type, we could see whether the result contains a null or a value for each table column we want to check on.

For example, here's the output of changing the above query with a simple transformation from:

DELETE sr, ais FROM

to

SELECT sr.id as `suppressionRules.id`, ais.id as `active_incident_suppressions.id` FROM

Full output:

mysql> select sr.id as `suppressionRules.id`, ais.id as `active_incident_suppressions.id` FROM suppressionRules sr LEFT JOIN active_incident_suppressions ais ON ais.suppression_rule_id = sr.id WHERE sr.id=498;
+---------------------+---------------------------------+
| suppressionRules.id | active_incident_suppressions.id |
+---------------------+---------------------------------+
|                 498 |                             422 |
|                 498 |                             948 |
|                 498 |                             949 |
|                 498 |                             950 |
+---------------------+---------------------------------+
4 rows in set (0.00 sec)

Depending on the join type used, the columns would contain real values (row exists to delete in table) or null values (no row to delete in table). For example, here's what the above looks like with a OUTER join, on a sr.id that does not contain a record in active_incident_suppressions:

mysql> select sr.id as `suppressionRules.id`, ais.id as `active_incident_suppressions.id` FROM suppressionRules sr LEFT OUTER JOIN active_incident_suppressions ais ON ais.suppression_rule_id = sr.id WHERE sr.id=43;
+---------------------+---------------------------------+
| suppressionRules.id | active_incident_suppressions.id |
+---------------------+---------------------------------+
|                  43 |                            NULL |
+---------------------+---------------------------------+
1 row in set (0.00 sec)
bbeaudreault commented 7 years ago

We're also interested in multi-table updates, as described here: http://dev.mysql.com/doc/refman/5.7/en/update.html

sougou commented 7 years ago

It should be relatively easy to allow these constructs in VTGate for unsharded keyspaces. We could even look at making them work for sharded keyspaces by reusing the routing logic for select statements.

However, vttablet could be a problem: It currently annotates DMLs with the table name and primary key rows that are affected. To support multiple tables, we have to first extend the format of the annotation as well as find a way to identify the affected rows for all tables. We should think of a way to do this.

Fortunately, there is a temporary work-around: You can turn off queryserver-config-strict-mode. Doing this will make vttablet pass-through the DMLs it doesn't understand. It also skips the critical verifications: STRICT_TRANS_TABLES and autocommit, which you have to manually verify and enforce.

Turning off queryserver-config-strict-mode also means that you cannot use downstream tools like filtered replication and update stream. Hopefully, we'll have this problem solved when you get to that point.

bbeaudreault commented 7 years ago

This is being tackled here: https://github.com/youtube/vitess/pull/2852

bbeaudreault commented 7 years ago

This is done