vitessio / vitess

Vitess is a database clustering system for horizontal scaling of MySQL.
http://vitess.io
Apache License 2.0
18.73k stars 2.1k forks source link

Feature request: 'multi-shard or vindex write statement' with reference tables should/could work #14493

Closed wiebeytec closed 11 months ago

wiebeytec commented 1 year ago

Overview of the Issue

I'm not sure if this is a bug report or feature request.

When happily setting up our Vitess cluster and testing our app on it, I ran into (undocumented) limitations on joining with UPDATE and DELETE. If the following does not apply, perhaps MySQL compatability needs updating to describe this?

When one does a DELETE with JOIN to sharded data, it says: VT12001: unsupported: multi-shard or vindex write statement. However, I think joining to reference tables should actually be possible?

At first glance, I thought a query like the following should work even with sharded tables:

DELETE users
FROM users 
INNER JOIN world_data 
ON users.world_id = world_data.world_id 
WHERE world_data.something = 'end of the world';

However, I can understand this becomes a problem when you invert your ON condition, for instance.

But, for reference tables, deleting (and updating) with JOINs should be possible?

My work-around for now is composing a string with IDs to be used in the query.

Reproduction Steps

I started three tablets with a sharded and unsharded keyspace:

mysql> show vitess_tablets;
+------------------+-------------------------+-------+------------+---------+-----------------------------+---------------+----------------------+
| Cell             | Keyspace                | Shard | TabletType | State   | Alias                       | Hostname      | PrimaryTermStartTime |
+------------------+-------------------------+-------+------------+---------+-----------------------------+---------------+----------------------+
| vic-eu-central-1 | playground_sharded      | -80   | PRIMARY    | SERVING | vic-eu-central-1-0000000801 | 172.31.27.190 | 2023-11-08T10:38:26Z |
| vic-eu-central-1 | playground_sharded      | 80-   | PRIMARY    | SERVING | vic-eu-central-1-0000000802 | 172.31.27.190 | 2023-11-08T10:38:32Z |
| vic-eu-central-1 | playground_unsharded    | 0     | PRIMARY    | SERVING | vic-eu-central-1-0000000800 | 172.31.27.190 | 2023-11-08T10:37:30Z |
+------------------+-------------------------+-------+------------+---------+-----------------------------+---------------+----------------------+

VSchema:

cat bug_report_inner_join_delete.vschema 
{
   "sharded" : true,
   "tables" : {
      "world_data" : {
         "source" : "playground_unsharded.world_data",
         "type" : "reference"
      },
      "users" : {
         "column_vindexes" : [
            {
               "column" : "user_id",
               "name" : "a_standard_hash"
            }
         ]
      }
   },
   "vindexes" : {
      "a_standard_hash" : {
         "type" : "hash"
      }
   }
}

Apply:

./vtctldclient ApplyVSchema --vschema-file bug_report_inner_join_delete.vschema playground_sharded

I didn't actually create any tables, because it's not relevant to prove the case. Normally I would populate the reference table with a Materialize workflow.

On the gate:

mysql> delete users from users inner join world_data on users.world_id = world_data.world_id where world_data.something = 'end of the world';
ERROR 1105 (HY000): VT12001: unsupported: multi-shard or vindex write statement

Binary Version

vtgate --version
Version: 17.0.3 (Git revision dced85dd0f33b8bd95d37adb781d6cdc1949af92 branch 'HEAD') built on Tue Oct  3 14:47:40 UTC 2023 by runner@fv-az936-676 using go1.20.8 linux/amd64

Operating System and Environment details

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

Log Fragments

No response

deepthi commented 1 year ago

I'll address the documentation part of this. This is in fact documented, you just have to follow the link to unsupported queries from the page you are looking at. Direct link: https://github.com/vitessio/vitess/blob/main/go/vt/vtgate/planbuilder/testdata/unsupported_cases.json#L59

harshit-gangal commented 1 year ago

This is currently not supported for sharded keyspaces.

harshit-gangal commented 1 year ago

Related issue: https://github.com/vitessio/vitess/issues/12677

wiebeytec commented 1 year ago

I'll address the documentation part of this. This is in fact documented, you just have to follow the link to unsupported queries from the page you are looking at. Direct link: https://github.com/vitessio/vitess/blob/main/go/vt/vtgate/planbuilder/testdata/unsupported_cases.json#L59

That query joins to user_extra. In go/vt/vtgate/testdata/executorVSchema.json, user_extra is not a reference table.

deepthi commented 1 year ago

That query joins to user_extra. In go/vt/vtgate/testdata/executorVSchema.json, user_extra is not a reference table.

Good point. Given that we are supposed to be able to join with reference tables efficiently for selects, it is reasonable to ask why it doesn't work the same way for update and delete. @harshit-gangal both you and I are wrong. This is not the same as #12677. Can we look into how we plan queries that involve reference tables? maybe there is a somewhat quick way of extending what we do in SELECT planning to DMLs. cc @systay

harshit-gangal commented 1 year ago

We need to remodel our update and delete operators to incorporate multi-table, then it would not matter if it is a reference table or not.

wiebeytec commented 1 year ago

I'm not quite sure if you mean that makes it easy or hard, but just to be sure my 2 cents:

Are you talking about full support for multi-shard write, even when I invert the join condition? Like so:

DELETE users
FROM users 
INNER JOIN other_data_from_shard
ON users.user_id <> other_data_from_shard.user_id
;

Because this is not "just" a matter of executing the query on the shard (because it would delete way too much: edit: or, would it delete not enough 🤔), whereas doing it with reference tables is, I think?

harshit-gangal commented 1 year ago

Once the planning work is undertaken for this. It should work as expected for the query above based on the join/where condition.