Open aquarapid opened 4 years ago
Hi @aquarapid! Stumbled across this issue while trying to understand reference tables a bit better. I have a question around:
you should insert on a per-shard level, or use vreplication to populate the tables.
How would we insert on a per-shard level to a reference table? My use-case is such that I want "sparsely" populated reference tables, so would ideally like to write on-demand to a shard if the reference data is missing there. But from the docs, I can't really figure out if there is a way to tell Vitess to write to a particular shard?
@usmanm in the above example, inserting into the specific shard would mean USE
-ing the specific shard, and then inserting. Assuming the keyspace name in the example above is keyspace1
, it would look like this:
mysql> use keyspace1/0;
Database changed
mysql> insert into testing (`version`,`inserted_at`) VALUES (20200108224539, "2020-01-16 23:31:51");
Query OK, 1 row affected (0.01 sec)
Note that populating reference tables like this manually isn't recommended. The original idea is to use a separate keyspace where the "source" of your reference tables lives. This keyspace would be probably unsharded, since reference tables are assumed to be small. You setup vReplication to materialize them continuously from that keyspace to the shards of your sharded "target" keyspace. That way, consistency would not be a concern.
Generally, reference tables shouldn't be directly updated. What you should instead do is create the source in an unsharded keyspace and vreplicate from that into the sharded one. Then you just update the source table.
I just remembered this demo shows how reference tables should be configured. It's a little outdated, but the general idea is still the same. https://youtu.be/E6H4bgJ3Z6c?t=1451
The thing I'm not sure about VReplication is that does it require us to remove referential integrity? Suppose if I want something like:
# This is a reference table (so I want it to be replicated to all shards)
CREATE TABLE users (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(256) NOT NULL
);
# This is in a sharded keyspace
CREATE TABLE events (
id INT NOT NULL PRIMARY KEY # Primary VIndex on this column
data VARCHAR(256)
user INT REFERENCES users(user_id)
);
There's some propagation delay between writing to the unsharded table and then replicating it to all shards of a keyspace? In that case wouldn't it be possible that I try to write something to events
and it fails because the referenced user data has not been propagated to the shard?
VReplication typically applies changes to the target within milliseconds. Essentially, it should be no worse than the application trying to write the change to all shards. But it guarantees eventual consistency even if there are failures, which is a complex problem to solve at the app level.
The way I would recommend implementing the feature is to have a function write to the source, and wait for the row to appear in the target before returning success.
Some users have requested that vitess itself do this part. It's something we've been considering.
That makes sense! And "waiting" for row to appear would is running just a SELECT
query or does Vitess provide some out-of-the-box thing here?
Another question that I had (tangentially related) is around availability. When using reference tables, is the availability == the unsharded keyspace MySQL instance to be up? In the sense that for the event
table, I can achieve better availability by just retrying and writing it to a different shard (id
is generated application side, so we have a bit of control here).
There are a couple of convoluted ways:
We can also build a custom construct to avoid the above convolution. Or, we can prioritize the wait functionality in vitess itself.
About availability: The master-replica setup with an automated failover mechanism has been serving multiple organizations with five nines of availability. So, it's fine to rely on its uptime.
Find out the target shard and send a shard-targeted query.
How would one go about doing this? I tried looking at the docs, but couldn't find a way to find the target shard given a keyspace ID or a value for a column which the Primary Vindex is on.
You can issue this query, which will return the list of shards:
mysql> show vitess_shards like 'customer/%';
+--------------+
| Shards |
+--------------+
| customer/-80 |
| customer/80- |
+--------------+
2 rows in set (0.00 sec)
Cache this info in the app. And then you can target a shard with:
mysql> use customer/-80;
mysql> select * from customer;
To find a shard from a keyspace id, you can mimic this function: https://github.com/vitessio/vitess/blob/1310711d44c1c48b1cc17c088659ad9eaf872aa9/go/vt/key/destination.go#L327-L338
This is obviously clumsy. For now, I hope you can hack this to make sure things work. We can find a more elegant way out of this later.
Thanks so much @sougou, this has been extremely helpful 🙏
Just to confirm this is still an issue in 9.0 (daa6085); with updated stacktrace:
runtime error: index out of range [0] with length 0
/opt/hostedtoolcache/go/1.15.7/x64/src/runtime/panic.go:88 (0x436d64)
/home/runner/work/vitess/vitess/go/vt/vtgate/engine/insert.go:397 (0xbface4)
/home/runner/work/vitess/vitess/go/vt/vtgate/engine/insert.go:255 (0xbf829c)
/home/runner/work/vitess/vitess/go/vt/vtgate/engine/insert.go:201 (0xbf77e6)
/home/runner/work/vitess/vitess/go/vt/vtgate/plan_execute.go:175 (0xcd88ba)
/home/runner/work/vitess/vitess/go/vt/vtgate/plan_execute.go:155 (0xcb3d0e)
/home/runner/work/vitess/vitess/go/vt/vtgate/plan_execute.go:116 (0xcb349e)
/home/runner/work/vitess/vitess/go/vt/vtgate/executor.go:187 (0xca129a)
/home/runner/work/vitess/vitess/go/vt/vtgate/executor.go:155 (0xca0ec4)
/home/runner/work/vitess/vitess/go/vt/vtgate/vtgate.go:266 (0xcd14d7)
/home/runner/work/vitess/vitess/go/vt/vtgate/plugin_mysql_server.go:219 (0xcb53db)
/home/runner/work/vitess/vitess/go/mysql/conn.go:1234 (0x902b8b)
/home/runner/work/vitess/vitess/go/mysql/conn.go:1219 (0x9026b5)
/home/runner/work/vitess/vitess/go/mysql/conn.go:867 (0x8ff747)
/home/runner/work/vitess/vitess/go/mysql/server.go:474 (0x921d77)
/opt/hostedtoolcache/go/1.15.7/x64/src/runtime/asm_amd64.s:1374 (0x472180)
@aquarapid should we close this? It seems to have gone stale and I'm unsure if it's still relevant or not. Thanks!
Scenario:
Sharded keyspace
Use a vschema like this with a reference table:
Create a table via vtgate:
Now, try to insert to the keyspace (not a shard-specific one):
The panic from the vtgate log file:
vtgate -version:
Obviously, reference tables don't work this way, and you should insert on a per-shard level, or use vreplication to populate the tables. However, inserting to it should not cause a vtgate panic and kill the connection.