Open wiebeytec opened 6 months ago
I am not able to repro this in v19. I ran the end-to-end test TestReferenceRouting in go/test/endtoend/vtgate/queries/reference, with the teardown commented out (// defer clusterInstance.Teardown()). Based on this test setup, I believe queries similar to the ones you are running are being planned correctly. Can you check what is different in this setup compared to yours, so we can repro your failure? In particular, what is the value of require_explicit_routing in your vschema?
mysql> vexplain queries select d.id, z.id from sks.delivery_failure d inner join zip_detail z on d.zip_detail_id = z.id where d.id = 1;
+------+----------+-------+---------------------------------------------------------------------------------------------------------+
| # | keyspace | shard | query |
+------+----------+-------+---------------------------------------------------------------------------------------------------------+
| 0 | sks | -80 | select d.id, z.id from delivery_failure as d, zip_detail as z where d.id = 1 and d.zip_detail_id = z.id |
+------+----------+-------+---------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
rohit@RSs-Laptop reference % vttablet --version
vttablet version Version: 19.0.4-SNAPSHOT (Git revision cc64915d0f9c06bd498026c301bdfbca5fd6afc5 branch 'release-19.0') built on Tue Apr 23 11:37:51 CEST 2024 by rohit@RSs-Laptop.local using go1.22.2 darwin/arm64
rohit@RSs-Laptop reference % vtctldclient --server localhost:8106 GetVSchema sks
{
"sharded": true,
"vindexes": {
"hash": {
"type": "hash",
"params": {},
"owner": ""
}
},
"tables": {
"delivery_failure": {
"type": "",
"column_vindexes": [
{
"column": "id",
"name": "hash",
"columns": []
}
],
"auto_increment": null,
"columns": [],
"pinned": "",
"column_list_authoritative": false,
"source": ""
},
"zip_detail": {
"type": "reference",
"column_vindexes": [],
"auto_increment": null,
"columns": [],
"pinned": "",
"column_list_authoritative": false,
"source": "uks.zip_detail"
}
},
"require_explicit_routing": false,
"foreign_key_mode": "unspecified"
}
My sites2023 has require_explicit_routing: false. But if I'm going to need unspecified mode, I see no way but to set that to true, I think? Because otherwise, as soon as I create another VSchema with table names, the routing will fail. This would block me from transparently using MoveTables, as tables exist in two keyspaces at the same time.
Does require_explicit_routing indeed also affect reference table decisions?
I compared my situation to the test case, and it seems mostly the same, except that I have more data; I fill the reference tables with Materialize.
Perhaps my issue is that global routing / unspecified mode is not functioning. I don't know why. If I look at the SrvKeyspace, it all looks good. I also tried rebuilding it. What else can I look at for that?
BTW, this is the vexplain of the query:
{
"OperatorType": "Join",
"Variant": "Join",
"JoinColumnIndexes": "L:0,R:0",
"JoinVars": {
"lld_idDataAttribute": 1
},
"TableName": "lastLogData_dataAttributes",
"Inputs": [
{
"OperatorType": "Route",
"Variant": "EqualUnique",
"Keyspace": {
"Name": "sites2023",
"Sharded": true
},
"FieldQuery": "select lld.idSite, lld.idDataAttribute from lastLogData as lld where 1 != 1",
"Query": "select lld.idSite, lld.idDataAttribute from lastLogData as lld where lld.idSite = 6",
"Table": "lastLogData",
"Values": [
"6"
],
"Vindex": "a_standard_hash"
},
{
"OperatorType": "Route",
"Variant": "Unsharded",
"Keyspace": {
"Name": "legacy",
"Sharded": false
},
"FieldQuery": "select da.`code` from dataAttributes as da where 1 != 1",
"Query": "select da.`code` from dataAttributes as da where da.idDataAttribute = :lld_idDataAttribute",
"Table": "dataAttributes"
}
]
}
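For readers comparing plans like the one above, a small sketch can list which keyspace each Route operator targets, which makes a reference-table lookup that lands in the source keyspace easy to spot. This is only an illustration: the type `planNode`, the helpers `parsePlan` and `routeTargets`, and the constant `samplePlan` are hypothetical names, and the sample is a trimmed copy of the plan shown in this thread, not a Vitess API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// planNode mirrors only the plan fields used here (hypothetical type).
type planNode struct {
	OperatorType string
	Variant      string
	Keyspace     *struct {
		Name    string
		Sharded bool
	}
	Table  string
	Inputs []planNode
}

// parsePlan decodes a plan JSON document; unknown fields are ignored.
func parsePlan(raw string) (planNode, error) {
	var root planNode
	err := json.Unmarshal([]byte(raw), &root)
	return root, err
}

// routeTargets walks the plan tree depth-first and returns one
// "table -> keyspace (variant)" entry per Route operator.
func routeTargets(n planNode) []string {
	var out []string
	if n.OperatorType == "Route" && n.Keyspace != nil {
		out = append(out, fmt.Sprintf("%s -> %s (%s)", n.Table, n.Keyspace.Name, n.Variant))
	}
	for _, in := range n.Inputs {
		out = append(out, routeTargets(in)...)
	}
	return out
}

// samplePlan is a trimmed copy of the plan posted in this thread.
const samplePlan = `{
  "OperatorType": "Join",
  "Variant": "Join",
  "TableName": "lastLogData_dataAttributes",
  "Inputs": [
    {"OperatorType": "Route", "Variant": "EqualUnique",
     "Keyspace": {"Name": "sites2023", "Sharded": true},
     "Table": "lastLogData"},
    {"OperatorType": "Route", "Variant": "Unsharded",
     "Keyspace": {"Name": "legacy", "Sharded": false},
     "Table": "dataAttributes"}
  ]
}`

func main() {
	root, err := parsePlan(samplePlan)
	if err != nil {
		panic(err)
	}
	for _, target := range routeTargets(root) {
		fmt.Println(target)
	}
}
```

With the trimmed plan, the second entry shows dataAttributes being routed to legacy rather than to the reference copy in sites2023, which is the behaviour being reported.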
I will keep trying things in the meantime.
I think I understand why you can't use unspecified mode. You have the same table name in legacy and in sites2023. How did this come about? Usually, once tables have been moved out of an unsharded keyspace into a sharded one, they are expected to be dropped from the old keyspace.
Even in that case, we can expect that there is a non-zero duration between switching traffic and dropping the source, but routing rules will be present to route queries correctly.
@rohit-nayak-ps this is probably the root cause for why things aren't working "as expected".
(comment edited, because I was confusing things)
I think I understand why you can't use unspecified mode. You have the same table name in legacy and in sites2023. How did this come about? Usually, once tables have been moved out of an unsharded keyspace into a sharded one, they are expected to be dropped from the old keyspace.
Because dataAttributes is a reference table, copied to sites2023. But no table from legacy can be addressed via global routing, not even the ones that were never moved. The table lastLogData, which is currently in both keyspaces because the cluster is in switch traffic mode, actually is globally addressable. Tables from meta2023 as well. But no table from legacy is.
Because dataAttributes is a reference table, copied to sites2023
Duh, of course. Will let Rohit keep looking into this :)
I think I have some stale information about legacy somewhere, I just cannot find it. It may be causing you to chase red herrings. I don't know if you think it's worth it, or would like to remove all data, including etcd?
I figured it out: the vschema for legacy needs to have all reference tables from sites2023 in it. As soon as I leave even one out, no matter which one, it will route back to legacy. I could have made a better repro case there, sorry. It was too far off my radar as a possibility. So in the test case, sks needs an extra reference table that is not in the vschema of uks, yet is in the db itself. Probably something like:
diff --git a/go/test/endtoend/vtgate/queries/reference/main_test.go b/go/test/endtoend/vtgate/queries/reference/main_test.go
index 4c9440ca4f..28cc709d4b 100644
--- a/go/test/endtoend/vtgate/queries/reference/main_test.go
+++ b/go/test/endtoend/vtgate/queries/reference/main_test.go
@@ -98,6 +98,10 @@ var (
"type": "reference",
"source": "` + unshardedKeyspaceName + `.zip_detail"
}
+ "does_not_exist": {
+ "type": "reference",
+ "source": "` + unshardedKeyspaceName + `.does_not_exist"
+ }
}
}
`
It also goes as I expected when I qualify the sites2023 keyspace, either with USE or by qualifying sites2023.lastLogData.
I can work around it, but it seems to me it's important to fix, because if I add extra reference tables in the future without adding to the vschema, I will create that scenario again.
One extra problem though: I get a panic in vtgate when I use unspecified mode in my application now. I will discuss/report that separately.
I can work around it, but it seems to me it's important to fix, because if I add extra reference tables in the future without adding to the vschema, I will create that scenario again.
Without adding it to the vschema how can Vitess know about a new reference table? Maybe I am misunderstanding. Can you clarify with an example what you expect to work here?
One extra problem though: I get a panic in vtgate when I use unspecified mode in my application now. I will discuss/report that separately.
I see you logged the stack trace in Slack ...
FYI we have another issue we are looking at, which could impact the routing of reference tables for you as well: https://github.com/vitessio/vitess/issues/15770.
Without adding it to the vschema how can Vitess know about a new reference table? Maybe I am misunderstanding. Can you clarify with an example what you expect to work here?
Isn't that change to the test exactly that? I'm saying that if there is some reference table defined in the sharded keyspace that is not in the unsharded vschema, all reference tables stop working properly.
Edit: See this comment for the real cause. The routing breaks for all reference tables if not all reference tables are present in the unsharded vschema.
Overview of the Issue
Reference table routing may not work as documented. If I don't explicitly name the copy of a table inside the sharded keyspace in a SELECT ... JOIN, it uses the current USE keyspace. This multiplies the number of scattered queries. When I use 'global routing' / 'unspecified mode', it also uses the source table, not the reference, with the same result.
Expected result: at least for global routing mode, I expected the reference table to be used. And the documentation somewhat suggests that it reroutes even when addressing the source table or having a USE keyspace defined:
But granted, perhaps I'm reading my wishes into that. It would be really great if it worked that way though, because only using unspecified mode is really hard when you're moving tables between keyspaces (as they exist in multiple keyspaces at some point, breaking the routing).
Reproduction Steps
I have these (simplified) tables:
The table dataAttributes is copied to sites2023 with Materialize, and it's defined in the VSchema for sites2023 as reference:
When I run this query:
with a default USE of legacy (the default USE is the argument to the mysql command, which will affect the table dataAttributes), you can see it multiplies many queries to the legacy keyspace:
But when I address the dataAttributes table directly in sites2023:
The result is correct:
If I don't specify a default DB with USE, and don't fully qualify dataAttributes, it says:
I don't know at this point why my 'global routing' doesn't work, but even if I fix it by adding this vschema to legacy:
it still picks the table from the legacy keyspace. To demonstrate, this is again the query with an unqualified dataAttributes:
And I run this without a DB argument to mysql. It still erroneously goes to legacy:
Binary Version