Open systay opened 1 year ago
I'm going to make the problem just a tad more complex. You should also be able to filter out (materialized) reference tables... They will otherwise show up in every keyspace...
I like everything suggested in the original comment.
There are more complex scenarios:
information_schema
, including JOINs and subselects. You need to potentially UNION ALL some of these? VIEW
that reads from information_schema
? Is that automatically covered by the above?If this is helpful, here are some complex examples for querying information_schema
. These are all described as views, but of course for most of them you can just extract the view's SELECT
query as reference:
etc.
@shlomi-noach, thanks for the links, I'll read up and make sure we support these.
- any custom query the user may run on
information_schema
, including JOINs and subselects. You need to potentially UNION ALL some of these?
This is not a problem. This is kind of what I was talking about under the "Merging of I_S views". The planner plans the routing per table listed in the SELECT query, and pushes down joins and subqueries to the same query if they can be merged, and otherwise does the joins and subqueries on the vtgate.
- What if the user creates a
VIEW
that reads frominformation_schema
? Is that automatically covered by the above?
Since we are handling views by just turning them into a derived table, this approach should work well for views as well.
Added two possible problems - MoveTables and reference tables
Love the proposal. I 💯 % agree with all the points.
In relational databases, the information schema (
information_schema
, or shorter,I_S
) is an ANSI-standard set of read-only views that provide information about tables, views, columns, procedures and more in a database system.Purpose of this RFC
This RFC will try to describe the intended approach for improved
I_S
routing, and point out problems with this approach. Specifically, I'm trying to solve two problems with todays solution:Multiple keyspaces. Since a Vitess system can have multiple keyspaces, and these can be backed by mysql instances that know nothing about the other keyspaces, no one keyspace will contain information about all tables in the Vitess system. It is not possible to get query results from multiple keyspaces today.
Database names. From the users perspective, a keyspace is equivalent to a mysql database. A keyspace named XXX can be backed by a mysql database named YYY. Today, we return the whatever the underlying mysql information_schema has, which might not be the same as the keyspace name.
Suggestion
Individual view routing
For every
I_S
view being queried, we need to figure out which of three situations we are dealing with.Approach depending on type of view
UNION
between the results from all keyspaces. This UNION is done on the vtgate, not sent down to any MySQL instance.Given a system that has two keyspaces (
ks1
andks2
). Example plan for query#3
above:In the query above, the first SELECT is sent to
ks1
, and it will contain information aboutks1
, but also about all the default databases installed by default:information_schema
,mysql
,performance_schema
andsys
.The second SELECT is sent to
ks2
, and now we are only interested in information aboutks2
. All the default stuff was already covered by the first query.Handling database name != keyspace name
In Vitess, you can use a different name for the MySQL database and the keyspace. Not uncommon is that the keyspace
user
is stored in the databasevt_user
. To make sure the user sees the keyspace name, we have to use a little logic.Given a query such as:
The user wants to see the contents of the
TABLE_SCHEMA
column, which will contain the database name and not the keyspace name. What we would do is, instead of just using thetable_schema
column, we construct aCASE
like so:This is the query we would use for the first of the SELECTs we UNION together. For the rest of them, we have the keyspace name already, and can inject it as a literal instead of using the value from the
I_S
:Merging of
I_S
viewsOne important decision the planner does is to decide which tables can be fetched together from a single
route
. This is called merging routes. My suggestion would be that we merge the first and second type of views, but not the third type of queries. To spell it out:Possible problems
I_S
query, all keyspaces have to be live and responding for the query to finish successfully. Can we assume that this is the case?MoveTable
, a table will appear in two different keyspaces