support interleaved tables

pingcap / tidb

TiDB is an open-source, cloud-native, distributed, MySQL-Compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at : https://www.pingcap.com/tidb-serverless/

https://pingcap.com

Apache License 2.0

36.86k stars 5.8k forks source link

support interleaved tables #19404

Open zz-jason opened 4 years ago

zz-jason commented 4 years ago

Feature Request

Is your feature request related to a problem? Please describe:

Let's say we have customer and order tables:

create table customer (
    id bigint primary key,
    ...,
)

create table order (
    id bigint,
    customer_id bigint,
    primary key (customer_id, id)
)

A join query like this:

select * from order, customer on order.customer_id = customer.id and customer.id = ?;

It may require to access many disjoint regions of table order and customer to get all the customer data, which can cause a significant performance penalty.

Describe the feature you'd like:

After the clustered primary index is supported (https://github.com/pingcap/tidb/issues/4841), we can further support the interleaved table feature like Google Cloud Spanner and CockroachDB. With this feature, the data of customer can be encoded as follows:

key: same_table_id + customer_id
value: customer_data

the data of order can be encoded as:

key: same_table_id + customer_id + order_id
value: order_data

if customer and order share the same table id, we can make all the data belonging to a customer stored together in a TiKV region. Which reduces the cop requests and network overhead.

If all the regions are on the same tikv server, we can further push down the join to the tikv coprocessor with this optimization: https://github.com/pingcap/tidb/issues/19381.

References:

Cloud Spanner: Creating interleaved tables
CockroachDB: INTERLEAVE IN PARENT

Describe alternatives you've considered:

N/A

Teachability, Documentation, Adoption, Migration Strategy:

N/A

ilovesoup commented 4 years ago

This feature will be a very big one but with couple of limitations:

Parent table row count vs child row count difference cannot be too big. Otherwise the speedup is very unstable.
Read on parent table will be slowed down.
Point get on child table need extra index.
Very limited use cases with very high customer education cost.

mightyguava commented 4 years ago

As this is motivated trying to solve the same issues in https://github.com/pingcap/tidb/issues/19309, I think interleaved tables make quite a lot of sense. IMHO I disagree this has limited use cases. Many database schemas are centered around a "customer", or some other type of central entity, which lead to similar use cases. When you have these parent-child relationships in a relational database, and have data spanning many machines, it becomes important to have data locality. I think this is why Spanner heavily emphasizes interleaving as one of its features, and in its best practices.

A canonical example is an order management system where orders are interleaved in customers, when there is high order volume.

Using auto_random on orders would solve the hot tail shard issue, but make it expensive to list orders for a specific customer.

Interleaving orders into customers would achieve:

Write distribution when inserting orders.
Data locality when querying orders for a specific customer, improving performance on joins and range queries
Looks requiring joining on customers and their orders would be faster

Range queries across multiple customers, looking up only the customer, would slow due to have to scan more rows, but this is unlikely to be a common query.

I think the most interesting comparison with interleaving is actually clustered indexes, like

create table orders (
  id bigint auto_increment unique key,
  customer_id bigint,
  primary key (customer_id, id)
)

The clustered index and interleaved tables would perform similarly for inserts. For read, clustered_index would be better for point select (when customer_id is absent), same if customer_id is present, but worse when customer info needs to be retrieved with the order. Clustered index and interleaving feel similar in importance for an OLTP system at scale.

Parent table row count vs child row count difference cannot be too big. Otherwise the speedup is very unstable.

I don't quite understand this statement. Why would parent/child row count difference be a trade-off here?

tirsen commented 4 years ago

There are more many benefits of this feature. Here are a few more:

Queries with the customer id in the WHERE clause can execute fully inside a single tikv node.
As a corollary to the above point, joins across interleaved tables can be fully resolved on a single tikv node and doesn’t have to move any data across the network except the end result of the query.
Transactions on interleaved tables does generally not need full 2PC transactions (except for index updates, but if we support interleaved indexes we may be able to get around that too, see my next comment).

tirsen commented 4 years ago

As part of this feature we could also support interleaved indexes like this:

create table order (
    id bigint,
    customer_id bigint,
    some_interesting_column bigint,
    primary key (customer_id, id),
    key `some_interesting_column_idx` (`customer_id`,`some_interesting_column`),
) interleave in parent customers (customer_id)

If this index is interleaved with the parent customer it could also be updated entirely inside the same range.

tirsen commented 4 years ago

Very limited use cases with very high customer education cost.

I also humbly disagree. :-) This has become a standard feature of NewSQL databases as both Spanner, CockroachDB and (in another shape) Vitess has this. I think there will be a lot of industry knowledge of how to use this feature.

In a distributed database you don't have direct control over where data is stored. With interleaving you hint to the DBMS how you're going to access the data so that it places it in a better way.

This feature becomes even greater because of TiDBs globally ACID transactions. Often you have to trade off the placement of tables according to one access pattern, punishing another access pattern. For example, the customer order entry wants to access orders via customer but the logistics system wants to access orders via article. With TiDB you could imagine denormalizing the order data both interleaved with customer and interleaved with article optimizing for both access patterns!

ilovesoup commented 4 years ago

Thank so much you for the very helpful input! @tirsen @mightyguava

I fully agree that controlling data locality and placement is a very important thing for distributed databases. An effective way to achieve this goal will be very helpful. This feature will be a very big one and we have to be cautious about its cost-effectiveness. But we are very open for discussion. Understanding real-world use cases is extremely beneficial for us.

Here is some of my thoughts. Comparing to cluster index solution, the benefit of interleaved table is reducing the cost of 1 extra RPC round trip: locating orders from customer record. Using custid to retrieve data, the benefit further diminished since 2 RPCs themselves can be "interleaved" instead of sending in order thus hide some of the cost.

If the amount of rows in child table for each parent record is large, the extra RPC cost will be largely hidden, not to mention there are cases child records spills to other regions in other nodes. Also, in such cases, extra index filtering on orders will largely nullify the factor of interleaved table since we have only global index. Overall, these are what I mentioned as "limited use cases".

But interleaved index is something quite interesting and we haven't thought of. We had discussions about ways to achieve local index. This seems a natural way and cause no burden to users in such scenarios. Also it resolves the problems of filtering on orders.

Basically, since interleaved table might cost quite a bit man/month, it's very helpful to discuss it thoroughly about if and when.

mightyguava commented 4 years ago

I see. Yes, I think clustered primary index will address many of the performance issues we currently see. It's something I'd love to see backported into 4.0 or 4.1.

Interleaving is exciting that it goes a step beyond. I like the idea of being able to query a customer and its most recent orders querying only a single region. I really like the idea of being able to update a customer and insert an order without having to do a distributed 2PC. These are both common patterns in our database usage. Another powerful benefit of interleaving is that it naturally opens up a path for TiDB to support foreign key constraints.

While these optimizations assume that the customer's orders don't spill over to another region, given that regions are 100MB that seems like a pretty safe limitation. Maybe region size could also be increased if it becomes an issue? Spanner defaults 4GB and CockroachDB defaults to 512MB.

tirsen commented 4 years ago

There are two other benefits to interleaved tables that should not be disregarded:

Joins across multiple interleaved tables can be fully resolved in a single tikv node if the parent key is the same. This will have a huge benefit for our use case. We have a fairly normalized data model and our OLTP queries do joins across a few key tables very frequently but it does specify the customer id. Having to send the row data across between tikv nodes or from tikv node to tidb node may become a big resource consumer. Further testing will have to show how big of an issue this is in practice. We should hopefully be able to report back on this in a few weeks.o
Availability improvements. Something we've spent a lot of time working on is what we call "shard resiliency". If a single shard goes down we are still available to customers on other shards. In large systems this can have a really big impact on availability. This is critical especially since we may sometimes be financially responsible for lack of availability during certain times (for example during market hours for share trading).

We actually have some customers that have gigabytes of data. Having their data span 10s of regions is a lot better than having their data span 1000s of regions! The number of regions per customer is:

with affinity: O(number of tables * customer data size)
with interleaved tables: O(customer data size)
with just AUTO_RANDOM: O(total number of regions of table)

Thank you for taking this feature under consideration. We do realize it's a massive undertaking.

ilovesoup commented 4 years ago

First of all, 1PC together with interleaved index and shard resiliency are very sound reasons. I think we might have further discussion around these two and decide wether do it or not. BTW, are you interested in join us with in interleaved table development? If we decide to make it happen, your help will make it happen a lot soooner :)

Now please allow me further explain some of the points above.

While these optimizations assume that the customer's orders don't spill over to another region, given that regions are 100MB that seems like a pretty safe limitation. Maybe region size could also be increased if it becomes an issue? Spanner defaults 4GB and CockroachDB defaults to 512MB.

We currently still have difficulty to increase region size beyond 192M for the stableness of TiKV cluster. But in the future the limitation will lift.

We actually have some customers that have gigabytes of data. Having their data span 10s of regions is a lot better than having their data span 1000s of regions!

For customer data spanning multiple regions: in clustered index solution (use custid as prefix to original orderid and orders will gather under individual custid like in interleaved table), we also clusters orders data in couple of regions instead of a lot many. It's a very similar solution to interleaved one except customer table is not stored together.

Joins across multiple interleaved tables can be fully resolved in a single tikv node if the parent key is the same. This will have a huge benefit for our use case.

Taking clustered index solution into consideration, Join resource overhead might be somehow lower than we thought under cluster index. TiDB does not shuffle join data b/w TiKVs but gathers them all in TiDB. Also orders data are not stored across random nodes but gathered in single node as well if using clustered index. Because of these, pulling parent from remote and join or join inside TiKV consume similar resources (without further optimization, return joined data cost even more since its de-normalized and grows in size).

tirsen commented 4 years ago

First of all, 1PC together with interleaved index and shard resiliency are very sound reasons. I think we might have further discussion around these two and decide wether do it or not. BTW, are you interested in join us with in interleaved table development? If we decide to make it happen, your help will make it happen a lot soooner :)

Yes I think we should take that under very serious consideration.

For customer data spanning multiple regions: in clustered index solution (use custid as prefix to original orderid and orders will gather under individual custid like in interleaved table), we also clusters orders data in couple of regions instead of a lot many. It's a very similar solution to interleaved one except customer table is not stored together.

Yes clustered index will give us many of the benefits. Interleaved tables feels like the logical extension to this work and would take things even further in terms of availability and performance.

Taking clustered index solution into consideration, Join resource overhead might be somehow lower than we thought under cluster index. TiDB does not shuffle join data b/w TiKVs but gathers them all in TiDB. Also orders data are not stored across random nodes but gathered in single node as well if using clustered index. Because of these, pulling parent from remote and join or join inside TiKV consume similar resources (without further optimization, return joined data cost even more since its de-normalized and grows in size).

Right :-) I didn't mean to say that joins across tables would automatically be resolved locally in a tikv. I just said that they could theoretically be resolved in a single tikv node. Sorry I wasn't entirely clear there. I think interleaved tables would open up for many more optimizations. In fact I could imagine we could run most of tidb inside the coprocessor with interleaved tables. Food for thought. :-)

crazycs520 commented 3 years ago

I found that CockroachDB will deprecate interleaved tables in the future. https://www.cockroachlabs.com/docs/stable/interleave-in-parent.html#how-interleaved-tables-work

I found another defect of interleaved tables is hard to handle TRUNCATE TABLE DDL of the parent table since if the parent table is truncated, the child of interleaved tables will be useless. Or we should truncate those interleaved tables together to avoid this problem.