yugabyte / tpcc

Repo to run TPCC benchmarks against YugabyteDB

Support running GeoPartitioned TPCC #114

Closed · deeps1991 closed this 3 years ago

deeps1991 commented 3 years ago

This patch adds support for optionally running TPCC in a geo-partitioned setup by partitioning all of the TPCC tables by warehouse ID. Data can then be placed so that all rows, across all tables, associated with a particular range of warehouses are always located in a particular zone.

The number of partitions, and the placement configuration for each partition, can be specified in XML format in geopartitioned_workload.xml.

GeoPartitioned TPCC differs from regular TPCC in the following ways:

1) The tables needed for TPCC are all created in a partitioned manner. We use YSQL declarative partitioning to split the total number of warehouses into a set of ranges and assign each range to a partition. Every TPCC table is split by warehouse ID in the same way, so all the data pertaining to a warehouse is found in the same zone (see the first sketch after this list).

2) A foreign key reference from tableX -> tableY is implemented as tableX_part1 -> tableY_part1, tableX_part2 -> tableY_part2, and so on (see the second sketch after this list).

3) SQL functions do not support partition pruning, unlike PL/pgSQL procedures; however, SQL functions do support batching of writes. To get the best of both worlds, the updatestock function now calculates the appropriate partition and updates that partition's table directly (see the third sketch after this list). Had we used PL/pgSQL procedures, updatestock could have updated the partitioned parent table and let the planner determine the right partition, but that would have removed the ability to batch the updates.

4) 15% of Payment transactions and 1% of NewOrder transactions operate on a "remote warehouse". Ordinarily this remote warehouse is picked at random from all warehouses. However, if geo-partitioning is enabled, the remote warehouse is picked at random from within the same partition as the "local" warehouse. This is a temporary restriction until YSQL supports foreign keys on partitioned tables.
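
To make item 1 concrete, here is a minimal sketch of range-partitioning one TPCC table by warehouse ID with YSQL declarative partitioning. The column set is trimmed, and the tablespace name, placement JSON, and warehouse ranges are illustrative assumptions rather than what this repo's DDL actually generates (the repo reads its placement configuration from geopartitioned_workload.xml):

```sql
-- Illustrative: a tablespace that pins replicas to one zone
-- (cloud/region/zone values are made up).
CREATE TABLESPACE tpcc_zone_a WITH (replica_placement =
  '{"num_replicas": 1, "placement_blocks":
     [{"cloud": "aws", "region": "us-east-1", "zone": "us-east-1a", "min_num_replicas": 1}]}');

-- Parent table is range-partitioned by warehouse ID (columns trimmed for brevity).
CREATE TABLE warehouse (
    w_id   int NOT NULL,
    w_name varchar(10),
    w_ytd  numeric(12,2)
) PARTITION BY RANGE (w_id);

-- One partition per contiguous range of warehouses, placed in its zone.
CREATE TABLE warehouse_part1 PARTITION OF warehouse
    (PRIMARY KEY (w_id))
    FOR VALUES FROM (1) TO (11) TABLESPACE tpcc_zone_a;
```

Every other TPCC table (district, customer, stock, and so on) would be split over the same warehouse ranges, so a given warehouse's rows across all tables land in the same zone.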
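
A sketch of the partition-to-partition foreign keys from item 2, using the TPCC district -> warehouse relationship as an example; the constraint and partition names are illustrative, and the real DDL in this repo may differ:

```sql
-- Instead of one FK from district to warehouse on the parent (partitioned) tables,
-- declare an FK from each district partition to the matching warehouse partition.
ALTER TABLE district_part1
    ADD CONSTRAINT fk_district_warehouse_part1
    FOREIGN KEY (d_w_id) REFERENCES warehouse_part1 (w_id);

ALTER TABLE district_part2
    ADD CONSTRAINT fk_district_warehouse_part2
    FOREIGN KEY (d_w_id) REFERENCES warehouse_part2 (w_id);
```

Because a row's warehouse ID determines its partition in every table, a child row and its referenced row always fall in the same partition pair, so these per-partition constraints enforce the same relationship a single FK on the parent tables would.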
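
And a simplified sketch of the idea in item 3: one LANGUAGE sql helper per partition, whose single UPDATE statement touches only that partition's stock table, so writes stay batched without relying on planner-side partition pruning. The function name, signature, and quantity arithmetic below are made up for illustration and do not match the actual updatestock function in this repo:

```sql
-- Illustrative only: one such function would exist per partition (_part1, _part2, ...),
-- and the client calls the one matching the warehouse's partition.
CREATE FUNCTION updatestock_part1(wid int, item_ids int[], quantities int[])
RETURNS void AS $$
    UPDATE stock_part1 s
       SET s_quantity = s.s_quantity - u.qty
      FROM unnest(item_ids, quantities) AS u(iid, qty)
     WHERE s.s_w_id = wid
       AND s.s_i_id = u.iid;
$$ LANGUAGE sql;
```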

deeps1991 commented 3 years ago

Hey @robertsami, thanks for such a detailed review! I addressed all your comments in this patch; please let me know if you have more.

deeps1991 commented 3 years ago

Thanks for the detailed review @d-uspenskiy! Addressed your comments in the latest patch.

deeps1991 commented 3 years ago

Thanks @d-uspenskiy, addressed your latest code review comments.

deeps1991 commented 3 years ago

Addressed your comment, @d-uspenskiy, thanks for the review!

deeps1991 commented 3 years ago

@hbhanawat I made a change to remove the dependency between the number of clients and the number of partitions. The validation check is now less restrictive: it no longer requires a client to exactly match partition boundaries; it only checks that the client operates on a range of warehouses that lies completely within one partition (which is needed for the FK checks to work). This way you could split into, say, 3 partitions but have 30 clients connect, as long as each client's warehouse range lies completely within one partition. Let me know if this addresses all your concerns.

hbhanawat commented 3 years ago

I have reviewed the last commit that removes the dependency between the number of clients and the number of partitions. It looks good.