[YSQL] 2x-to-2.5x slow down for 100k-row insert and create index in colocated vs regular database

Jira Link: DB-2301 The zip of my self-contained testcase is attached. I used a single node cluster on my MacBook (MacOS Mojave Version 10.14.6; Processor 2.6 GHz Intel Core i7).

Unzip and then start 0.sql at the ysqlsh prompt. For a maximally clean start, do this before running it.

yb-ctl destroy
yb-ctl create
ysqlsh -h localhost -p 5433 -d yugabyte -U yugabyte

Testcase outline

The testcase drops and re-creates the database using first this:

create database issue_4910 owner yugabyte;

and then this

create database issue_4910 with colocated = true owner yugabyte;

Each time after creating the database , it repeats the basic test five times to get a reasonable sample of the timings.

The basic test drops and recreates the test table, thus:

create table t(
  k uuid default gen_random_uuid() primary key,
  dp double precision not null,
  i int);

The surrogate PK column is populated using gen_random_uuid() rather than serial because timing tests (not included here) show this to be about 20x faster—as is expected.

Then this timing kernel is performed:

\timing on
insert into t(dp, i)
with v as (
  select normal_rand(:no_of_rows, 50, 100)::double precision as r)
select r, round(r) from v;
create unique index t_dp on t(dp);
create        index t_i  on t(i);
\timing off

I inherited my use of normal_rand() from the real context where I spotted this problem. It seems sensible to use this rather than the classic generate_series just in case index creation is favored by a dense sequence of monotonically increasing integer values.

Similarly, the two "value" columns with data type double precision and int come from my real context. I experimented by removing the int column. This made the slow-down ratios a bit smaller.

Results

I copied-and-pasted the recorded times (see below) into a spreadsheet to compute the averages and standard deviations. Here are the answers (times in seconds):

regular database:

        insert   dp cr. Index   int cr. Index
avg       2.81           2.71            1.79
stddev    0.17           0.08            0.12

colocated database:

        insert   dp cr. Index   int cr. Index
avg       5.32           6.92            4.78
stddev    1.46           0.88            0.57

timing ratios:

          1.89           2.56            2.67

I copied the timings from the terminal window into timings.txt, included in the zip.(Annoyingly, the timing output doesn't go to a file written with \o timings.txt.) Copied here for the reader's convenience:

            Version            
-------------------------------
 PostgreSQL 11.2-YB-2.1.8.2-b0

    Name    |  Owner   | With Colocation ? 
------------+----------+-------------------
 issue_4910 | yugabyte | 

Time: 3109.727 ms (00:03.110)
Time: 2845.602 ms (00:02.846)
Time: 1812.825 ms (00:01.813)

Time: 2719.342 ms (00:02.719)
Time: 2686.949 ms (00:02.687)
Time: 1678.752 ms (00:01.679)

Time: 2736.180 ms (00:02.736)
Time: 2672.009 ms (00:02.672)
Time: 1952.245 ms (00:01.952)

Time: 2751.094 ms (00:02.751)
Time: 2653.553 ms (00:02.654)
Time: 1708.433 ms (00:01.708)

Time: 2757.635 ms (00:02.758)
Time: 2669.942 ms (00:02.670)
Time: 1668.489 ms (00:01.668)

    Name    |  Owner   |   With Colocation ?   
------------+----------+-----------------------
 issue_4910 | yugabyte | with colocated = true

Time: 4151.433 ms (00:04.151)
Time: 7295.949 ms (00:07.296)
Time: 4297.778 ms (00:04.298)

Time: 7070.451 ms (00:07.070)
Time: 6125.373 ms (00:06.125)
Time: 5353.823 ms (00:05.354)

Time: 4099.537 ms (00:04.100)
Time: 7819.895 ms (00:07.820)
Time: 4286.298 ms (00:04.286)

Time: 6740.085 ms (00:06.740)
Time: 5833.184 ms (00:05.833)
Time: 5188.808 ms (00:05.189)

Time: 4556.224 ms (00:04.556)
Time: 7526.225 ms (00:07.526)
Time: 3548.293 ms (00:03.548)

Notice how the times stay fairly constant in the regular database but vary noticeably with repetition in the colocated case.

Please investigate these degrees of freedom and report the results here:

Vary the number of "value" columns from one through some reasonably large number like ten or twenty.
Single-node on laptop vs realistic 3-node deployment (e.g. on AWS or GCP).
Colocated database with regular table vs regular database with colocated table

Please note here, too, if you see the same variability in times that I show above in the colocated case after the first repeat.

Final note

On just one occasion, over several repetitions during the development of this testcase and then again in repeating the whole experiment several times, this bizarre error occurred once, during the creation of the unique index t_dp.

ysqlsh:do_timing_test.sql:14: ERROR:  Already present: Multiple homogeneous errors: [Already present (yb/consensus/retryable_requests.cc:239): Duplicate request]

@bllewell I was able to replicate the key issue locally. Sharing the (self-contained) SQL I used and results below.

Run tests

Set up the database

Enable colocation and create the pgcrypto extension for gen_random_uuid().

CREATE DATABASE test_colocated WITH colocated = true;
\c test_colocated;
CREATE EXTENSION pgcrypto;

Set up and load a hash-partitioned table

Using a single-tablet hash table to avoid variation due to default/starting number of tablets.

create table t_hash(
  k uuid default gen_random_uuid(),
  u uuid,
  i int,
  PRIMARY KEY (k HASH))
WITH (colocated = false)
SPLIT INTO 1 TABLETS;

\timing on

insert into t_hash(u, i) select gen_random_uuid(), generate_series from generate_series(1, 100000);

create unique index t_hash_u on t_hash(u HASH) SPLIT INTO 1 TABLETS;
create index t_hash_i  on t_hash(i HASH) SPLIT INTO 1 TABLETS;

Set up and load a range-partitioned table

Single-tablet is default starting point for range so nothing to configure (except exclude from colocation).

create table t_range(
  k uuid default gen_random_uuid(),
  u uuid,
  i int,
  PRIMARY KEY (k ASC))
WITH (colocated = false);

\timing on

insert into t_range(u, i) select gen_random_uuid(), generate_series from generate_series(1, 100000);

create unique index t_range_u on t_range(u ASC);
create index t_range_i  on t_range(i ASC);

Set up and load a colocated table

Single-tablet (concretely a section within the colocated tablet) and range-ordered is default for colocated.

create table t_colo(
  k uuid default gen_random_uuid(),
  u uuid,
  i int,
  PRIMARY KEY (k));

\timing on

insert into t_colo(u, i) select gen_random_uuid(), generate_series from generate_series(1, 100000);

create unique index t_colo_u on t_colo(u);
create index t_colo_i  on t_colo(i);

Results

Summary is below, all values are in milliseconds.

test type \ rel type	HASH	RANGE	COLOCATED
load table	`3260.215`	`2935.018`	`3037.336`
create+load uuid idx	`3428.487`	`3479.032`	`6418.028`
create+load int idx	`2024.351`	`2054.186`	`3812.083`

So overall, table load is more or less the same in all cases (a bit slower in hash case in this case but within regular variance during testing), while index creation (load) is very similar between hash and range, but almost ~2x slower for colocated.

yugabyte / yugabyte-db