eugene-vodyanko commented 11 months ago

Hi there.

First of all, thank you very much for YDB. I take off my hat.

Essence of the problem

In my test, I have the table with messages (20M rows) in YDB:

CREATE TABLE messages_lz4 (
  `id` UInt64,
  `creation_date` DateTime,
  `state_id` UInt8,
  `state_time` DateTime,

  PRIMARY KEY (sender_id, creation_date, `_source`, id),
  FAMILY default (
        COMPRESSION = "lz4"
    )
)

I want to request the history of recent messages for a specific sender (or senders):

select id, creation_date, sender_id, state_id, length(content) content_len from messages_lz4
where sender_id = 10
    and creation_date between DateTime("2020-11-27T00:00:00Z") and DateTime("2022-11-27T00:00:00Z")
order by sender_id desc, creation_date desc
limit 10;

The request takes an average of 850 milliseconds to complete:

The fragment from Stat tab in UI:

Tables:[] 1 item
0:{} 4 items
AffectedPartitions:1
ReadBytes:224985602
ReadRows:1665572
TablePath:/local/messages_lz4
DurationUs:850939
WorkerCpuTimeUs:3785

The query plan contains an explicit sorting step:

If we change the sorting order from DESC to ASC:

select id, creation_date, sender_id, state_id, length(content) content_len from messages_lz4
where sender_id = 10
and creation_date between DateTime("2020-11-27T00:00:00Z") and DateTime("2022-11-27T00:00:00Z")
order by sender_id, creation_date
limit 10;

We will get a much more interesting result (~25ms):

CpuTimeUs:846
Tables:[] 1 item
0:{} 3 items
ReadBytes:525082
ReadRows:4215
TablePath:/local/messages_lz4
DurationUs:22038
WorkerCpuTimeUs:3514

And the plan without a sorting step:

However, if we want to get data on the list of senders sorted by date, we will again get lower performance than we could have expected:

Expected behavior

Efficient sampling without physical sorting and without reading unnecessary data.

If you compare with other DBMS, for example, Oracle has an INDEX RANGE SCAN DESCENDING scan mode for Index Organized Tables.
Also, as far as I understand, there is a similar mode in Cassandra -- the index is read simply in reverse order.

In addition, in this DBs you can explicitly set the order in the index (PK), of the form (key_1 ASC, key_2 DESC). If I understood the YDB documentation correctly, this is not supported for either PK or secondary indexes.

Maybe, you have recommendations or instructions?

Other DBMS behavior: ClickHouse

I understand that this is an antipattern for ClickHouse... but.

ORDER BY ASC (~10ms)

select  id, creation_date, sender_id, state_id, length(content) content_len from messages
where sender_id = 10
  and creation_date between '2020-11-27 00:00:00' and '2022-11-27T00:00:00'
order by sender_id, creation_date 
limit 10

ORDER BY DESC (~10ms)

select  id, creation_date, sender_id, state_id, length(content) content_len from messages
where sender_id = 10
  and creation_date between '2020-11-27 00:00:00' and '2022-11-27T00:00:00'
order by sender_id desc, creation_date desc
limit 10

And a more correct comparison with the keyword FINAL (33ms vs. 850ms):

select  id, creation_date, sender_id, state_id, length(content) content_len from messages FINAL
where sender_id in (2, 10)
  and creation_date between '2020-11-27 00:00:00' and '2022-11-27T00:00:00'
order by creation_date desc
limit 10

Test Environment

YDB v23.3 1-node Docker installation on Windows (AMD Ryzen 7 7700X 8-Core Processor 4.50 GHz, m2 SSD)
ClickHouse v23.8 Baremetal Debian (Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz, HDD).

P.S. Later, if you are interested, I can give similar calculations for Oracle and (maybe) Cassandra/ScyllaDB. P.P.S As workaround, I tried using a numeric column approach instead of a date that stores an "inverted" representation in the style of toUnixTimestamp('2050-01-01 00:00:00') - toUnixTimestamp(creation_date). However, in this case, expressions become more complicated and filtering becomes less optimal DateTime::FromSeconds(CAST(2524582800 - delta as Uint32)) between Date("2020-11-27") and Date("2022-11-27"). It just looks terrible. I am sure that my usage scenario is very common and there is a way to do it effectively.

eugene-vodyanko commented 11 months ago

UPD.

Other DBMS behavior: Oracle Database

Table spec:

create table MESSAGES
(
  id              NUMBER not null,
  creation_date   DATE not null,
  state           NUMBER(1),
  state_time      TIMESTAMP(6),
  ...
, CONSTRAINT MESSAGES_PK PRIMARY KEY (sender, creation_date, id)
)
ORGANIZATION INDEX;

Query with ORDER BY DESC

select * from
(
select /*+ FIRST_ROWS(1)*/ id, creation_date, sender, state, length(content) from messages
where sender = 10
and creation_date between to_Date('2020-11-27', 'yyyy-mm-dd') and to_Date('2022-11-27', 'yyyy-mm-dd')
order by creation_date desc
)
where rownum <= 10

Plan have no sorting

------------------------------------------------------
| Id  | Operation                       | Name        
------------------------------------------------------
|   0 | SELECT STATEMENT                |             
| * 1 |   COUNT STOPKEY                 |             
|   2 |    VIEW                         |             
| * 3 |     INDEX RANGE SCAN DESCENDING | MESSAGES_PK

Result ~ 80 ms. Env. Solaris/SPARC, HDD.

Disk space usage:

Oracle w/out compression: ~ 6,9 Gb.
YDB with LZ4 compression: ~ 2,1 Gb.

eugene-vodyanko commented 11 months ago

UPD. 2.

Other DBMS behavior: PostgreSQL

In general, Postgres metrics are similar to Oracles when using Index Only Scan queries, however:

this is achieved either by excessive data storage (both in the table and in the idex) and using covering indexes with INCLUDE or
by "repacking" the records in the table in the order of a more compact index (without INCLUDE) using the CLUSTER command.

Calculations with plans and so on staff omitted.

eugene-vodyanko commented 9 months ago

UPD. 3.

Other DBMS behavior: Apache Cassandra

As expected, Cassandra shows itself quite favorably on the same load profile (~30-40 ms). Moreover, both for direct and reverse sorting order.

Table spec.:

CREATE TABLE IF NOT EXISTS isdb.messages (
     id int,
     creation_date timestamp,
...
     "_source" text,
     "_ver" int,

     PRIMARY KEY ((sender_id), creation_date , id, "_source")
)
    WITH CLUSTERING ORDER BY (creation_date DESC, id DESC, "_source" ASC)
;

ORDER BY DESC

[2024-02-17 00:03:04] Connected
> select id, creation_date, sender_id, state_id from isdb.messages
  where sender_id = 10
    and creation_date >= '54874-05-09 04:06:40.000' and creation_date <= '54874-05-11 16:23:20.000'
  order by creation_date desc, id desc
  limit 10
[2024-02-17 00:03:04] 10 rows retrieved starting from 1 in 26 ms (execution: 5 ms, fetching: 21 ms)

ORDER BY ASC

> select id, creation_date, sender_id, state_id from isdb.messages
  where sender_id = 10
    and creation_date >= '54874-05-09 04:06:40.000' and creation_date <= '54874-05-11 16:23:20.000'
  order by creation_date asc, id asc
  limit 10
[2024-02-17 00:06:29] 10 rows retrieved starting from 1 in 27 ms (execution: 4 ms, fetching: 23 ms)

Сompression

Cassandra also provides efficient compression up to 4 times in relation to the "raw" data.

P.S. The tests were performed on the same data set, but when uploading data to Cassandra, the timestamp columns were loaded incorrectly due to timestamp conversion problems. This should not be confusing, it does not affect performance.

ydb-platform / ydb

YDB poor performance on (Primary Key) ORDER BY DESC query #517

Essence of the problem

Expected behavior

Other DBMS behavior: ClickHouse

Test Environment

UPD.

Other DBMS behavior: Oracle Database

UPD. 2.

Other DBMS behavior: PostgreSQL

UPD. 3.

Other DBMS behavior: Apache Cassandra

Table spec.:

ORDER BY DESC

ORDER BY ASC

Сompression