ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0
4.01k stars 568 forks source link

YDB poor performance on (Primary Key) ORDER BY DESC query #517

Open eugene-vodyanko opened 11 months ago

eugene-vodyanko commented 11 months ago

Hi there.

First of all, thank you very much for YDB. I take off my hat.

Essence of the problem

In my test, I have the table with messages (20M rows) in YDB:

CREATE TABLE messages_lz4 (
  `id` UInt64,
  `creation_date` DateTime,
  `state_id` UInt8,
  `state_time` DateTime,

  PRIMARY KEY (sender_id, creation_date, `_source`, id),
  FAMILY default (
        COMPRESSION = "lz4"
    )
) 

I want to request the history of recent messages for a specific sender (or senders):

select id, creation_date, sender_id, state_id, length(content) content_len from messages_lz4
where sender_id = 10
    and creation_date between DateTime("2020-11-27T00:00:00Z") and DateTime("2022-11-27T00:00:00Z")
order by sender_id desc, creation_date desc
limit 10;

The request takes an average of 850 milliseconds to complete: img_2

The fragment from Stat tab in UI:

Tables:[] 1 item
0:{} 4 items
AffectedPartitions:1
ReadBytes:224985602
ReadRows:1665572
TablePath:/local/messages_lz4
DurationUs:850939
WorkerCpuTimeUs:3785

The query plan contains an explicit sorting step: img_1

If we change the sorting order from DESC to ASC:

select id, creation_date, sender_id, state_id, length(content) content_len from messages_lz4
where sender_id = 10
and creation_date between DateTime("2020-11-27T00:00:00Z") and DateTime("2022-11-27T00:00:00Z")
order by sender_id, creation_date
limit 10;

We will get a much more interesting result (~25ms):

CpuTimeUs:846
Tables:[] 1 item
0:{} 3 items
ReadBytes:525082
ReadRows:4215
TablePath:/local/messages_lz4
DurationUs:22038
WorkerCpuTimeUs:3514

img_3

And the plan without a sorting step: img_4

However, if we want to get data on the list of senders sorted by date, we will again get lower performance than we could have expected: img_5

Expected behavior

Efficient sampling without physical sorting and without reading unnecessary data.

In addition, in this DBs you can explicitly set the order in the index (PK), of the form (key_1 ASC, key_2 DESC). If I understood the YDB documentation correctly, this is not supported for either PK or secondary indexes.

Maybe, you have recommendations or instructions?

Other DBMS behavior: ClickHouse

I understand that this is an antipattern for ClickHouse... but.

ORDER BY ASC (~10ms)

select  id, creation_date, sender_id, state_id, length(content) content_len from messages
where sender_id = 10
  and creation_date between '2020-11-27 00:00:00' and '2022-11-27T00:00:00'
order by sender_id, creation_date 
limit 10

img_8

ORDER BY DESC (~10ms)

select  id, creation_date, sender_id, state_id, length(content) content_len from messages
where sender_id = 10
  and creation_date between '2020-11-27 00:00:00' and '2022-11-27T00:00:00'
order by sender_id desc, creation_date desc
limit 10

img_9

And a more correct comparison with the keyword FINAL (33ms vs. 850ms):

select  id, creation_date, sender_id, state_id, length(content) content_len from messages FINAL
where sender_id in (2, 10)
  and creation_date between '2020-11-27 00:00:00' and '2022-11-27T00:00:00'
order by creation_date desc
limit 10

img_10

Test Environment

  1. YDB v23.3 1-node Docker installation on Windows (AMD Ryzen 7 7700X 8-Core Processor 4.50 GHz, m2 SSD)
  2. ClickHouse v23.8 Baremetal Debian (Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz, HDD).

P.S. Later, if you are interested, I can give similar calculations for Oracle and (maybe) Cassandra/ScyllaDB. P.P.S As workaround, I tried using a numeric column approach instead of a date that stores an "inverted" representation in the style of toUnixTimestamp('2050-01-01 00:00:00') - toUnixTimestamp(creation_date). However, in this case, expressions become more complicated and filtering becomes less optimal DateTime::FromSeconds(CAST(2524582800 - delta as Uint32)) between Date("2020-11-27") and Date("2022-11-27"). It just looks terrible. I am sure that my usage scenario is very common and there is a way to do it effectively.

eugene-vodyanko commented 11 months ago

UPD.

Other DBMS behavior: Oracle Database

Table spec:

create table MESSAGES
(
  id              NUMBER not null,
  creation_date   DATE not null,
  state           NUMBER(1),
  state_time      TIMESTAMP(6),
  ...
, CONSTRAINT MESSAGES_PK PRIMARY KEY (sender, creation_date, id)
)
ORGANIZATION INDEX;

Query with ORDER BY DESC

select * from
(
select /*+ FIRST_ROWS(1)*/ id, creation_date, sender, state, length(content) from messages
where sender = 10
and creation_date between to_Date('2020-11-27', 'yyyy-mm-dd') and to_Date('2022-11-27', 'yyyy-mm-dd')
order by creation_date desc
)
where rownum <= 10

Plan have no sorting

------------------------------------------------------
| Id  | Operation                       | Name        
------------------------------------------------------
|   0 | SELECT STATEMENT                |             
| * 1 |   COUNT STOPKEY                 |             
|   2 |    VIEW                         |             
| * 3 |     INDEX RANGE SCAN DESCENDING | MESSAGES_PK 

img_15

Result ~ 80 ms. Env. Solaris/SPARC, HDD.

Disk space usage:

eugene-vodyanko commented 11 months ago

UPD. 2.

Other DBMS behavior: PostgreSQL

In general, Postgres metrics are similar to Oracles when using Index Only Scan queries, however:

Calculations with plans and so on staff omitted.

eugene-vodyanko commented 9 months ago

UPD. 3.

Other DBMS behavior: Apache Cassandra

As expected, Cassandra shows itself quite favorably on the same load profile (~30-40 ms). Moreover, both for direct and reverse sorting order.

Table spec.:

CREATE TABLE IF NOT EXISTS isdb.messages (
     id int,
     creation_date timestamp,
...
     "_source" text,
     "_ver" int,

     PRIMARY KEY ((sender_id), creation_date , id, "_source")
)
    WITH CLUSTERING ORDER BY (creation_date DESC, id DESC, "_source" ASC)
;

ORDER BY DESC

[2024-02-17 00:03:04] Connected
> select id, creation_date, sender_id, state_id from isdb.messages
  where sender_id = 10
    and creation_date >= '54874-05-09 04:06:40.000' and creation_date <= '54874-05-11 16:23:20.000'
  order by creation_date desc, id desc
  limit 10
[2024-02-17 00:03:04] 10 rows retrieved starting from 1 in 26 ms (execution: 5 ms, fetching: 21 ms)

img_6

ORDER BY ASC

> select id, creation_date, sender_id, state_id from isdb.messages
  where sender_id = 10
    and creation_date >= '54874-05-09 04:06:40.000' and creation_date <= '54874-05-11 16:23:20.000'
  order by creation_date asc, id asc
  limit 10
[2024-02-17 00:06:29] 10 rows retrieved starting from 1 in 27 ms (execution: 4 ms, fetching: 23 ms)

img_7

Сompression

Cassandra also provides efficient compression up to 4 times in relation to the "raw" data.

P.S. The tests were performed on the same data set, but when uploading data to Cassandra, the timestamp columns were loaded incorrectly due to timestamp conversion problems. This should not be confusing, it does not affect performance.