yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.99k stars 1.07k forks source link

[YSQL] Extremely high planning times for simple queries #16313

Open isaac-mcfadyen opened 1 year ago

isaac-mcfadyen commented 1 year ago

Jira Link: DB-5741

Description

On a geo-distributed cluster (1 node each in Canada, US, and Europe), planning time for simple queries is extremely high (1600ms). This occurs even if a Tablespace is used to restrict a table to two regions.

Reproduction:

This issue only seems to occur on followers, not the leader, but is consistently reproducable.

ddorian commented 1 year ago

This occurs even if a Tablespace is used to restrict a table to a single region.

But you said you used with 2 regions in your text?

Can you paste the output of all your explain analyze statements? Both fast & slow?

What is the hardware of the yb-master & yb-tserver nodes?

Can you paste your cluster-config and all the queries you used (for everything including tablespaces)?

What version of yugabytedb are you using?

This issue only seems to occur on followers, not the leader, but is consistently reproducable.

Where is the client located? How far is it from the follower(s) and leader(s)?

Do note that "leader" is per-tablet, not "cluster wide" https://docs.yugabyte.com/preview/architecture/docdb-replication/replication/

isaac-mcfadyen commented 1 year ago

But you said you used with 2 regions in your text?

Yeah sorry, I meant with 2 regions there (edited and forgot to replace that part).

Can you paste the output of all your explain analyze statements? Both fast & slow?

Yup! Here's EXPLAIN ANALYZE for the leader which is fast, then the follower which is slow and then fast (presumably because the query plan is then cached?).

Leader:

yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=0.946..1.658 rows=1 loops=1)
 Planning Time: 7.846 ms
 Execution Time: 5.054 ms
 Peak Memory Usage: 40 kB
(4 rows)

Follower (slow):

yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=164.361..329.138 rows=1 loops=1)
 Planning Time: 1972.871 ms
 Execution Time: 411.164 ms
 Peak Memory Usage: 40 kB
(4 rows)

Follower (cached query plan, fast):

yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=82.301..246.353 rows=1 loops=1)
 Planning Time: 0.098 ms
 Execution Time: 246.500 ms
 Peak Memory Usage: 24 kB
(4 rows)

What is the hardware of the yb-master & yb-tserver nodes?

It's 3 nodes running on Azure, Standard_B1ms tier. I know that by default they have bad IOPS performance so I enabled Premium SSDs (didn't seem to make a difference). RPCs are over private IPs via VPC peering the three regions.

What version of yugabytedb are you using?

2.17.1.0 on AMD64

Where is the client located? How far is it from the follower(s) and leader(s)?

The client is ysqlsh on the nodes themselves (in the above outputs, ysqlsh on the leader and ysqlsh on the follower).

Can you paste your cluster-config and all the queries you used (for everything including tablespaces)?

Sure!

Command for master:

/opt/yugabyte-2.17.1.0/bin/yb-master \
          --master_addresses IP1:7100,IP2:7100,IP3:7100,IP4:7100 \
          --rpc_bind_addresses IP \
          --webserver_interface 0.0.0.0 \
          --fs_data_dirs /data/yugabyte \
          --use_private_ip region \
          --placement_cloud azure \
          --placement_region REGION \
          --placement_zone main \
          --leader_failure_max_missed_heartbeat_periods 10

Command for tserver:

/opt/yugabyte-2.17.1.0/bin/yb-tserver \
          --tserver_master_addrs IP1:7100,IP2:7100,IP3:7100 \
          --rpc_bind_addresses IP:9100 \
          --pgsql_proxy_bind_address localhost:5433 \
          --cql_proxy_bind_address localhost:9042 \
          --fs_data_dirs /data/yugabyte \
          --placement_cloud azure \
          --placement_region REGION \
          --placement_zone main \
          --leader_failure_max_missed_heartbeat_periods 10 \
          --ysql_log_statement all

3-region table (I dropped it beforehand, now recreating):

yugabyte=# CREATE TABLE test(id SERIAL PRIMARY KEY, test INTEGER);
CREATE TABLE
yugabyte=# INSERT INTO test (test) VALUES (25);
INSERT 0 1
yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=115.678..343.748 rows=1 loops=1)
 Planning Time: 903.419 ms
 Execution Time: 343.852 ms
 Peak Memory Usage: 8 kB
(4 rows)

yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=114.103..195.956 rows=1 loops=1)
 Planning Time: 0.049 ms
 Execution Time: 196.014 ms
 Peak Memory Usage: 0 kB
(4 rows)

2-region table (dropped beforehand, now recreated) in Canada/US (excluding Europe which the 3-region table above was also in):

yugabyte=# CREATE TABLESPACE na_only WITH (replica_placement='{"num_replicas": 2, "placement_blocks": [{"cloud":"azure","region":"canadacentral","zone":"main","min_num_replicas":1}, {"cloud":"azure","region":"westus3","zone":"main","min_num_replicas":1}]}');
CREATE TABLESPACE
yugabyte=# CREATE TABLE test (id SERIAL PRIMARY KEY, test INTEGER);
CREATE TABLE
yugabyte=# INSERT INTO test (test) VALUES (25);
INSERT 0 1
yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=82.093..196.983 rows=1 loops=1)
 Planning Time: 653.745 ms
 Execution Time: 197.373 ms
 Peak Memory Usage: 16 kB
(4 rows)

yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=82.587..197.133 rows=1 loops=1)
 Planning Time: 0.048 ms
 Execution Time: 197.193 ms
 Peak Memory Usage: 0 kB
(4 rows)

Thanks for the help: let me know if you need any more details!

ddorian commented 1 year ago

@isaac-mcfadyen

You didn't mention the tablespace on your create table query above? See https://docs.yugabyte.com/preview/api/ysql/the-sql-language/statements/ddl_create_table/#tablespace

By default, tables are placed in the pg_default tablespace, which spreads the tablets of the table evenly across the cluster.


Please read https://docs.yugabyte.com/preview/deploy/checklist/#replication:

The RF should be an odd number to ensure majority consensus can be established during failures.


Please read https://docs.yugabyte.com/preview/explore/ysql-language-features/going-beyond-sql/tablespaces/ and also set leader preference.

Or set_preferred_zones if no tablespace is used https://docs.yugabyte.com/preview/admin/yb-admin/#set-preferred-zones

isaac-mcfadyen commented 1 year ago

You didn't mention the tablespace on your create table query above?

Oh whoops! You're indeed correct, sorry about that 😅

Regardless though, if the default pg_default tablespace spreads tablets evenly, is that high of a planning time expected? I understand that network latency would play a big factor, but there's only 100-150ms delay between the furthest nodes (Europe and Canada) versus a 600-1900ms planning time reported by EXPLAIN ANALYZE


Please read docs.yugabyte.com/preview/deploy/checklist/#replication:

As for RF, the default is 3 and I have 3 nodes here, so I wouldn't expect to need to change it?

Or set_preferred_zones if no tablespace is used docs.yugabyte.com/preview/admin/yb-admin/#set-preferred-zones

I guess this links back to above: is the default configuration of spreading tablets evenly expected to cause this latency? I can understand wanting to set a preferred zone for a table but then the non-preferred zones would be higher latency, correct?

ddorian commented 1 year ago

As for RF, the default is 3 and I have 3 nodes here, so I wouldn't expect to need to change it?

The reason is so I can know exactly how many replicas there are on the tablespace and which is the preferred region if there is any. The third replica wasn't specified where it should be set in the tablespace. It's clearer to have it concrete.


Yes, I can reproduce with a multi-region cluster internally (EU(preferred zone),US,CA) using 2.17.3.0-b0.

ysqlsh running on EU server:

yugabyte=# \timing
Timing is on.
yugabyte=# CREATE TABLE test(id SERIAL PRIMARY KEY, test INTEGER);
CREATE TABLE
Time: 5037.370 ms (00:05.037)
yugabyte=# INSERT INTO test (test) VALUES (25);
INSERT 0 1
Time: 248.381 ms
yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                            QUERY PLAN                          

--------------------------------------------------------------------------------
-------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=0.315..0.7
72 rows=1 loops=1)
 Planning Time: 4.500 ms
 Execution Time: 0.817 ms
 Peak Memory Usage: 8 kB
(4 rows)

Time: 5.708 ms
yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                            QUERY PLAN                          

--------------------------------------------------------------------------------
-------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=0.425..0.9
09 rows=1 loops=1)
 Planning Time: 0.036 ms
 Execution Time: 0.948 ms
 Peak Memory Usage: 0 kB
(4 rows)

Time: 1.299 ms
yugabyte=# 

ysqlsh running on CA server:

yugabyte=# \timing
Timing is on.
yugabyte=# CREATE TABLE test(id SERIAL PRIMARY KEY, test INTEGER);
CREATE TABLE
Time: 13950.182 ms (00:13.950)
yugabyte=# INSERT INTO test (test) VALUES (25);
INSERT 0 1
Time: 1465.436 ms (00:01.465)
yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN                        

--------------------------------------------------------------------------------
----------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=91.816..27
6.834 rows=1 loops=1)
 Planning Time: 732.498 ms
 Execution Time: 276.899 ms
 Peak Memory Usage: 8 kB
(4 rows)

Time: 1009.878 ms (00:01.010)
yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=91.692..275.206 rows=1 loops=1)
 Planning Time: 0.031 ms
 Execution Time: 275.245 ms
 Peak Memory Usage: 0 kB
(4 rows)

Time: 275.608 ms
yugabyte=# 

CA, exiting the ysqlsh cli and entering again:

ysqlsh (11.2-YB-2.17.3.0-b0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN                        

--------------------------------------------------------------------------------
----------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=92.025..27
5.794 rows=1 loops=1)
 Planning Time: 1095.668 ms
 Execution Time: 276.165 ms
 Peak Memory Usage: 16 kB
(4 rows)

yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN                        

--------------------------------------------------------------------------------
----------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=90.737..27
5.290 rows=1 loops=1)
 Planning Time: 0.042 ms
 Execution Time: 275.345 ms
 Peak Memory Usage: 0 kB
(4 rows)

Another try on CA again:

yugabyte=# explain analyze select * from test where id=25;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..4.11 rows=1 width=8) (actual time=92.191..92.192 rows=0 loops=1)
   Index Cond: (id = 25)
 Planning Time: 1277.416 ms
 Execution Time: 92.309 ms
 Peak Memory Usage: 16 kB
(5 rows)

Time: 3648.353 ms (00:03.648)
ddorian commented 1 year ago

This is being worked on https://github.com/yugabyte/yugabyte-db/issues/10821, which is committed but it's not yet in any release. See change after enabling ysql_enable_read_request_caching gflag

First session:


yugabyte=# \timing
Timing is on.
yugabyte=# explain analyze select * from test where id=25;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Index Scan using test_pkey on test  (cost=0.00..4.11 rows=1 width=8) (actual time=180.609..180.609 rows=0 loops=1)
   Index Cond: (id = 25)
 Planning Time: 727.751 ms
 Execution Time: 180.715 ms
 Peak Memory Usage: 16 kB
(5 rows)

Time: 1544.445 ms (00:01.544)
yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=91.217..271.813 rows=1 loops=1)
 Planning Time: 0.042 ms
 Execution Time: 271.860 ms
 Peak Memory Usage: 0 kB
(4 rows)

Time: 272.292 ms

Second session:


yugabyte=# \timing
Timing is on.
yugabyte=# EXPLAIN ANALYZE SELECT * FROM test;
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..100.00 rows=1000 width=8) (actual time=90.587..275.054 rows=1 loops=1)
 Planning Time: 181.859 ms
 Execution Time: 275.415 ms
 Peak Memory Usage: 16 kB
(4 rows)

Time: 547.684 ms