PostgreSQL 如何估算 LIKE 的 return rows 數量

_使用 LIKE 前，先看看 like_support.c_

7e8a47dfc4d554b6f15d329bcbb69029_XL

序

接續在前篇文章中提過底層 scan node 的 row estimation 結果以及上層其他 plan node 操作會綜合影響資料庫最後使用哪一個 plan，所以 scan node 的 row estimation 相當重要。

PostgreSQL 如何估算 HashAggregate 的 Return Rows ，以及低估的後果其中一個很重要的步驟就是預估 Return Rows 的數量，它被用來：

給底層的 Scan Node 比較各種 Access Method (Seq Scan, Index Scan, Bitmap Index Scan + Bitmap Heap Scan, …) 的 Cost

往往更重要的是 Return Rows 的估算對於上層其他 Plan Node 的 Cost 有巨大影響，例如 Nested Loop 或 Hash Join，因為預估它們就是會從底下的 Scan Node 拿出這麼多的 rows 數量來處理

這次我們遇到的案例則是 LIKE 的 row estimation 與實際差很多，想要一探究竟

案例

我們有一張表經常用 LIKE 進行 prefix pattern matching 查詢，表的結構如下：

CREATE TABLE IF NOT EXISTS counters (
  id TEXT PRIMARY KEY COLLATE "C",
  value BIGINT
);

注意我們在 id 上面設定 COLLATE "C" 是為了要讓 PostgreSQL 能直接使用 PRIMARY KEY 的 B-tree 索引就能縮小 prefix pattern matching 的範圍，否則得額外建立一個 text_pattern_ops 的 B-tree 索引，參考: 11.10. Operator Classes and Operator Families

實際的 Query Plan：

postgres=# explain analyze select * from counters where id like 'ms:149895:%';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using counters_pkey on counters  (cost=0.56..8.58 rows=488 width=36) (actual time=0.014..0.022 rows=10 loops=1)
   Index Cond: ((id >= 'ms:149895:'::text) AND (id < 'ms:149895;'::text))
   Filter: (id ~~ 'ms:149895:%'::text)
 Planning Time: 0.148 ms
 Execution Time: 0.034 ms
(5 rows)

可以看到原本預估回傳的是 488 個 rows，但是實際上只有 10 個，雖然這次我們沒有把這個結果再拿去做其他處理，資料庫他依然如預期中將 prefix 改成了 range query 去用了 B-tree index scan，但又是為什麼會高估了快 48 倍呢？

讀原始碼

我們可以從 pg_operator 表中知道各個 operator 對應的 selectivity function，例如：

postgres=# select oid, oprname, oprcode, oprrest from pg_operator where oprname = '~~';
 oid  | oprname |  oprcode   | oprrest
------+---------+------------+---------
 1207 | ~~      | namelike   | likesel
 1209 | ~~      | textlike   | likesel
 1211 | ~~      | bpcharlike | likesel
 2016 | ~~      | bytealike  | likesel
(4 rows)

根據 oprrest 我們可以知道 LIKE (~~) 所使用的 selectivity function 就是 likesel，我們可以進一步在 pg_proc 裡面找到它的訊息：

postgres=# select oid, proname, prosrc from pg_proc where oid = (select distinct oprrest from pg_operator where oprname = '~~');
oid  | proname | prosrc
------+---------+---------
 1819 | likesel | likesel
(1 row)

根據 prosrc 剛好它的 c function 名稱就是 likesel

其實 pg_operator 與 pg_proc 這兩張表的初始資料在 source code 中的 src/include/catalog/ 也找得到，詳細：Chapter 69. System Catalog Declarations and Initial Contents

上次提到 PostgreSQL 大部分 selectivity function 都可以在 selfuncs.c 裡面找到，不過跟 like 相關的是其中一個例外，它們在 2019.2.14 https://github.com/rueian/postgres/commit/49fa99e54ec0ded180b52a4a14e543702d53e8c9 被重構到了 like_support.c 裡面了

這次我們要找的就是其中的 likesel() 中使用到的 patternsel() 中的 patternsel_common()

patternsel_common() 主要基於欄位的 histogram 統計資料，具體流程：

先檢查我們的 pattern 是不是 exact match ，若是則轉用 = 的 selectivity function https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/like_support.c#L620-L627
若我們超過(含) 100 個 histogram buckets，我們則直接將 pattern 拿去匹配去掉頭尾之後的每個 histogram boundary 當作 selectivity https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/like_support.c#L657-L664 https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/selfuncs.c#L778-L829
若 histogram buckets 不足 100，那我們將 pattern 拆成 prefix 與非 prefix 兩部份分開計算 selectivity 再相乘合併
- prefix selectivity 計算方式如同 range selectivity，估出 prefix 與 greater prefix 分別(例如 ms:149895: 與 ms:149895;) 在兩個 histogram bucket 的大概位置然後再用兩個位置夾住的範圍占比當作 selectivity https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/like_support.c#L661-L669 https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/like_support.c#L1189-L1287
- 非 prefix 部分的 selectivity 則是用一些 magic number 猜測，例如多一個 % 的話 selectivity 就 5，多個 char selectivity 就 0.2 https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/like_support.c#L594-L605 非 prefix 的部分是在 pattern_fixed_prefix() 時就取出儲存在 rest_selec，實際計算是在 like_selectivity() https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/like_support.c#L1290-L1341
若發現前面算出來的 selectivity 太大或太小，都捨棄，改成 0.0001 或 0.9999 https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/like_support.c#L689-L693
最後再加上 mcv 統計與 null fraction 修正 https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/like_support.c#L695-L711

流程中可以看到 LIKE 的 selectivity 的計算主要基於欄位的 histogram 統計資料，再加上一些 mcv 與一些 magic number。

關於 histogram 與 mcv 可以參考 70.1. Row Estimation Examples 它們也是主要用來做 >, <, = 的統計資訊。

那我們的案例中預估 488 rows 是怎麼來的呢？

首先關於 histogram buckets 數量，這個是可以透過 ALTER TABLE SET STATISTICS 為個別欄位設定，而我們是使用預設的 default_statistics_target=100 14.2. Statistics Used by the Planner

而我們的 histogram 確實也有 100 個 buckets (101 個 boundary)

postgres=# SELECT array_length(histogram_bounds, 1) FROM pg_stats WHERE tablename='counters' AND attname='id';
 array_length
--------------
          101
(1 row)

也就是我們的案例是直接用 pattern 去匹配去掉頭尾的 101-2 = 99 個 boundary 當作 selectivity

而實際上我們的 boundary 大約是長這樣

postgres=# SELECT unnest(histogram_bounds::text::text[]) FROM pg_stats WHERE tablename='counters' AND attname='id';
                            unnest
--------------------------------------------------------------
 ms:103734:p
 ms:1074631:cl
 ms:1103301:c
 ms:1137423:pn
 ms:1168158:pr
 ms:1192862:p
 ms:1221019:cl
 ms:1243628:c
 ...
(101 rows)

我們的案例拿 ms:149895:% 去用這個 histogram_bounds 算 selectivity 出來會是 0，然後經過上述流程第四步後會被改成 0.0001

驗證

為了驗證，我們看一下整張表的預估是多少 rows

postgres=# EXPLAIN SELECT * FROM counters;
                            QUERY PLAN
-------------------------------------------------------------------
 Seq Scan on counters  (cost=0.00..89169.11 rows=4880311 width=36)
(1 row)

是 4880311，LIKE 的估算確實是它的萬分之一

若我們將 0.0001 改為 0.00001，然後重新編譯後再跑一次 EXPLAIN

                 /* In any case, don't believe extremely small or large estimates. */
        if (selec < 0.00001)
            selec = 0.00001;
        else if (selec > 0.9999)
            selec = 0.9999;

postgres=# explain analyze select * from counters where id like 'ms:149895:%';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using counters_pkey on counters  (cost=0.56..8.58 rows=49 width=36) (actual time=0.015..0.023 rows=10 loops=1)
   Index Cond: ((id >= 'ms:149895:'::text) AND (id < 'ms:149895;'::text))
   Filter: (id ~~ 'ms:149895:%'::text)
 Planning Time: 0.146 ms
 Execution Time: 0.036 ms
(5 rows)

可以看到確實變成 49 了

解決方案

雖然在我們這次的案例中 row estimations 對於 scan table 的方式沒有影響，畢竟有適合的 index 可以用，就算是估萬分之一也不至於會到使用 full table scan，除非 random_page_cost 真的設到超級高

但若我們有再把這個結果集再拿去查詢其他 table 的話，這個萬分之一的估計可能就會導致掃描其他 table 的方式改變了。

在一開始的 EXPLAIN 中可以看到 PostgreSQL 幫我們 Query 改寫成 range query + like filter

postgres=# explain analyze select * from counters where id like 'ms:149895:%';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using counters_pkey on counters  (cost=0.56..8.58 rows=488 width=36) (actual time=0.014..0.022 rows=10 loops=1)
   Index Cond: ((id >= 'ms:149895:'::text) AND (id < 'ms:149895;'::text))
   Filter: (id ~~ 'ms:149895:%'::text)
 Planning Time: 0.148 ms
 Execution Time: 0.034 ms
(5 rows)

不過實際上目前 PostgreSQL 是在估算完 return rows 之後才進行改寫，若我們真要讓他估算更準確的話，可以預先就寫成 range query + like filter:

postgres=# explain analyze select * from counters where id >= 'ms:149895:' and id <= 'ms:149895;' and id like 'ms:149895:%';
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using counters_pkey on counters  (cost=0.56..8.58 rows=1 width=36) (actual time=0.023..0.039 rows=10 loops=1)
   Index Cond: ((id >= 'ms:149895:'::text) AND (id <= 'ms:149895;'::text) AND (id >= 'ms:149895:'::text) AND (id < 'ms:149895;'::text))
   Filter: (id ~~ 'ms:149895:%'::text)
 Planning Time: 0.156 ms
 Execution Time: 0.036 ms
(5 rows)

總結

LIKE 的估算最少會是整張表萬分之一 或 先估算再改寫 Query 只是目前 PostgreSQL 的實作細節，或許哪一天就改掉了。

不過目前來說若萬分之一的估算是嚴重高估，又直接把 LIKE 的結果再拿去查其他表的話，是有可能會因此挑選到比較差的 Query Plan，最好還是先 EXPLAIN 看一下是否有需要改寫 Query。

連結

前篇：PostgreSQL 如何估算 HashAggregate 的 Return Rows ，以及低估的後果

關於 histogram 與 mcv 統計在 PostgreSQL 手冊中的第 70.1 章有詳細說明 70.1. Row Estimation Examples

關於 default_statistics_target 除了在 PostgreSQL 手冊中的第 14.2 章有詳細說明 14.2. Statistics Used by the Planner 之外 cybertec 也有一篇 changing-histogram-sizes

另外這次我們看到的 LIKE 的 prefix pattern matching 被改寫成 range query + like filter 其實已經被重構到了 PostgreSQL 12 的新功能 SupportRequestIndexCondition 之下 https://github.com/rueian/postgres/blob/5406513e997f5ee9de79d4076ae91c04af0c52f6/src/backend/utils/adt/like_support.c#L183-L224 在 PostgreSQL 12 我們在創建 function 時可以透過 CREATE FUNCTION ... SUPPORT ... 來設定支援 SupportRequestIndexCondition, SupportRequestSelectivity ... 等的 c function 來客製化 planner 的行為。參考: 37.11. Function Optimization Information 與 CREATE FUNCTION

rueian / postgres

PostgreSQL 如何估算 LIKE 的 return rows 數量 #2

序

案例

讀原始碼

驗證

解決方案

總結

連結