yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.91k stars 1.06k forks source link

[YSQL] Identical SQL queries produce inconsistent results due to self-matching regex #22989

Closed ycybfhb closed 1 month ago

ycybfhb commented 3 months ago

Jira Link: DB-11908

Description

Basic Information

Version:

PostgreSQL 11.2-YB-2.21.1.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 17.0.6 (https://github.com/yugabyte/llvm-project.git 9b881774e40024e901fc6f3d313607b071c08631), 64-bit

Deploy:

version: '2'

volumes:
  yb-master-data-1:
  yb-tserver-data-1:
  yb-tserver-data-2:
  yb-tserver-data-3:
  yb-single-data-1:
  yb-tserver-single-data-1:

services:
  yb-master:
      image: yugabytedb/yugabyte:latest
      container_name: yb-master-n1
      volumes:
      - yb-master-data-1:/mnt/master
      command: [ "/home/yugabyte/bin/yb-master",
                "--fs_data_dirs=/mnt/master",
                "--master_addresses=yb-master-n1:7100",
                "--rpc_bind_addresses=yb-master-n1:7100",
                "--replication_factor=1"]
      ports:
      - "7000:7000"
      networks:
        yb-net:
          ipv4_address: 10.1.3.11
      environment:
        SERVICE_7000_NAME: yb-master

  yb-tserver:
      image: yugabytedb/yugabyte:latest
      container_name: yb-tserver-n1
      volumes:
      - yb-tserver-data-1:/mnt/tserver
      command: [ "/home/yugabyte/bin/yb-tserver",
                "--fs_data_dirs=/mnt/tserver",
                "--enable_ysql",
                "--rpc_bind_addresses=yb-tserver-n1:9100",
                "--tserver_master_addrs=yb-master-n1:7100"]
      environment:
        SERVICE_5433_NAME: ysql
        SERVICE_9042_NAME: ycql
        SERVICE_6379_NAME: yedis
        SERVICE_9000_NAME: yb-tserver
      depends_on:
      - yb-master
      networks:
        yb-net:
          ipv4_address: 10.1.3.21

  yb-tserver2:
      image: yugabytedb/yugabyte:latest
      container_name: yb-tserver-n2
      volumes:
      - yb-tserver-data-2:/mnt/tserver
      command: [ "/home/yugabyte/bin/yb-tserver",
                "--fs_data_dirs=/mnt/tserver",
                "--enable_ysql",
                "--rpc_bind_addresses=yb-tserver-n2:9100",
                "--tserver_master_addrs=yb-master-n1:7100"]
      environment:
        SERVICE_5433_NAME: ysql
        SERVICE_9042_NAME: ycql
        SERVICE_6379_NAME: yedis
        SERVICE_9000_NAME: yb-tserver
      depends_on:
      - yb-master
      networks:
        yb-net:
          ipv4_address: 10.1.3.22

  yb-tserver3:
      image: yugabytedb/yugabyte:latest
      container_name: yb-tserver-n3
      volumes:
      - yb-tserver-data-3:/mnt/tserver
      command: [ "/home/yugabyte/bin/yb-tserver",
                "--fs_data_dirs=/mnt/tserver",
                "--enable_ysql",
                "--rpc_bind_addresses=yb-tserver-n3:9100",
                "--tserver_master_addrs=yb-master-n1:7100"]
      environment:
        SERVICE_5433_NAME: ysql
        SERVICE_9042_NAME: ycql
        SERVICE_6379_NAME: yedis
        SERVICE_9000_NAME: yb-tserver
      depends_on:
      - yb-master
      networks:
        yb-net:
          ipv4_address: 10.1.3.23

  yb-single:
      image: yugabytedb/yugabyte:latest
      container_name: yb-single-n1
      volumes:
      - yb-single-data-1:/mnt/single
      command: [ "/home/yugabyte/bin/yb-master",
                "--fs_data_dirs=/mnt/single",
                "--master_addresses=yb-single-n1:7100",
                "--rpc_bind_addresses=yb-single-n1:7100",
                "--replication_factor=1"]
      ports:
      - "7001:7000"
      networks:
        yb-net:
          ipv4_address: 10.1.3.31
      environment:
        SERVICE_7000_NAME: yb-master

  yb-tserver-single:
      image: yugabytedb/yugabyte:latest
      container_name: yb-tserver-single-n1
      volumes:
      - yb-tserver-single-data-1:/mnt/tserver
      command: [ "/home/yugabyte/bin/yb-tserver",
                "--fs_data_dirs=/mnt/tserver",
                "--enable_ysql",
                "--rpc_bind_addresses=yb-tserver-single-n1:9100",
                "--tserver_master_addrs=yb-single-n1:7100"]
      environment:
        SERVICE_5433_NAME: ysql
        SERVICE_9042_NAME: ycql
        SERVICE_6379_NAME: yedis
        SERVICE_9000_NAME: yb-tserver
      depends_on:
      - yb-single
      networks:
        yb-net:
          ipv4_address: 10.1.3.41

networks:
  yb-net:
    driver: bridge
    enable_ipv6: false
    ipam:
      config:
        - subnet: 10.1.3.0/24
          gateway: 10.1.3.1

Reproduce

Firstly, connect to the Yugabyte databases for multi-node and single-node deployments respectively via the psql command.

Secondly, execute the statements in init.sql to create tables in both databases respectively.

Finally, execute the statements in error.sql in both databases, and find that the two databases return results with different numbers of rows.

The expression (cast(ref_0.c_fi369lv as char) ~* cast(ref_0.c_fi369lv as char)) will always return true if ref_0.c_fi369lv is not null. Thus, under correct condition, both databases should return the same results.

init.sql: init.sql.txt error.sql: error.sql.txt

Results

yb-tserver:

     c0      |     c6     
-------------+------------
  1813671397 | mdcs5dd
             | e
  -994485601 | fz7
  -385809542 | j99h6w
   907186018 | np
  1312891998 | duvax
 -2041657057 | z
 -1487878234 | eubi
   988157768 | rq6o
 -1963787237 | g5n_m8
         -15 | v
   -60152526 | k242tiv
          32 | h2fukr025i
   751073097 | ns
   -97299844 | gf
 -1778359483 | nu
 -1828605466 | q0ovjoovlu
        1000 | g19vcv
  -984524331 | b2
(19 rows)

yb-tserver-single:

     c0      |     c6     
-------------+------------
  1813671397 | mdcs5dd
             | e
  -994485601 | fz7
  -385809542 | j99h6w
   907186018 | np
   751073097 | ns
   -97299844 | gf
 -1778359483 | nu
 -1828605466 | q0ovjoovlu
        1000 | g19vcv
  -984524331 | b2
   -60152526 | k242tiv
          32 | h2fukr025i
  1584490560 | jnicrq
 -1487878234 | eubi
   988157768 | rq6o
 -1963787237 | g5n_m8
         -15 | v
  1312891998 | duvax
 -2041657057 | z
(20 rows)

About Us

We are the BASS team from the School of Cyber Science and Technology at Beihang University. Our main focus is on system software security, operating systems, and program analysis research, as well as the development of automated program testing frameworks for detecting software defects. Using our self-developed database vulnerability testing tool, we have identified the above-mentioned vulnerabilities in Yugabyte that may lead to database crashes.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

ddorian commented 3 months ago

Thank you @ycybfhb for reporting.

Just reproduced locally using 2024.1 version.

timothy-e commented 3 months ago

Simpler repro: (texticregexeq is the same as ~*, but easier to grep for in the codebase)

create table test_texticregex (t text);
insert into test_texticregex values ('a'), ('b'), (NULL), ('d');
explain (analyze, costs off, timing off, summary off) select * from test_texticregex where texticregexeq(t, t);
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on test_texticregex (actual rows=2 loops=1)
   Storage Filter: texticregexeq(t, t)
(2 rows)

varies between 1, 2 or 3 rows.

There's no consistent row that gets excluded. Every permutation of a, b, d is possible.