readysettech / readyset

Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, ReadySet caches the results of cached select statements and incrementally updates these results over time as the underlying data changes.
https://readyset.io
Other
4.54k stars 125 forks source link

Snapshot employees database - wrong number of rows #1358

Closed altmannmarcelo closed 2 months ago

altmannmarcelo commented 3 months ago

Description

Investigate snapshot employees database:

mysql> SELECT * FROM `employees`.`titles` ORDER BY `employees`.`titles`.`emp_no` ASC, `employees`.`titles`.`title` ASC, `employees`.`titles`.`from_date` ASC LIMIT 100001;

|  77557 | Staff              | 1994-12-12 | 1996-09-01 | <-- lower bound
|  77558 | Engineer           | 1987-08-09 | 1996-08-08 |
|  77558 | Senior Engineer    | 1996-08-08 | 2099-12-31 |
|  77559 | Engineer           | 1993-08-30 | 1994-04-28 |
|  77560 | Assistant Engineer | 1993-11-27 | 1999-07-08 |
|  77561 | Engineer           | 1996-06-29 | 2000-04-21 |
|  77562 | Senior Engineer    | 1985-12-14 | 2099-12-31 |
|  77563 | Staff              | 1996-01-04 | 2099-12-31 |
|  77564 | Staff              | 1998-02-11 | 1999-11-27 |
|  77565 | Senior Staff       | 1992-03-14 | 2099-12-31 |
|  77566 | Staff              | 1986-10-26 | 1992-10-27 |
+--------+--------------------+------------+------------+
100010 rows in set (0,050 sec)

mysql> SELECT * FROM `employees`.`titles` WHERE `employees`.`titles`.`emp_no` >= 77557 AND `employees`.`titles`.`title` >= 'Staff' AND `employees`.`titles`.`from_date` >= '1994-12-12 00:00:00' AND (`employees`.`titles`.`emp_no`, `employees`.`titles`.`title`, `employees`.`titles`.`from_date`) != (77557, 'Staff', '1994-12-12 00:00:00') ORDER BY `employees`.`titles`.`emp_no` ASC, `employees`.`titles`.`title` ASC, `employees`.`titles`.`from_date` ASC LIMIT 1;
+--------+-------+------------+------------+
| emp_no | title | from_date  | to_date    |
+--------+-------+------------+------------+
|  77563 | Staff | 1996-01-04 | 2099-12-31 |
+--------+-------+------------+------------+
1 row in set (0,010 sec)

Our query is not right the first field in the PK has a high cardinality and we are ignoring everything that rang lower than Staff.

Change in user-visible behavior

Requires documentation change