readysettech / readyset

Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, ReadySet caches the results of cached select statements and incrementally updates these results over time as the underlying data changes.
https://readyset.io
Other
4.54k stars 125 forks source link

Readyset returns wrong results if restart happens #1275

Open altmannmarcelo opened 5 months ago

altmannmarcelo commented 5 months ago

Description

Sport this while testing perf of hash joins for straddle joins. If we have a cache and then restart readyset, when the migration is happening and we keep running the same query, it returns wrong results.

for f in $(seq 1 20000)
do
  mysql -e "INSERT INTO tb1 VALUES (${f}, '10', now())" test
  mysql -e "INSERT INTO tb2 VALUES (10, ${f}, '${f}', now())" test
  mysql -e "INSERT INTO tb2 VALUES (10, ${f}, '${f}', now())" test
done

Readyset (requires you to start with --enable-experimental-straddled-joins):

CREATE CACHE FROM SELECT COUNT(tb1.ID) FROM tb1 JOIN tb2 ON tb2.id1 = tb1.ID WHERE tb1.name = '10' AND tb2.ID = 10;
SELECT COUNT(tb1.ID) FROM tb1 JOIN tb2 ON tb2.id1 = tb1.ID WHERE tb1.name = '10' AND tb2.ID = 10;

mysql> SELECT COUNT(tb1.ID) FROM tb1 JOIN tb2 ON tb2.id1 = tb1.ID WHERE tb1.name = '10' AND tb2.ID = 10;
+--------------------------+
| count(`test`.`tb1`.`ID`) |
+--------------------------+
|                    20000 |
+--------------------------+
1 row in set (0,05 sec)

Restart RS and execute the query multiple times:

Note: Result with column name COUNT(tb1.ID) are from upstream, results with `count(`test`.`tb1`.`ID`)` are from readyset

mysql> SELECT COUNT(tb1.ID) FROM tb1 JOIN tb2 ON tb2.id1 = tb1.ID WHERE tb1.name = '10' AND tb2.ID = 10;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    8
Current database: *** NONE ***

+---------------+
| COUNT(tb1.ID) |
+---------------+
|         20000 |
+---------------+
1 row in set (0,03 sec)

mysql>
mysql>
mysql> SELECT COUNT(tb1.ID) FROM tb1 JOIN tb2 ON tb2.id1 = tb1.ID WHERE tb1.name = '10' AND tb2.ID = 10;
+---------------+
| COUNT(tb1.ID) |
+---------------+
|         20000 |
+---------------+
1 row in set (0,02 sec)

mysql> SELECT COUNT(tb1.ID) FROM tb1 JOIN tb2 ON tb2.id1 = tb1.ID WHERE tb1.name = '10' AND tb2.ID = 10;
+---------------+
| COUNT(tb1.ID) |
+---------------+
|         20000 |
+---------------+
1 row in set (0,02 sec)

mysql> SELECT COUNT(tb1.ID) FROM tb1 JOIN tb2 ON tb2.id1 = tb1.ID WHERE tb1.name = '10' AND tb2.ID = 10;
+---------------+
| COUNT(tb1.ID) |
+---------------+
|         20000 |
+---------------+
1 row in set (0,02 sec)

mysql> SELECT COUNT(tb1.ID) FROM tb1 JOIN tb2 ON tb2.id1 = tb1.ID WHERE tb1.name = '10' AND tb2.ID = 10;
+--------------------------+
| count(`test`.`tb1`.`ID`) |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (3,88 sec)

Change in user-visible behavior

Requires documentation change