altmannmarcelo closed this issue 2 months ago
This issue is not reproducible in Postgres:
postgres=> create table tb1 (ID INT PRIMARY KEY, meta JSON, dt timestamp);
CREATE TABLE
postgres=> insert into tb1 values (1, '{"linked_urls": []}', NOW()), (2, NULL, NOW());
INSERT 0 2
postgres=> CREATE CACHE FROM SELECT * FROM tb1 WHERE ID = ?;
postgres=> SELECT * FROM tb1 WHERE ID = 1;
id | meta | dt
----+--------------------+----------------------------
1 | {"linked_urls":[]} | 2024-08-27 23:16:41.408623
(1 row)
postgres=> EXPLAIN LAST STATEMENT;
Query_destination | ReadySet_error
-------------------+----------------
readyset | ok
(1 row)
postgres=> SELECT * FROM tb1 WHERE ID = 2;
id | meta | dt
----+------+----------------------------
2 | | 2024-08-27 23:16:41.408623
(1 row)
postgres=> EXPLAIN LAST STATEMENT;
Query_destination | ReadySet_error
-------------------+----------------
readyset | ok
(1 row)
postgres=> UPDATE tb1 SET dt = '2025-08-27 10:30:54' WHERE ID = 2;
UPDATE 1
postgres=> UPDATE tb1 SET dt = '2025-08-27 10:30:54' WHERE ID = 1;
UPDATE 1
postgres=> SELECT * FROM tb1 WHERE ID = 1;
id | meta | dt
----+--------------------+---------------------
1 | {"linked_urls":[]} | 2025-08-27 10:30:54
(1 row)
postgres=> EXPLAIN LAST STATEMENT;
Query_destination | ReadySet_error
-------------------+----------------
readyset | ok
(1 row)
postgres=> SELECT * FROM tb1 WHERE ID = 2;
id | meta | dt
----+------+---------------------
2 | | 2025-08-27 10:30:54
(1 row)
postgres=> EXPLAIN LAST STATEMENT;
Query_destination | ReadySet_error
-------------------+----------------
readyset | ok
(1 row)
postgres=>
In summary, the issue only happens if the JSON column is populated at snapshot time. The affected records then become stale.
The root cause is that, on an update, we use TableOperation::DeleteRow to remove the old row, and that operation requires the entire row to match.
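The failure mode can be sketched in a few lines (hypothetical types, not ReadySet's actual ones): the delete half of the update only removes a row that matches the before-image in full, so a single extra space in one column defeats it.

```rust
fn main() {
    // Base table state captured during the snapshot; MySQL returned the JSON
    // in its spaced display form.
    let mut rows: Vec<Vec<String>> = vec![vec![
        "1".to_string(),
        "{\"linked_urls\": []}".to_string(),
    ]];

    // "Before" image of the same row from the binlog: no space after ':'.
    let before: Vec<String> = vec![
        "1".to_string(),
        "{\"linked_urls\":[]}".to_string(),
    ];

    // DeleteRow: remove any row equal to the before-image.
    let n = rows.len();
    rows.retain(|r| *r != before);

    // Nothing matched, so the old row survives as a stale record.
    println!("deleted {} row(s)", n - rows.len());
    // prints: deleted 0 row(s)
}
```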
MySQL adds extra spaces to JSON values for "better readability":
https://dev.mysql.com/doc/refman/8.0/en/json.html#json-normalization
MySQL also discards extra whitespace between keys, values, or elements in the original JSON document, and leaves (or inserts, when necessary) a single space following each comma (,) or colon (:) when displaying it. This is done to enhance readability.
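The normalization the docs describe can be sketched as follows (an assumption based on the quoted documentation, not MySQL's actual implementation): outside string literals, drop the original whitespace and emit exactly one space after each `,` and `:`.

```rust
// Mimic MySQL's JSON display normalization as described in the docs.
fn mysql_display(json: &str) -> String {
    let mut out = String::new();
    let (mut in_string, mut escaped) = (false, false);
    for c in json.chars() {
        if in_string {
            out.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true; // next char is escaped, e.g. \" inside a string
            } else if c == '"' {
                in_string = false;
            }
        } else if c == '"' {
            in_string = true;
            out.push(c);
        } else if c == ',' || c == ':' {
            out.push(c);
            out.push(' '); // the single "readability" space MySQL inserts
        } else if !c.is_whitespace() {
            out.push(c); // drop all other structural whitespace
        }
    }
    out
}

fn main() {
    // What the client inserts vs. what SELECT shows back:
    println!("{}", mysql_display("{\"linked_urls\":[]}"));
    // prints: {"linked_urls": []}
}
```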
This is confirmed by source-code analysis.
Then, when we receive the row via the binlog, the extra space is not part of the value, which causes the mismatch:
Row from snapshot:
[Int(1), Text("{\"linked_urls\": []}"), TimestampTz(TimestampTz { is_zero: false, has_negative_offset: false, has_timezone: false, has_date_only: false, offset: 0, subsecond_digits: 0, ts: 2025-08-27T13:30:54+00:00 })]
Row from binlog:
Before:
[Int(1), Text("{\"linked_urls\":[]}"), TimestampTz(TimestampTz { is_zero: false, has_negative_offset: false, has_timezone: false, has_date_only: false, offset: 0, subsecond_digits: 0, ts: 2025-08-27T13:30:54+00:00 })]
After:
[Int(1), Text("{\"linked_urls\":[]}"), TimestampTz(TimestampTz { is_zero: false, has_negative_offset: false, has_timezone: false, has_date_only: false, offset: 0, subsecond_digits: 0, ts: 2025-08-28T13:30:54+00:00 })]
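One possible direction for a fix (a sketch under my own assumptions, not ReadySet's actual approach) is to compare JSON columns whitespace-insensitively: canonicalize both sides before the row match instead of comparing the raw text byte-for-byte.

```rust
// Canonicalize a JSON string by dropping whitespace that sits outside of
// string literals, so the snapshot and binlog forms compare equal.
fn strip_json_whitespace(s: &str) -> String {
    let mut out = String::new();
    let (mut in_string, mut escaped) = (false, false);
    for c in s.chars() {
        if in_string {
            out.push(c); // whitespace inside strings is significant: keep it
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
        } else if c == '"' {
            in_string = true;
            out.push(c);
        } else if !c.is_whitespace() {
            out.push(c);
        }
    }
    out
}

fn main() {
    let snapshot = "{\"linked_urls\": []}"; // spaced form stored at snapshot
    let binlog = "{\"linked_urls\":[]}"; // compact form from the binlog
    assert_ne!(snapshot, binlog); // the raw text differs...
    assert_eq!(
        strip_json_whitespace(snapshot),
        strip_json_whitespace(binlog)
    ); // ...but the canonical forms agree
    println!("canonical match: true");
}
```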
There are two issues here:
Description
If a table has a JSON column and that column is not null during the snapshot, the affected records become stale.
Change in user-visible behavior
Requires documentation change