yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.99k stars 1.07k forks source link

[YSQL] Transactional inconsistency if using datatype casting in serializable reads #23195

Open qvad opened 3 months ago

qvad commented 3 months ago

Jira Link: DB-12135

Description

In classic bank workload we added datatypes support to extend coverage for existing features. E.g. now there are varchars, dates, timestamps and json support.

Balance selection: JSON - SELECT (SUBSTRING((%s->'content'->>'value') FROM ':(.*?)(:|$)'))::bigint AS balance FROM %s WHERE id = % Varchar - SELECT (SUBSTRING(%s FROM ':(.*?)(:|$)'))::bigint FROM %s WHERE id = %s Money - SELECT %s::numeric::integer FROM %s WHERE id = %s Timestamp - SELECT EXTRACT(MILLISECONDS FROM %s) AS epoch_days FROM %s WHERE id = %s Date - SELECT (EXTRACT(EPOCH FROM %s) / 86400) AS sum_of_epoch_days FROM %s WHERE id = %s

Note that due to #23121 reads in serializable are force to be read only (use JDBC API)

Last trace from client log. Note that after wring read few correct reads are successful. Which means that data might be actually valid.

2024-07-12 21:08:19,512 [Thread-1] INFO MetricsTracker - Read: 10.20 ops/sec (11334.72 ms/op), 41687 total ops  |  Write: 3.00 ops/sec (16492.70 ms/op), 8083 total ops  |  Uptime: 1080093 ms | 
2024-07-12 21:08:19,606 [Thread-54] ERROR ExceptionsTracker - Failed to write history: ERROR: Rpc timeout, passed: 15.000s, timeout: 15.000s, now: 1681.793s, last_read_time_: 1666.793s
com.yugabyte.util.PSQLException: ERROR: Rpc timeout, passed: 15.000s, timeout: 15.000s, now: 1681.793s, last_read_time_: 1666.793s
    at com.yugabyte.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.jdbc.PgStatement.executeInternal(PgStatement.java:490) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.jdbc.PgStatement.execute(PgStatement.java:408) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:162) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:130) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.sample.apps.SqlBankLargeValues.evaluateUpdate(SqlBankLargeValues.java:188) ~[yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.sample.apps.SqlBankLargeValues.doWrite(SqlBankLargeValues.java:483) [yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.sample.apps.AppBase.performWrite(AppBase.java:1016) [yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.sample.common.IOPSThread.run(IOPSThread.java:88) [yb-stress-sample-apps-1.1.65.jar:?]
2024-07-12 21:08:20,992 [Thread-105] ERROR ExceptionsTracker - Failed to compare sums: Expected 1000000 - actual 998014 
java.lang.AssertionError: Expected 1000000 - actual 998014 
    at com.yugabyte.sample.apps.SqlBankLargeValues.doRead(SqlBankLargeValues.java:256) [yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.sample.apps.AppBase.performRead(AppBase.java:1049) [yb-stress-sample-apps-1.1.65.jar:?]
    at com.yugabyte.sample.common.IOPSThread.run(IOPSThread.java:89) [yb-stress-sample-apps-1.1.65.jar:?]
2024-07-12 21:08:24,513 [Thread-1] INFO MetricsTracker - Read: 4.00 ops/sec (15556.37 ms/op), 41707 total ops  |  Write: 2.80 ops/sec (18585.33 ms/op), 8097 total ops  |  Uptime: 1085094 ms | 
2024-07-12 21:08:29,513 [Thread-1] INFO MetricsTracker - Read: 3.20 ops/sec (12392.51 ms/op), 41723 total ops  |  Write: 2.00 ops/sec (17021.44 ms/op), 8107 total ops  |  Uptime: 1090094 ms | 
2024-07-12 21:08:34,513 [Thread-1] INFO MetricsTracker - Read: 1.80 ops/sec (28696.77 ms/op), 41732 total ops  |  Write: 1.40 ops/sec (15742.60 ms/op), 8114 total ops  |  Uptime: 1095094 ms | 
2024-07-12 21:08:39,513 [Thread-1] INFO MetricsTracker - Read: 2.40 ops/sec (21774.67 ms/op), 41744 total ops  |  Write: 0.80 ops/sec (33163.99 ms/op), 8118 total ops  |  Uptime: 1100094 ms | 
2024-07-12 21:08:44,513 [Thread-1] INFO MetricsTracker - Read: 1.40 ops/sec (26593.84 ms/op), 41751 total ops  |  Write: 0.40 ops/sec (25020.86 ms/op), 8120 total ops  |  Uptime: 1105094 ms | 
2024-07-12 21:08:49,514 [Thread-1] INFO MetricsTracker - Read: 0.60 ops/sec (31671.58 ms/op), 41754 total ops  |  Write: 0.20 ops/sec (30027.77 ms/op), 8121 total ops  |  Uptime: 1110095 ms | 
2024-07-12 21:08:54,514 [Thread-1] INFO MetricsTracker - Read: 0.60 ops/sec (30141.24 ms/op), 41757 total ops  |  Write: 0.00 ops/sec (0.00 ms/op), 8121 total ops  |  Uptime: 1115095 ms | 
2024-07-12 21:08:59,514 [Thread-1] INFO MetricsTracker - Read: 0.20 ops/sec (50092.69 ms/op), 41758 total ops  |  Write: 0.00 ops/sec (0.00 ms/op), 8121 total ops  |  Uptime: 1120095 ms | 
2024-07-12 21:09:04,514 [Thread-1] INFO MetricsTracker - Read: 0.00 ops/sec (0.00 ms/op), 41758 total ops  |  Write: 0.20 ops/sec (40066.37 ms/op), 8122 total ops  |  Uptime: 1125095 ms | 
....

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

qvad commented 3 months ago

After running few experiments found one cause of this issue - adding datatype casting to reads make workload fail on transactional validation checks, i.e. even INT datatype could fail. Changed issue description

SELECT SUM(%s)::bigint FROM %s
SELECT %s::bigint FROM %s WHERE id = %s