Closed: qvad closed this issue 2 years ago.
Reviewed another run with the "trace all queries" option enabled. Here we also have key = 37 and two values, 8 and 11.
pg log 0:
2022-02-13 13:36:39.500 UTC [19544] LOG: execute S_1: COMMIT
2022-02-13 13:36:39.504 UTC [19544] LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.504 UTC [19544] LOG: execute S_3: BEGIN
2022-02-13 13:36:39.504 UTC [19544] LOG: execute <unnamed>: update append4 set v1 = CONCAT(v1, ',', $1) where k = $2
2022-02-13 13:36:39.504 UTC [19544] DETAIL: parameters: $1 = '11', $2 = '18'
2022-02-13 13:36:39.510 UTC [19544] LOG: execute S_1: COMMIT
2022-02-13 13:36:39.515 UTC [19544] LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.517 UTC [19544] LOG: execute S_3: BEGIN
2022-02-13 13:36:39.517 UTC [19544] LOG: execute <unnamed>: insert into append4 (k, k2, v1) values ($1, $2, $3)
2022-02-13 13:36:39.517 UTC [19544] DETAIL: parameters: $1 = '18', $2 = '18', $3 = '11'
2022-02-13 13:36:39.524 UTC [19630] LOG: execute <unnamed>: select (v1) from append1 where k = $1
2022-02-13 13:36:39.524 UTC [19630] DETAIL: parameters: $1 = '13'
2022-02-13 13:36:39.529 UTC [19544] LOG: execute S_1: COMMIT
2022-02-13 13:36:39.530 UTC [19630] LOG: execute S_1: COMMIT
pg log 1:
2022-02-13 13:36:39.451 UTC [21640] LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.451 UTC [21640] LOG: execute <unnamed>: BEGIN
2022-02-13 13:36:39.452 UTC [21640] LOG: execute <unnamed>: update append4 set v1 = CONCAT(v1, ',', $1) where k = $2
2022-02-13 13:36:39.452 UTC [21640] DETAIL: parameters: $1 = '8', $2 = '18'
2022-02-13 13:36:39.463 UTC [21640] LOG: execute S_1: COMMIT
2022-02-13 13:36:39.466 UTC [21640] LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.467 UTC [21640] LOG: execute <unnamed>: BEGIN
2022-02-13 13:36:39.518 UTC [21640] LOG: execute <unnamed>: insert into append4 (k, k2, v1) values ($1, $2, $3)
2022-02-13 13:36:39.518 UTC [21640] DETAIL: parameters: $1 = '18', $2 = '18', $3 = '8'
2022-02-13 13:36:39.530 UTC [21640] LOG: execute S_1: COMMIT
2022-02-13 13:36:39.541 UTC [21640] LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.542 UTC [21640] LOG: execute <unnamed>: BEGIN
For the tserver that has the faulty transaction, new segment allocation messages exist:
I0213 13:36:39.121099 17158 tablet_service.cc:334] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154 [state=RUNNING]: Write failed: Operation failed. Try again (yb/docdb/conflict_resolution.cc:89): 08e88f47-7d03-4083-9a3b-90378bb39d0f Conflicts with higher priority transaction: 07ef9776-721b-4375-b5dc-52aaffd3d5bf (transaction error 3)
I0213 13:36:39.126605 17353 tablet_service.cc:334] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154 [state=RUNNING]: Write failed: Operation failed. Try again (yb/docdb/conflict_resolution.cc:89): dc533958-ef14-4681-a9bc-b0d6c56148de Conflicts with higher priority transaction: 07ef9776-721b-4375-b5dc-52aaffd3d5bf (transaction error 3)
I0213 13:36:39.199928 17389 tablet_service.cc:334] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154 [state=RUNNING]: Write failed: Operation failed. Try again (yb/tablet/running_transaction.cc:465): Transaction aborted: ed4e3ebe-cdea-4bc2-a42d-11dcd65caae2 (pgsql error 40001)
I0213 13:36:39.211046 17355 transaction_coordinator.cc:1220] T 1ee9551fcff440bba7dfc4d4247cfa53 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction 7dffceba-3540-4dbe-a07e-5447cb31867f: transaction_id: "}\377\316\2725@M\276\240~TG\3131\206\177" status: COMMITTED tablets: "817f09ba27a9491986d10f7d878b175b" tablets: "e3ccd9c4fcba4b3aab99d42ef5887165" tablets: "1493614872074231b58fd16b319688c6"
I0213 13:36:39.229215 17354 transaction_coordinator.cc:1220] T b28a4067961d4aa0975d78d4873a7a80 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction d7029964-6d76-45a4-9386-a471ea286975: transaction_id: "\327\002\231dmvE\244\223\206\244q\352(iu" status: COMMITTED tablets: "0e127273f4454b259720239eab55bcce" tablets: "1493614872074231b58fd16b319688c6" tablets: "d5ac976fa0a7470eb48c820b411b6ff0"
I0213 13:36:39.272332 18926 log.cc:808] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154: Max segment size 1048576 reached. Starting new segment allocation.
I0213 13:36:39.274272 18926 log.cc:683] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154: Last appended OpId in segment /home/yugabyte/data/yb-data/tserver/wals/table-000033df00003000800000000000400a/tablet-d5ac976fa0a7470eb48c820b411b6ff0/wal-000000002: 2.3132
I0213 13:36:39.279155 18926 log.cc:692] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154: Rolled over to a new segment: /home/yugabyte/data/yb-data/tserver/wals/table-000033df00003000800000000000400a/tablet-d5ac976fa0a7470eb48c820b411b6ff0/wal-000000003
I0213 13:36:39.882437 18885 transaction_coordinator.cc:1220] T b2f4a78b3a034b3d890693c7e7590726 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction aa1786be-9e3e-4e93-bba5-22d497cc653f: transaction_id: "\252\027\206\276\236>N\223\273\245\"\324\227\314e?" status: COMMITTED tablets: "98ad2f9e07ef4f7ca8ccc9fdad66bbb1" tablets: "e818b51eb9b3414b91ed5235a11ee335"
I0213 13:36:39.907943 17158 transaction_coordinator.cc:1220] T cad1fa6619f54e3fbe41d760d7278f33 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction 3140b9ac-ba49-4de7-b969-47655ecb3cf0: transaction_id: "1@\271\254\272IM\347\271iGe^\313<\360" status: COMMITTED tablets: "e3ccd9c4fcba4b3aab99d42ef5887165" tablets: "1493614872074231b58fd16b319688c6"
I0213 13:36:39.938064 17318 write_operation.cc:93] Apply operation failed: Operation failed. Try again (yb/tablet/tablet.cc:1169): Transaction was recently aborted: 04c07832-6842-4051-b158-d3794382bc69: 40001 (pgsql error 40001) (transaction error 1)
@es1024, can you help take a look?
I think I have another reproducer that can be evaluated against a local cluster. The scenario is the classic bank workload, but with INSERTs and DELETEs in addition to UPDATEs.
UPDATEs:
firstOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance - %s WHERE id = %s;",
getTableName(), balance, accFrom));
secondOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance + %s WHERE id = %s;",
getTableName(), balance, accTo));
INSERTs:
accTo = endKey.get();
firstOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance - %s WHERE id = %s;",
getTableName(), balance, accFrom));
secondOpResult = statement.executeUpdate(String.format("INSERT INTO %s (id, balance) VALUES (%s, %s);",
getTableName(), accTo, balance));
counterToIncrement = endKey;
DELETEs:
accFrom = startKey.get();
while (accTo == accFrom) {
accTo = ThreadLocalRandom.current().nextLong(startKey.get(), endKey.get());
}
statement.execute(String.format("SELECT balance FROM %s WHERE id = %s;", getTableName(), accFrom));
try (ResultSet rs = statement.getResultSet()) {
rs.next();
balance = rs.getInt(1);
}
firstOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance + %s WHERE id = %s;",
getTableName(), balance, accTo));
secondOpResult = statement.executeUpdate(String.format("DELETE FROM %s WHERE id = %s;",
getTableName(), accFrom));
counterToIncrement = startKey;
During this test there is a possibility that, on the INSERT operation, a few different threads will try to commit the same key. Start the jar file with the following parameters and then check the history table: select * from bankhistory order by id;. Eventually there should be 2 or more rows with op == 2 (INSERT) and the same accTo, which is exactly the case we observe here: 2 successful insert transactions into the same key. Typically it reproduces within seconds.
Arguments to launch the jar (attached as yb-sample-apps.jar.zip):
yb-sample-apps.jar com.yugabyte.sample.Main --workload SqlBankTransfer --nodes 192.168.64.6:5433 --num_threads_write 10 --num_threads_read 10 --bank_enable_logging true --bank_exit_on_fail true
| serial | op | accFrom | accTo | balance |
|---|---|---|---|---|
| 1201 | 2 | 2 | 22 | 3 |
| 1301 | 2 | 8 | 22 | 1 |
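The duplicate-insert race the history check is looking for can be sketched in plain Java (an illustrative model, not the actual workload code): several threads race to insert the same key, and with correct conflict detection exactly one insert may commit. The bug reported here is equivalent to more than one "winner".

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: N threads race to insert the same key into an in-memory "table".
// With correct conflict detection exactly one insert may commit; the bug
// reported here allowed two committed inserts for the same key.
public class InsertRace {
    static int raceInsert(long key, int threads) throws InterruptedException {
        ConcurrentHashMap<Long, Long> table = new ConcurrentHashMap<>();
        AtomicInteger committed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final long balance = i;
            pool.submit(() -> {
                // putIfAbsent is atomic: only the first writer of the key wins,
                // modeling the primary-key uniqueness the database must enforce.
                if (table.putIfAbsent(key, balance) == null) {
                    committed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return committed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("committed inserts: " + raceInsert(22L, 10));
    }
}
```

In the failing history above, two rows with op == 2 and accTo == 22 correspond to this model returning 2 instead of 1.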
A simple case to reproduce this bug. Schema:
CREATE TABLE test(k INTEGER PRIMARY KEY, v VARCHAR);
Using two connections to the database:
ysqlsh #1> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
ysqlsh #1> INSERT INTO test VALUES(1, 't1');
ysqlsh #2> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
ysqlsh #2> INSERT INTO test VALUES(1, 't2');
ysqlsh #1> COMMIT;
ysqlsh #2> COMMIT;
Both transactions will commit, even though they're writing to the same row. This is likely because we're doing primary key checks only when processing INSERT and not at commit time, and not acquiring the right locks to cause the conflict to be detected. This is a regression introduced by 83a150b080 and only affects SERIALIZABLE isolation.
Jira Link: DB-8240
Description
The issue appeared after the Jepsen version was updated to 0.2.5. It is not clear how that change is related, but we now observe a problem in the transaction history on the server side.
All current versions are affected; it may be an old issue. The issue disappears if we relax the consistency model check from :strict-serialisable to :serialisable (see the Jepsen consistency models description). The issue is well reproducible, but produces a lot of noise in the results: an automated description may contain 100 transactions. We are trying to reduce this in future Jepsen tests.
There are 2 transactions in the history; both are the first insert into the table for key 37.
The code that acts here evaluates both the update and the insert.
Both transactions are committed, while we observe only one value afterwards; note the read [:r 37 [1 10 11 ...
20211219T225033.000Z.zip