yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[Jepsen][YSQL] Two concurrent complex update-insert transactions may both be successful, while only one will be presented #11165

Closed · qvad closed this issue 2 years ago

qvad commented 2 years ago

Jira Link: DB-8240

Description

The issue appeared after the Jepsen version was updated to 0.2.5. It is not clear how that change is related, but in any case we observe the problem in the transaction history on the server side.

All current versions are affected; it may be an old issue. The issue disappears if we relax the consistency model check from :strict-serialisable to :serialisable. Link to the Jepsen consistency models description

The issue is readily reproducible, but it produces a lot of noise in the results: the automated description may contain around 100 transactions. We are trying to fix this in future Jepsen tests.

Two transactions in the history, each of which is the first insert into the table for key 37:

2021-12-19 22:52:57,377{GMT}    INFO    [jepsen worker 19] jepsen.util: 499 :ok :txn    [[:append 37 7]]
...
2021-12-19 22:52:57,404{GMT}    INFO    [jepsen worker 18] jepsen.util: 618 :ok :txn    [[:append 37 1]]

The code involved here; both the update and the insert are executed:

(defn append-primary!
  "Writes a key based on primary key."
  [conn table row col v]
  (let [r (c/execute! conn [(str "update " table
                                 " set " col " = CONCAT(" col ", ',', ?) "
                                 "where k = ?") v row])]
    (when (= [0] r)
      ; No rows updated
      (c/execute! conn
                  [(str "insert into " table
                        " (k, k2, " col ") values (?, ?, ?)") row row v]))
    v))
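To make the race concrete, here is a minimal Python model (illustrative only, not YugabyteDB or Jepsen code) of the update-else-insert pattern above: when two clients interleave, both see the UPDATE touch zero rows, both fall through to the INSERT, and without a commit-time key-conflict check one of the appends is silently lost.

```python
# Illustrative model of the update-else-insert race (not YugabyteDB code).
# Each client does: UPDATE ... WHERE k = key; if 0 rows updated, INSERT (k, v).
def append_primary(table, key, value):
    if key in table:                      # UPDATE path: append to existing value
        table[key] = table[key] + "," + value
        return 1                          # rows updated
    return 0                              # no rows updated; caller must INSERT

table = {}
# Two concurrent clients both run the UPDATE first; both see 0 rows updated.
r1 = append_primary(table, 37, "7")
r2 = append_primary(table, 37, "1")
assert r1 == 0 and r2 == 0

# Both then fall through to the INSERT; with no key-conflict check,
# the second insert silently overwrites the first.
table[37] = "7"
table[37] = "1"
print(table[37])  # "1" -- the acknowledged append of "7" is lost
```

Under serializable isolation one of the two inserts should instead abort, which is exactly what the history below shows is not happening.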

Both transactions committed, yet only one value is observed afterwards: note that the read [:r 37 [1 10 11 ... contains 1 but not 7.

 T17 = {:type :ok, :f :txn, :value [[:r 37 [1 10 11 23 34 27 46 53 40 41 42 61 70 71 44 66 82 59 100 106 113 107 108 51 120 60 125 126 129 121 127 128 139 75 143 144 147 152 153 154 162 165 173 174 171 172 180 183 182 194 202 170 208 209 215 220 221 222 228 229 230 219 260 244 288 235 290 294 298 299 301 291 305 284 307 308 321 326 330 333 335 336 331 332 340 319 320 342 341 270 350 352 355 327 367 364 369 370 376 379 382 373 388 390 391 394]]], :time 130304584572, :process 515, :op-timing ["2021-12-19_22:53:08.087" "2021-12-19_22:53:08.110"], :index 14200}
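As a quick sanity check (a hypothetical script, not part of the Jepsen harness): both appends to key 37 were acknowledged with :ok, yet only one of the appended values shows up in the later read.

```python
# Hypothetical check: two acknowledged appends to key 37, but the later
# read of the key contains only one of the appended values.
acked_appends = [7, 1]  # from ops 499 and 618 above, both acknowledged :ok
observed_read = [1, 10, 11, 23, 34, 27, 46, 53]  # prefix of the :r 37 [...] value

missing = [v for v in acked_appends if v not in observed_read]
print(missing)  # the acknowledged append that was lost
```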

20211219T225033.000Z.zip

qvad commented 2 years ago

Reviewed another run with the option to trace all queries enabled.

Here we again see a single key (k = 18) receiving two values, 8 and 11.

pg log 0:

2022-02-13 13:36:39.500 UTC [19544] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.504 UTC [19544] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.504 UTC [19544] LOG:  execute S_3: BEGIN
2022-02-13 13:36:39.504 UTC [19544] LOG:  execute <unnamed>: update append4 set v1 = CONCAT(v1, ',', $1) where k = $2
2022-02-13 13:36:39.504 UTC [19544] DETAIL:  parameters: $1 = '11', $2 = '18'
2022-02-13 13:36:39.510 UTC [19544] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.515 UTC [19544] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.517 UTC [19544] LOG:  execute S_3: BEGIN
2022-02-13 13:36:39.517 UTC [19544] LOG:  execute <unnamed>: insert into append4 (k, k2, v1) values ($1, $2, $3)
2022-02-13 13:36:39.517 UTC [19544] DETAIL:  parameters: $1 = '18', $2 = '18', $3 = '11'
2022-02-13 13:36:39.524 UTC [19630] LOG:  execute <unnamed>: select (v1) from append1 where k = $1
2022-02-13 13:36:39.524 UTC [19630] DETAIL:  parameters: $1 = '13'
2022-02-13 13:36:39.529 UTC [19544] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.530 UTC [19630] LOG:  execute S_1: COMMIT

pg log 1:

2022-02-13 13:36:39.451 UTC [21640] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.451 UTC [21640] LOG:  execute <unnamed>: BEGIN
2022-02-13 13:36:39.452 UTC [21640] LOG:  execute <unnamed>: update append4 set v1 = CONCAT(v1, ',', $1) where k = $2
2022-02-13 13:36:39.452 UTC [21640] DETAIL:  parameters: $1 = '8', $2 = '18'
2022-02-13 13:36:39.463 UTC [21640] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.466 UTC [21640] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.467 UTC [21640] LOG:  execute <unnamed>: BEGIN
2022-02-13 13:36:39.518 UTC [21640] LOG:  execute <unnamed>: insert into append4 (k, k2, v1) values ($1, $2, $3)
2022-02-13 13:36:39.518 UTC [21640] DETAIL:  parameters: $1 = '18', $2 = '18', $3 = '8'
2022-02-13 13:36:39.530 UTC [21640] LOG:  execute S_1: COMMIT
2022-02-13 13:36:39.541 UTC [21640] LOG:  execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL
2022-02-13 13:36:39.542 UTC [21640] LOG:  execute <unnamed>: BEGIN

For the tserver that hosted the faulty transaction, new segment allocation messages are present:

I0213 13:36:39.121099 17158 tablet_service.cc:334] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154 [state=RUNNING]: Write failed: Operation failed. Try again (yb/docdb/conflict_resolution.cc:89): 08e88f47-7d03-4083-9a3b-90378bb39d0f Conflicts with higher priority transaction: 07ef9776-721b-4375-b5dc-52aaffd3d5bf (transaction error 3)
I0213 13:36:39.126605 17353 tablet_service.cc:334] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154 [state=RUNNING]: Write failed: Operation failed. Try again (yb/docdb/conflict_resolution.cc:89): dc533958-ef14-4681-a9bc-b0d6c56148de Conflicts with higher priority transaction: 07ef9776-721b-4375-b5dc-52aaffd3d5bf (transaction error 3)
I0213 13:36:39.199928 17389 tablet_service.cc:334] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154 [state=RUNNING]: Write failed: Operation failed. Try again (yb/tablet/running_transaction.cc:465): Transaction aborted: ed4e3ebe-cdea-4bc2-a42d-11dcd65caae2 (pgsql error 40001)
I0213 13:36:39.211046 17355 transaction_coordinator.cc:1220] T 1ee9551fcff440bba7dfc4d4247cfa53 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction 7dffceba-3540-4dbe-a07e-5447cb31867f: transaction_id: "}\377\316\2725@M\276\240~TG\3131\206\177" status: COMMITTED tablets: "817f09ba27a9491986d10f7d878b175b" tablets: "e3ccd9c4fcba4b3aab99d42ef5887165" tablets: "1493614872074231b58fd16b319688c6"
I0213 13:36:39.229215 17354 transaction_coordinator.cc:1220] T b28a4067961d4aa0975d78d4873a7a80 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction d7029964-6d76-45a4-9386-a471ea286975: transaction_id: "\327\002\231dmvE\244\223\206\244q\352(iu" status: COMMITTED tablets: "0e127273f4454b259720239eab55bcce" tablets: "1493614872074231b58fd16b319688c6" tablets: "d5ac976fa0a7470eb48c820b411b6ff0"
I0213 13:36:39.272332 18926 log.cc:808] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154: Max segment size 1048576 reached. Starting new segment allocation.
I0213 13:36:39.274272 18926 log.cc:683] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154: Last appended OpId in segment /home/yugabyte/data/yb-data/tserver/wals/table-000033df00003000800000000000400a/tablet-d5ac976fa0a7470eb48c820b411b6ff0/wal-000000002: 2.3132
I0213 13:36:39.279155 18926 log.cc:692] T d5ac976fa0a7470eb48c820b411b6ff0 P a3a83685e62249cb9285000f1d656154: Rolled over to a new segment: /home/yugabyte/data/yb-data/tserver/wals/table-000033df00003000800000000000400a/tablet-d5ac976fa0a7470eb48c820b411b6ff0/wal-000000003
I0213 13:36:39.882437 18885 transaction_coordinator.cc:1220] T b2f4a78b3a034b3d890693c7e7590726 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction aa1786be-9e3e-4e93-bba5-22d497cc653f: transaction_id: "\252\027\206\276\236>N\223\273\245\"\324\227\314e?" status: COMMITTED tablets: "98ad2f9e07ef4f7ca8ccc9fdad66bbb1" tablets: "e818b51eb9b3414b91ed5235a11ee335"
I0213 13:36:39.907943 17158 transaction_coordinator.cc:1220] T cad1fa6619f54e3fbe41d760d7278f33 P a3a83685e62249cb9285000f1d656154: Request to unknown transaction 3140b9ac-ba49-4de7-b969-47655ecb3cf0: transaction_id: "1@\271\254\272IM\347\271iGe^\313<\360" status: COMMITTED tablets: "e3ccd9c4fcba4b3aab99d42ef5887165" tablets: "1493614872074231b58fd16b319688c6"
I0213 13:36:39.938064 17318 write_operation.cc:93] Apply operation failed: Operation failed. Try again (yb/tablet/tablet.cc:1169): Transaction was recently aborted: 04c07832-6842-4051-b158-d3794382bc69: 40001 (pgsql error 40001) (transaction error 1)
rthallamko3 commented 2 years ago

@es1024 , Can you help take a look?

qvad commented 2 years ago

I got another reproducer that, I think, can be run against a local cluster. The scenario is the classic bank workload, but with INSERTs and DELETEs in addition to UPDATEs.

UPDATEs:

firstOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance - %s WHERE id = %s;",
                            getTableName(), balance, accFrom));
secondOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance + %s WHERE id = %s;",
                            getTableName(), balance, accTo));

INSERTs:

accTo = endKey.get();

firstOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance - %s WHERE id = %s;",
           getTableName(), balance, accFrom));
secondOpResult = statement.executeUpdate(String.format("INSERT INTO %s (id, balance) VALUES (%s, %s);",
           getTableName(), accTo, balance));

counterToIncrement = endKey;

DELETEs:

accFrom = startKey.get();

while (accTo == accFrom) {
    accTo = ThreadLocalRandom.current().nextLong(startKey.get(), endKey.get());
}

statement.execute(String.format("SELECT balance FROM %s WHERE id = %s;", getTableName(), accFrom));
try (ResultSet rs = statement.getResultSet()) {
    rs.next();
    balance = rs.getInt(1);
}

firstOpResult = statement.executeUpdate(String.format("UPDATE %s SET balance = balance + %s WHERE id = %s;",
        getTableName(), balance, accTo));
secondOpResult = statement.executeUpdate(String.format("DELETE FROM %s WHERE id = %s;",
        getTableName(), accFrom));

counterToIncrement = startKey;

So during this test there is a chance that, on the INSERT operation, several different threads will try to commit the same key.

Try launching this jar with the following parameters and then check the history table: select * from bankhistory order by id;. In the end there should be two or more rows with op == 2 (INSERT) and the same accto, which is exactly the case we observe here: two successful insert transactions into the same key. It usually reproduces within seconds.

Arguments to launch jar: yb-sample-apps.jar.zip

yb-sample-apps.jar com.yugabyte.sample.Main --workload SqlBankTransfer --nodes 192.168.64.6:5433 --num_threads_write 10 --num_threads_read 10 --bank_enable_logging true --bank_exit_on_fail true
serial  op  accFrom  accTo  balance
1201    2   2        22     3
1301    2   8        22     1
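A hypothetical checker for this condition (the names bankhistory, op, and accto are taken from the output above): group the INSERT rows by target account and flag any account that was successfully inserted more than once.

```python
from collections import Counter

# Rows as (serial, op, acc_from, acc_to, balance), taken from the output above.
rows = [
    (1201, 2, 2, 22, 3),
    (1301, 2, 8, 22, 1),
]

INSERT_OP = 2
inserts = Counter(acc_to for (_, op, _, acc_to, _) in rows if op == INSERT_OP)
duplicates = [acc for acc, n in inserts.items() if n > 1]
print(duplicates)  # accounts that were INSERTed more than once
```

Any non-empty result means two INSERT transactions committed for the same primary key, which should be impossible.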
es1024 commented 2 years ago

A simple case to reproduce this bug. Schema:

CREATE TABLE test(k INTEGER PRIMARY KEY, v VARCHAR);

Using two connections to the database:

ysqlsh #1> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
ysqlsh #1> INSERT INTO test VALUES(1, 't1');
ysqlsh #2> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
ysqlsh #2> INSERT INTO test VALUES(1, 't2');
ysqlsh #1> COMMIT;
ysqlsh #2> COMMIT;
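Under SERIALIZABLE isolation, exactly one of these commits should succeed and the other should fail. A minimal in-memory model of the expected behavior (illustrative only, not the engine's actual implementation): inserts are buffered per transaction and checked for primary-key conflicts at commit time, so the second commit must abort.

```python
# Minimal model of the EXPECTED serializable behavior (illustrative only):
# buffered inserts are checked for primary-key conflicts at commit time.
class SerializationError(Exception):
    pass

class Store:
    def __init__(self):
        self.rows = {}

    def begin(self):
        return Txn(self)

class Txn:
    def __init__(self, store):
        self.store = store
        self.pending = {}

    def insert(self, k, v):
        self.pending[k] = v   # buffered until commit

    def commit(self):
        # Commit-time conflict check: a key already written by a committed
        # transaction must abort any later insert of the same key.
        for k in self.pending:
            if k in self.store.rows:
                raise SerializationError(f"duplicate key {k}")
        self.store.rows.update(self.pending)

store = Store()
t1, t2 = store.begin(), store.begin()
t1.insert(1, "t1")
t2.insert(1, "t2")
t1.commit()                  # first commit wins
try:
    t2.commit()              # must abort: key 1 was already committed
    outcome = "committed"
except SerializationError:
    outcome = "aborted"
print(outcome)  # "aborted" -- only one of the two inserts may commit
```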

Both transactions will commit, even though they write to the same row. This is likely because we do primary-key checks only when processing the INSERT, not at commit time, and we do not acquire the right locks for the conflict to be detected. This is a regression introduced by 83a150b080 and only affects SERIALIZABLE isolation.