yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
9.03k stars 1.08k forks source link

[YCQL] Retryable error codes #3625

Open ndeodhar opened 4 years ago

ndeodhar commented 4 years ago

Jira Link: DB-1605 Concurrent transactional CQL queries may fail due to intermittent issues like transaction conflicts, or query timeouts. In such cases, applications can retry the queries.

Today, we don't have any documentation of what retryable error codes an application can expect and handle. We should come up with and document the possible, retryable error codes. This makes it easy for applications to be written to handle these error codes appropriately.

OlegLoginov commented 4 years ago

I've tested parallel conflicting CQL batch requests like:

            final String statement = String.format(
                "begin transaction" +
                "  insert into %s (k, c) values (%d, %d) if c <> 99 else error;" +
                "  insert into %s (k, c) values (%d, %d) if c <> 99 else error;" +
                "end transaction;",
                table_name1, key1, -key1,
                table_name2, key2, -key2);
            List<Row> rows = session.execute(statement).all();

and it looks like currently such requests are always successful even if there are internal transaction conflicts. There are no errors or exceptions on the application side.

OlegLoginov commented 4 years ago

In my testing I've used the following java test:

  @Test
  public void test3() throws Exception {
      String table_name1 = "test_table1_";
      session.execute(String.format(
          "create table %s (k int primary key, c int) " +
          "with transactions = { 'enabled' : true };", table_name1));

      String table_name2 = "test_table2_";
      session.execute(String.format(
          "create table %s (k int primary key, c int) " +
          "with transactions = { 'enabled' : true };", table_name2));
      Thread.sleep(1000);

      LOG.info("=== START WRITING === ");

      List<Thread> threads = new ArrayList<Thread>();
      while (threads.size() < 10) {
        Thread thread = new Thread(() -> {
          Random r = new Random();
          while (!Thread.interrupted()) {
            int key1 = r.nextInt(50);
            int key2 = r.nextInt(50);

            final String statement = String.format(
                "begin transaction" +
                "  insert into %s (k, c) values (%d, %d) if c <> 99 else error;" +
                "  insert into %s (k, c) values (%d, %d) if c <> 99 else error;" +
                "end transaction;",
                table_name1, key1, -key1,
                table_name2, key2, -key2);
            List<Row> rows = session.execute(statement).all();

            String s = String.format("=== Rows[%d]: ", rows.size());
            for (Row row : rows) { s += row.toString() + " | "; }
            LOG.info(s);
          }
        });
        thread.start();
        threads.add(thread);
      }
      try {
        Thread.sleep(5000);
      } finally {
        LOG.info("=== STOP WRITING & FINISHING THREADS === ");
        for (Thread thread : threads) {
          thread.interrupt();
          thread.join();
        }
      }

      LOG.info("=== DROP TABLES === ");
      session.execute(String.format("drop table %s;", table_name1));
      session.execute(String.format("drop table %s;", table_name2));
  }

Part of the result log:

2020-02-26 16:15:07,425 (Thread-16) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
ts1|pid309|:28251|http://127.252.229.41:28872 I0226 16:15:07.426488   616 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): 840e2203-68dd-46f8-9e9b-728460400d5e Conflicts with committed transaction: 8e427ab9-2601-4bd7-898f-7c9fe5b1de76 (transaction error 3)
2020-02-26 16:15:07,427 (Thread-20) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,427 (Thread-18) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,427 (Thread-21) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
ts3|pid446|:15131|http://127.162.164.74:13085 I0226 16:15:07.429607   477 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): d709db54-d8ad-47ae-af53-0457285e9248 Conflicts with higher priority transaction: d34edf48-c3be-4b45-b1ef-29b4acb43d81 (transaction error 3)
2020-02-26 16:15:07,432 (Thread-17) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
ts3|pid446|:15131|http://127.162.164.74:13085 I0226 16:15:07.434355  1227 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): a2e214d2-f254-49b9-93df-3b2910175228 Conflicts with higher priority transaction: d34edf48-c3be-4b45-b1ef-29b4acb43d81 (transaction error 3)
2020-02-26 16:15:07,435 (Thread-19) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,435 (Thread-22) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,435 (Thread-21) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
ts3|pid446|:15131|http://127.162.164.74:13085 I0226 16:15:07.436215  1222 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): ba2585c0-fb83-477d-9f4a-26f8ff2ce075 Conflicts with higher priority transaction: 24207c2e-c57a-44ba-ba8e-e06f347bd0f9 (transaction error 3)
2020-02-26 16:15:07,436 (Thread-24) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,438 (Thread-23) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,443 (Thread-18) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,445 (Thread-19) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,446 (Thread-24) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,446 (Thread-21) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,448 (Thread-23) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,451 (Thread-22) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,452 (Thread-20) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,456 (Thread-23) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
ts3|pid446|:15131|http://127.162.164.74:13085 I0226 16:15:07.457847  1221 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): 1f3f827e-cc29-4158-bb43-afdecb10f1aa Conflicts with higher priority transaction: 9ae09de2-c891-4c95-a2fb-1c4c63a4ecbe (transaction error 3)
2020-02-26 16:15:07,458 (Thread-21) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
ts3|pid446|:15131|http://127.162.164.74:13085 I0226 16:15:07.460592   626 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): 303bfe65-c57a-473b-8cf6-d016f8029764 Conflicts with committed transaction: 9ae09de2-c891-4c95-a2fb-1c4c63a4ecbe (transaction error 3)
2020-02-26 16:15:07,462 (Thread-24) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:15:07,462 (Thread-22) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
ts3|pid446|:15131|http://127.162.164.74:13085 W0226 16:15:07.463474   477 intent_aware_iterator.cc:116] Intent for transaction w/o metadata: 9ae09de2-c891-4c95-a2fb-1c4c63a4ecbe
ts1|pid309|:28251|http://127.252.229.41:28872 I0226 16:15:07.463930  1479 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): 75c33867-e4f3-4042-8655-97bad83b999f Conflicts with higher priority transaction: 303bfe65-c57a-473b-8cf6-d016f8029764 (transaction error 3)
ts3|pid446|:15131|http://127.162.164.74:13085 I0226 16:15:07.465620  1221 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): f3258fa5-e70e-4af6-91d3-0c0655cfde62 Conflicts with higher priority transaction: 75c33867-e4f3-4042-8655-97bad83b999f (transaction error 3)
2020-02-26 16:15:07,466 (Thread-24) [INFO - org.yb.cql.TestTransaction.lambda$test$1(TestTransaction.java:817)] === Rows[2]: Row[true] | Row[true] | 
ts1|pid309|:28251|http://127.252.229.41:28872 I0226 16:15:07.468346   384 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): d7c04d60-1ada-415f-8768-1e737a3464dd Conflicts with higher priority transaction: f3258fa5-e70e-4af6-91d3-0c0655cfde62 (transaction error 3)
OlegLoginov commented 4 years ago

It's possible to use syntax like INSERT INTO ... IF NOT EXISTS ELSE ERROR; to get exception if the conflicting insertion failed. But it's not related (orthogonal) to the transaction conflict, because the error is generated on the statement run-time execution phase via primary key checking - before a real transaction start.

The test:

  @Test
  public void test() throws Exception {
      String table_name1 = "test_table1_";
      session.execute(String.format(
          "create table %s (k int primary key, c int) " +
          "with transactions = { 'enabled' : true };", table_name1));

      String table_name2 = "test_table2_";
      session.execute(String.format(
          "create table %s (k int primary key, c int) " +
          "with transactions = { 'enabled' : true };", table_name2));
      Thread.sleep(1000);

      LOG.info("=== START WRITING === ");

      List<Thread> threads = new ArrayList<Thread>();
      while (threads.size() < 10) {
        Thread thread = new Thread(() -> {
          Random r = new Random();
          while (!Thread.interrupted()) {
            int key1 = r.nextInt(50);
            int key2 = r.nextInt(50);

            final String statement = String.format(
                "begin transaction" +
                "  insert into %s (k, c) values (%d, %d) if not exists else error;" +
                "  insert into %s (k, c) values (%d, %d) if not exists else error;" +
                "end transaction;",
                table_name1, key1, -key1,
                table_name2, key2, -key2);
            try {
              List<Row> rows = session.execute(statement).all();

              String s = String.format("=== Rows[%d]: ", rows.size());
              for (Row row : rows) { s += row.toString() + " | "; }
              LOG.info(s);
            } catch (com.datastax.driver.core.exceptions.InvalidQueryException e) {
              LOG.info("Expected exception: " + e.getMessage());
              // Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
              // Execution Error. Condition on table test_table1_ was not satisfied.
              assertTrue(e.getMessage().contains("Execution Error. Condition on table test_table"));
              assertTrue(e.getMessage().contains("_ was not satisfied."));
            }
          }
        });
        thread.start();
        threads.add(thread);
      }
      try {
        Thread.sleep(5000);
      } finally {
        LOG.info("=== STOP WRITING & FINISHING THREADS === ");
        for (Thread thread : threads) {
          thread.interrupt();
          thread.join();
        }
      }

      LOG.info("=== DROP TABLES === ");
      session.execute(String.format("drop table %s;", table_name1));
      session.execute(String.format("drop table %s;", table_name2));
  }

Part of the result log with caught exceptions in case of primary key conflict:

2020-02-26 16:17:21,610 (Thread-23) [INFO - org.yb.cql.TestTransaction.lambda$test$2(TestTransaction.java:873)] === Rows[2]: Row[true] | Row[true] | 
2020-02-26 16:17:21,614 (Thread-20) [INFO - org.yb.cql.TestTransaction.lambda$test$2(TestTransaction.java:873)] === Rows[2]: Row[true] | Row[true] | 
ts3|pid3347|:24592|http://127.39.135.202:13538 E0226 16:17:21.616593  3472 process_context.cc:180] SQL Error: Execution Error. Condition on table test_table2_ was not satisfied.
ts3|pid3347|:24592|http://127.39.135.202:13538 begin transaction  insert into test_table1_ (k, c) values (26, -26) if not exists else error;  insert into test_table2_ (k, c) values (36, -36) if not exists else error;end transaction;
ts3|pid3347|:24592|http://127.39.135.202:13538                                                                                                       ^^^^
2020-02-26 16:17:21,617 (Thread-23) [INFO - org.yb.cql.TestTransaction.lambda$test$2(TestTransaction.java:875)] Expected exception: Execution Error. Condition on table test_table2_ was not satisfied.
begin transaction  insert into test_table1_ (k, c) values (26, -26) if not exists else error;  insert into test_table2_ (k, c) values (36, -36) if not exists else error;end transaction;
                                                                                                      ^^^^
 (ql error -300)
ts2|pid3279|:11644|http://127.108.213.117:30182 E0226 16:17:21.618824  3305 process_context.cc:180] SQL Error: Execution Error. Condition on table test_table2_ was not satisfied.
ts2|pid3279|:11644|http://127.108.213.117:30182 begin transaction  insert into test_table1_ (k, c) values (4, -4) if not exists else error;  insert into test_table2_ (k, c) values (24, -24) if not exists else error;end transaction;
ts2|pid3279|:11644|http://127.108.213.117:30182                                                                                                     ^^^^
2020-02-26 16:17:21,619 (Thread-22) [INFO - org.yb.cql.TestTransaction.lambda$test$2(TestTransaction.java:875)] Expected exception: Execution Error. Condition on table test_table2_ was not satisfied.
begin transaction  insert into test_table1_ (k, c) values (4, -4) if not exists else error;  insert into test_table2_ (k, c) values (24, -24) if not exists else error;end transaction;
                                                                                                    ^^^^
 (ql error -300)
ts2|pid3279|:11644|http://127.108.213.117:30182 E0226 16:17:21.621426  3728 process_context.cc:180] SQL Error: Execution Error. Condition on table test_table2_ was not satisfied.
ts2|pid3279|:11644|http://127.108.213.117:30182 begin transaction  insert into test_table1_ (k, c) values (16, -16) if not exists else error;  insert into test_table2_ (k, c) values (8, -8) if not exists else error;end transaction;
ts2|pid3279|:11644|http://127.108.213.117:30182                                                                                                       ^^^^
2020-02-26 16:17:21,621 (Thread-23) [INFO - org.yb.cql.TestTransaction.lambda$test$2(TestTransaction.java:875)] Expected exception: Execution Error. Condition on table test_table2_ was not satisfied.
begin transaction  insert into test_table1_ (k, c) values (16, -16) if not exists else error;  insert into test_table2_ (k, c) values (8, -8) if not exists else error;end transaction;
                                                                                                      ^^^^
 (ql error -300)
ts1|pid3169|:31796|http://127.80.20.181:30196 E0226 16:17:21.623581  3476 process_context.cc:180] SQL Error: Execution Error. Condition on table test_table1_ was not satisfied.
ts1|pid3169|:31796|http://127.80.20.181:30196 begin transaction  insert into test_table1_ (k, c) values (41, -41) if not exists else error;  insert into test_table2_ (k, c) values (24, -24) if not exists else error;end transaction;
ts1|pid3169|:31796|http://127.80.20.181:30196                           ^^^^
ts3|pid3347|:24592|http://127.39.135.202:13538 E0226 16:17:21.623788  3479 process_context.cc:180] SQL Error: Execution Error. Condition on table test_table2_ was not satisfied.
ts3|pid3347|:24592|http://127.39.135.202:13538 begin transaction  insert into test_table1_ (k, c) values (12, -12) if not exists else error;  insert into test_table2_ (k, c) values (13, -13) if not exists else error;end transaction;
ts3|pid3347|:24592|http://127.39.135.202:13538                                                                                                       ^^^^
2020-02-26 16:17:21,624 (Thread-20) [INFO - org.yb.cql.TestTransaction.lambda$test$2(TestTransaction.java:875)] Expected exception: Execution Error. Condition on table test_table1_ was not satisfied.
begin transaction  insert into test_table1_ (k, c) values (41, -41) if not exists else error;  insert into test_table2_ (k, c) values (24, -24) if not exists else error;end transaction;
                          ^^^^
 (ql error -300)
2020-02-26 16:17:21,624 (Thread-22) [INFO - org.yb.cql.TestTransaction.lambda$test$2(TestTransaction.java:875)] Expected exception: Execution Error. Condition on table test_table2_ was not satisfied.
begin transaction  insert into test_table1_ (k, c) values (12, -12) if not exists else error;  insert into test_table2_ (k, c) values (13, -13) if not exists else error;end transaction;
                                                                                                      ^^^^
 (ql error -300)
ts3|pid3347|:24592|http://127.39.135.202:13538 E0226 16:17:21.628239  3379 process_context.cc:180] SQL Error: Execution Error. Condition on table test_table2_ was not satisfied.
ts3|pid3347|:24592|http://127.39.135.202:13538 begin transaction  insert into test_table1_ (k, c) values (4, -4) if not exists else error;  insert into test_table2_ (k, c) values (13, -13) if not exists else error;end transaction;
ts3|pid3347|:24592|http://127.39.135.202:13538                                                                                                     ^^^^
2020-02-26 16:17:21,628 (Thread-22) [INFO - org.yb.cql.TestTransaction.lambda$test$2(TestTransaction.java:875)] Expected exception: Execution Error. Condition on table test_table2_ was not satisfied.
begin transaction  insert into test_table1_ (k, c) values (4, -4) if not exists else error;  insert into test_table2_ (k, c) values (13, -13) if not exists else error;end transaction;
                                                                                                    ^^^^
 (ql error -300)
ts1|pid3169|:31796|http://127.80.20.181:30196 I0226 16:17:21.630890  3249 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): 0233bd28-a012-4b63-911b-fa20db497c0d Conflicts with higher priority transaction: 4471120a-d079-428b-bb6e-8e51dbcc0f32 (transaction error 3)
...
ts1|pid3169|:31796|http://127.80.20.181:30196 I0226 16:17:22.685322  3478 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): 4b82a964-8d9a-47bf-8a83-a04f4eda99c8 Conflicts with higher priority transaction: 4471120a-d079-428b-bb6e-8e51dbcc0f32 (transaction error 3)
ts1|pid3169|:31796|http://127.80.20.181:30196 I0226 16:17:22.685463  3478 tablet_rpc.cc:343] Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): Failed Write(tablet: a12f695d2c8d4604b854453b086f1007, num_ops: 1, num_attempts: 1, txn: 4b82a964-8d9a-47bf-8a83-a04f4eda99c8) to tablet a12f695d2c8d4604b854453b086f1007 on tablet server { uuid: fdd7654df9ef4c2c82516878323e7a62 private: [host: "127.80.20.181" port: 31796] cloud_info: placement_cloud: "cloud1" placement_region: "datacenter1" placement_zone: "rack1" after 1 attempt(s): 4b82a964-8d9a-47bf-8a83-a04f4eda99c8 Conflicts with higher priority transaction: 4471120a-d079-428b-bb6e-8e51dbcc0f32 (transaction error 3)
ts1|pid3169|:31796|http://127.80.20.181:30196 I0226 16:17:22.694684  3473 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): 9f022b98-89ec-445d-acc3-4e5d05168a52 Conflicts with higher priority transaction: 4471120a-d079-428b-bb6e-8e51dbcc0f32 (transaction error 3)
...
ts1|pid3169|:31796|http://127.80.20.181:30196 I0226 16:17:23.642856  3478 tablet_service.cc:364] Write failed: Operation failed. Try again. (yb/docdb/conflict_resolution.cc:69): ab4bb7a7-f8bd-4f92-9414-24474e7e7834 Conflicts with higher priority transaction: 4471120a-d079-428b-bb6e-8e51dbcc0f32 (transaction error 3)
2020-02-26 16:17:23,752 (Thread-17) [INFO - org.yb.cql.TestTransaction.lambda$test$2(TestTransaction.java:873)] === Rows[2]: Row[true] | Row[true] |