yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YSQL] Wait queues are not enabled error on upgrade path from <=2.16 to >=2024.1 #23208

Closed archit-rastogi closed 1 month ago

archit-rastogi commented 1 month ago

Jira Link: DB-12152

Description

2024-05-30 18:05:49.168 UTC [97063] ERROR:  Wait queues are not enabled
2024-05-30 18:05:49.168 UTC [97063] STATEMENT:  SELECT * FROM upgrade_tab_2_16_9_0_0 WHERE  age < 24 ORDER BY age ASC FOR UPDATE
W0530 18:05:49.167925 93779 tablet_rpc.cc:464] Not implemented (yb/docdb/conflict_resolution.cc:1220): Failed Read(tablet: 9389b1c1ccd7434190ae79be3fd8f4e6, num_ops: 1, num_attempts: 1, txn: e3b3a818-b2a1-4352-b7d2-3ea920dcfadf, subtxn: [none]) to tablet 9389b1c1ccd7434190ae79be3fd8f4e6 on tablet server { uuid: 73dae0b689284d0cb58a810579b30644 private: [host: "10.9.107.250" port: 9100] cloud_info: placement_cloud: "aws" placement_region: "us-west-2" placement_zone: "us-west-2a" after 1 attempt(s): Wait queues are not enabled

Issue Type

kind/bug


archit-rastogi commented 1 month ago

On the 2.16 -> 2024.1 upgrade path, the tserver that launches the read request (already on the newer version) sets the wait policy to wait-on-conflict, but the tserver that handles the read (still on the older version) doesn't handle this well and fails with the error above. The relevant code on 2.16:

  // Correctly checks whether wait queues are enabled.
  auto conflict_management_policy = wait_queue ? docdb::WAIT_ON_CONFLICT : docdb::FAIL_ON_CONFLICT;
  const auto& pairs = write_batch.read_pairs();
  if (!pairs.empty() && write_batch.has_wait_policy()) {
    switch (write_batch.wait_policy()) {
      case WAIT_BLOCK:
        // Overwrites the policy without checking whether wait queues are
        // enabled, which leads to the error above.
        conflict_management_policy = docdb::WAIT_ON_CONFLICT;
        break;
      ...
    }
  }
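
To make the failure mode concrete, here is a minimal, self-contained sketch that models the selection logic quoted above. It is not YugabyteDB code: the enum names, `ChoosePolicy216`, and `WouldRejectRequest` are hypothetical stand-ins, and only the `WAIT_BLOCK` case from the excerpt is modeled.

```cpp
#include <optional>

// Hypothetical stand-ins for the wait policy sent with the request and the
// conflict management policy chosen by the handling tserver.
enum class WaitPolicy { kBlock, kOther };
enum class ConflictPolicy { kWaitOnConflict, kFailOnConflict };

// Models the 2.16 excerpt: the default respects the wait-queue setting, but a
// WAIT_BLOCK request with read pairs overrides it unconditionally.
ConflictPolicy ChoosePolicy216(bool wait_queues_enabled, bool has_read_pairs,
                               std::optional<WaitPolicy> wait_policy) {
  ConflictPolicy policy = wait_queues_enabled ? ConflictPolicy::kWaitOnConflict
                                              : ConflictPolicy::kFailOnConflict;
  if (has_read_pairs && wait_policy.has_value() &&
      *wait_policy == WaitPolicy::kBlock) {
    policy = ConflictPolicy::kWaitOnConflict;  // no check of wait_queues_enabled
  }
  return policy;
}

// Assumption: conflict resolution rejects a wait-on-conflict request when wait
// queues are off, which is what surfaces as "Wait queues are not enabled".
bool WouldRejectRequest(bool wait_queues_enabled, ConflictPolicy policy) {
  return policy == ConflictPolicy::kWaitOnConflict && !wait_queues_enabled;
}
```

With wait queues disabled on the older tserver, a `WaitPolicy::kBlock` request with read pairs yields `kWaitOnConflict`, which `WouldRejectRequest` then flags; the same request without an explicit wait policy falls back to `kFailOnConflict` and is not rejected.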

So upgrades from <= 2.16 to >= 2024.1 can hit this error while tservers on both versions are serving requests.

Credits: @basavaraj29

rthallamko3 commented 1 month ago

This error is expected during an upgrade from a release with wait-on-conflict off (< 2024.1) to a release with wait-on-conflict on (>= 2024.1). The test can add this as an expected failure.
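
One way a test could treat this as an expected failure is to match the error text only while the upgrade is in progress. The sketch below is an assumption about how such a check might look, not the actual test framework's API: `IsExpectedUpgradeError` and the `upgrade_in_progress` flag are hypothetical names.

```cpp
#include <string_view>

// Error text taken from the log lines in the issue description.
constexpr std::string_view kWaitQueuesDisabledMsg = "Wait queues are not enabled";

// Hypothetical helper: returns true only when the error matches the known
// message AND the test is in its mixed-version upgrade phase, so the same
// error outside an upgrade still fails the test.
bool IsExpectedUpgradeError(std::string_view error_message,
                            bool upgrade_in_progress) {
  return upgrade_in_progress &&
         error_message.find(kWaitQueuesDisabledMsg) != std::string_view::npos;
}
```

Scoping the allowance to the upgrade window keeps the check from masking a genuine regression once every tserver is on the newer version.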