yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YSQL] INTERNAL ERROR Sending too long RPC message during query execution with parallel queries and large row values #24627

Open qvad opened 1 week ago

qvad commented 1 week ago

Jira Link: DB-13684

Description

The error occurred during our query execution testing with parallel queries enabled. DDL and data load:

create table t100000w (c1 int, c2 int not null, c3 int, c4 int, c5 int, c6 int, v char(8192), primary key (c1 asc));
create unique index t100000w_c2 on t100000w (c2 asc);
create index t100000w_c3 on t100000w (c3 asc);
create index t100000w_c4 on t100000w (c4 asc);
create index t100000w_c6 on t100000w (c6 asc);
create index t100000w_c2_c4 on t100000w (c2 asc) include (c4);
create index t100000w_c2_c4v on t100000w (c2 asc) include (c4, v);

select setseed(0.222);
insert into t100000w
  select i, i2,
         nullif((i3+4)/5, (100000+4)/5),
         nullif((i4+9)/10, (100000+9)/10),
         i5,
         nullif((i6+99)/100, (100000+99)/100),
         lpad(sha512(i5::text::bytea)||sha512(i::text::bytea)::text, 8192, '-')
    from (
      select i,
          row_number() over (order by random()) i2,
          row_number() over (order by random() + 1) i3,
          row_number() over (order by random() + 2) i4,
          row_number() over (order by random() + 3) i5,
          row_number() over (order by random() + 4) i6
        from generate_series(1, 100000) i
    ) v order by 1;

Failed query:

15:11:25  Message: 'UNSTABLE: /*+  IndexScan(t100000w) */ select c2, c4, v from t100000w where c4 >= (100000 / 10) * 0 / 1 + 1 and (c4*2 - c4) >= (100000 / 10) * 0 / 1 + 1'
15:11:25  Arguments: (InternalError_('Sending too long RPC message (267390940 bytes of data)\n'),)
15:11:25  2024-10-22 22:11:22,681:ERROR: INTERNAL ERROR Sending too long RPC message (267390940 bytes of data)

Issue Type

kind/bug

andrei-mart commented 1 week ago

The scan goes through the index, whose rows are short, so a regular 1MB parallel range covers a lot of rows. The rows in the main table are wide, which is why the response for a single parallel range ends up too big.
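
As a rough sanity check of that explanation, the numbers in the report can be put side by side. This is a back-of-the-envelope sketch, not a description of YugabyteDB internals: it assumes each returned row costs about 8192 bytes (the `char(8192)` column `v`, one byte per character, ignoring the other columns and any per-row overhead) and that the "1MB parallel range" means 1 MiB of index data. All function and constant names are made up for illustration.

```python
# Figures taken from the issue text.
OBSERVED_RESPONSE_BYTES = 267_390_940   # size reported in the error message
WIDE_VALUE_BYTES = 8192                 # char(8192) column "v"; assumes 1 byte per char
PARALLEL_RANGE_BYTES = 1024 * 1024      # "regular 1MB parallel range"; assumed to be 1 MiB


def max_rows_in_response(response_bytes: int, row_bytes: int) -> int:
    """Upper bound on the row count if every row costs at least row_bytes."""
    return response_bytes // row_bytes


def implied_index_entry_bytes(range_bytes: int, rows: int) -> float:
    """Average index entry size if one range of range_bytes covers `rows` rows."""
    return range_bytes / rows


def amplification(response_bytes: int, range_bytes: int) -> float:
    """How much larger the response is than the index range that produced it."""
    return response_bytes / range_bytes


rows = max_rows_in_response(OBSERVED_RESPONSE_BYTES, WIDE_VALUE_BYTES)
entry_bytes = implied_index_entry_bytes(PARALLEL_RANGE_BYTES, rows)
factor = amplification(OBSERVED_RESPONSE_BYTES, PARALLEL_RANGE_BYTES)

print(f"rows in the response (upper bound): {rows}")
print(f"implied bytes per index entry:      {entry_bytes:.1f}")
print(f"response size / range size:         {factor:.0f}x")
```

Under these assumptions, the reported response corresponds to roughly 32,640 wide rows, which is what a 1 MiB range would hold if each index entry took about 32 bytes. That is consistent with the explanation above: a range sized by index bytes can expand by a factor of about 255 once the main-table rows are fetched.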