yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.88k stars 1.05k forks source link

[YSQL] Using several of the array-specific functions/operators causes time-out error and server crash #4296

Closed bllewell closed 4 years ago

bllewell commented 4 years ago

Added 15-May-2020.

I ran each of the testcases in Version 2.1.6.0, MacOS, (downloaded from the Internet). Each ran without error.


Using YB version 2.1.4.0 on MacOS Mojave Version 10.14.6; RF=1, single node.

Summary

When each of these:

array_replace(), the || operator, array_cat(), array_append(), array_prepend(), and array_remove()

is used like this:

create table t(k int primary key, arr some_type[]);
insert into t(k, arr)
values (1, '{...}'::some_type[]);

update t
set arr = <the operation>(arr, some_value)
where k = 1;

then errors like this occur:

ERROR:  Timed out: Write(tablet: 46c5bcc78e2e4e9c8744d0eadb1a47e9, num_ops: 1, num_attempts: 343, txn: 00000000-0000-0000-0000-000000000000) passed its deadline 865401.630s (passed: 60.328s): Network error (yb/util/net/socket.cc:535): recvmsg got EOF from remote (system error 58)
FATAL:  terminating connection due to unexpected postmaster exit

Following this, the server has crashed. We need to do yb-ctl restart before normal service is resumed. All sorts of other errors are possible, as shown in each of the following specific testcases. Not all of them require a restart to recover.

Each of the testcases runs without error on vanilla PG (Version 11.2).

In all cases, it's enough to declare table t without a primary key to make the error disappear. And, when this is done, the outcomes are identical to those that the same scripts produce on vanilla PG.

Testcase no. 1: array_replace() with array of row type values

These tests run withour error:

drop type if exists rt cascade;
drop table if exists t;

create type rt as (f1 int, f2 text);
create table t(k int primary key, arr rt[] not null);
insert into t(k, arr) values (1, '{"(1,a)", "(2,b)"}'::rt[]);
update t set arr = '{"(1,c)", "(2,d)"}'::rt[] where k = 1;
select arr from t where k = 1;

update t set arr[2] = '(2,e)'::rt where k = 1;
select arr from t where k = 1;

But this test consistently errors:

drop type if exists rt cascade;
drop table if exists t;

create type rt as (f1 int, f2 text);
create table t(k int primary key, arr rt[] not null);
insert into t(k, arr) values (1, '{"(1,a)", "(2,b)"}'::rt[]);
update t set arr = array_replace(arr, '(1,c)', '(1,p)') where k = 1;

It produces an error like this:

ERROR:  Timed out: Write(tablet: 79165a810c4f430ca23a9e8fe942029f, num_ops: 1, num_attempts: 342, txn: 00000000-0000-0000-0000-000000000000) passed its deadline 867184.541s (passed: 60.024s): Network error (yb/util/net/socket.cc:535): recvmsg got EOF from remote (system error 58)

And now yb-ctl status shows this:

Node 1: yb-tserver (Stopped), yb-master (pid 63154)  

Testcase no. 2: array_replace() with text[] array

These tests run withour error:

set client_min_messages = warning;
drop table if exists t;

create table t(k int primary key, arr text[]);
insert into t(k, arr) values (1, '{a, b}'::text[]);

update t set arr = '{x, y, z}'::text[] where k = 1;
select arr from t where k = 1;

update t set arr[2] = 'q' where k = 1;
select arr from t where k = 1;

update t set arr = '{a, b}'::text[] where k = 1;
select arr from t where k = 1;

But this test consistently errors.

drop table if exists t;

create table t(k int primary key, arr text[]);
insert into t(k, arr) values (1, '{a, b}'::text[]);

update t set arr = array_replace(arr, 'b', 'p') where k = 1;
select k, arr from t where k = 1;

Sometimes, the update reports this nonsense error:

Query error: array size exceeds the maximum allowed (%d)

And sometimes the update goes silenty and the select causes this:

cache lookup failed for type 1713402723

Or sometimes for type 0.

And sometimes the select causes this:

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Note: Try with this:

array_replace(arr, 'b', 'p')

Here, value of arr together with the value of the second "replace with" parameter mean that array_replace() actually have cannot have an effect. But still we see errors.

Testcase no. 3: the || operator with int[] array

This test consistently fails:

drop table if exists t;

create table t(k int primary key, arr int[]);
insert into t(k, arr)
values (1, '{1, 2, 3, 4}'::int[]);

update t
set arr = arr||'{6, 7}'::int[]
where k = 1;

It cases an error like this:

ERROR:  Timed out: Write(tablet: 71d0e3adf37e4f6ba963ef5b3c9c6204, num_ops: 1, num_attempts: 343, txn: 00000000-0000-0000-0000-000000000000) passed its deadline 869242.944s (passed: 60.179s): Network error (yb/util/net/socket.cc:535): recvmsg got EOF from remote (system error 58)

requiring a restart.

Testcase no. 4: array_cat() with int[] array

This test consistently fails:

drop table if exists t;

create table t(k int primary key, arr int[]);
insert into t(k, arr)
values (1, '{1, 2, 3, 4}'::int[]);

select arr as "old value of arr" from t where k = 1;

update t
set arr = array_cat(arr, '{6, 7}'::int[])
where k = 1;

It cases an error like this:

ERROR:  Timed out: Write(tablet: 32e4c08282ae44a2aaebc34a2f662e33, num_ops: 1, num_attempts: 343, txn: 00000000-0000-0000-0000-000000000000) passed its deadline 869754.253s (passed: 60.196s): Network error (yb/util/net/socket.cc:535): recvmsg got EOF from remote (system error 58)

requiring a restart.

Testcase no. 5: array_append() with int[] array

This test consistently fails:

drop table if exists t;

create table t(k int primary key, arr int[]);
insert into t(k, arr)
values (1, '{1, 2, 3, 4}'::int[]);

update t
set arr = array_append(arr, 6::int)
where k = 1;

It cases an error like this:

Timed out: Write(tablet: 24fd1b50195d42f89e3eccfa6b07025b, num_ops: 1, num_attempts: 343, txn: 00000000-0000-0000-0000-000000000000) passed its deadline 869930.572s (passed: 60.340s): Network error (yb/util/net/socket.cc:535): recvmsg got EOF from remote (system error 58)
FATAL:  terminating connection due to unexpected postmaster exit

requiring a restart.

Testcase no. 6: array_prepend() with int[] array

This test consistently fails:

drop table if exists t;

create table t(k int primary key, arr int[]);
insert into t(k, arr)
values (1, '{1, 2, 3, 4}'::int[]);

update t
set arr = array_prepend(0::int, arr)
where k = 1;

It cases an error like this:

Timed out: Write(tablet: 4d9ba6709a6641aaaab504e8c743336b, num_ops: 1, num_attempts: 342, txn: 00000000-0000-0000-0000-000000000000) passed its deadline 870255.545s (passed: 60.017s): Network error (yb/util/net/socket.cc:535): recvmsg got EOF from remote (system error 58)

requiring a restart.

Once or twice, I've seen this different error immediately following the update attempt:

Query error: invalid typLen: %d

Testcase no. 7: array_remove() with int[] array

This test fails with different outcomes:

drop table if exists t;

create table t(k int primary key, arr int[]);
insert into t(k, arr)
values (1, '{1, 2, 3, 4, 5, 6}'::int[]);

update t
set arr = array_remove(arr, 4)
where k = 1;

select arr from t where k = 1;

Usually, I see this:

cache lookup failed for type 0

But, once or twice, I've seen this:

ysqlsh:array_remove.sql:16: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
ysqlsh:array_remove.sql:16: connection to server was lost

This doesn't crash the server.

frozenspider commented 4 years ago

Very nice and thorough description @bllewell ! Crash stacktrace I got when trying case number 3:

TRAP: FailedAssertion("!(cacheId >= 0 && cacheId < (USERMAPPINGUSERSERVER + 1) && ((const void*)(SysCache[cacheId]) != ((void*)0)))", File: "../../../../../../../src/postgres/src/backend/utils/cache/syscache.c", Line: 1463)
*** Aborted at 1587722423 (unix time) try "date -d @1587722423" if you are using GNU date ***
PC: @     0x7fff5eda32c6 __pthread_kill
*** SIGABRT (@0x7fff5eda32c6) received by PID 5508 (TID 0x7000041f3000) stack trace: ***
    @     0x7fff5ee4db5d _sigtramp
    @         0xffffffff (unknown)
    @     0x7fff5ed0d6a6 abort
    @        0x10f117a0e ExceptionalCondition
    @        0x10f112d17 SearchSysCache1
    @        0x10f0488bd format_type_extended
    @        0x10f02a6f0 array_cat
    @        0x10f166347 evalExpr
    @        0x10f16619d YbgEvalExpr
    @        0x10e21d3bc  yb::docdb::DocPgEvalExpr()
    @        0x10e251737  yb::docdb::DocExprExecutor::EvalTSCall()
    @        0x1141f4d3c  yb::QLExprExecutor::EvalExpr()
    @        0x10e251a9d  yb::QLExprExecutor::EvalExpr()
    @        0x10e298d01  yb::docdb::PgsqlWriteOperation::ApplyUpdate()
    @        0x10e298023  yb::docdb::PgsqlWriteOperation::Apply()
    @        0x10e204199  yb::docdb::ExecuteDocWriteOperation()
    @        0x10cd2c8d8  yb::tablet::Tablet::StartDocWriteOperation()
    @        0x10cd313c6  yb::tablet::Tablet::KeyValueBatchFromPgsqlWriteBatch()
    @        0x10cd31bac  yb::tablet::Tablet::AcquireLocksAndPerformDocOperations()
    @        0x10cda70ab  yb::tablet::TabletPeer::WriteAsync()
    @        0x10bf4da2e  yb::tserver::TabletServiceImpl::Write()
    @        0x1104c2994  yb::tserver::TabletServerServiceIf::Handle()
    @        0x1138b23f3  yb::rpc::ServicePoolImpl::Handle()
    @        0x11380a6a9  yb::rpc::InboundCall::InboundCallTask::Run()
    @        0x1138cdeff  yb::rpc::(anonymous namespace)::Worker::Execute()
    @        0x114f3a783  yb::Thread::SuperviseThread()
    @     0x7fff5ee562eb _pthread_body
    @     0x7fff5ee59249 _pthread_start
    @     0x7fff5ee5540d thread_start

As such, this is likely a bug in expression pushdown framework (#3297)

bllewell commented 4 years ago

Workarounds

Here is a trivial workaround for the || operator. It capitalizes on what "Testcase No, 1" shows.

drop table if exists t cascade;
create table t(k int primary key, arr int[]);
insert into t(k, arr)
values (1, '{3, 4, 5}'::int[]);

select arr as "old value of arr" from t where k = 1;

with v as (
  select '{1, 2}'::int[]||arr||6::int as new_arr from t where k = 1)
update t
set arr = (select new_arr from v)
where k = 1;

select arr as "new value of arr" from t where k = 1;

Notice that table t does have a primary key constraint. I ran this ten times in rapid succession. It produced the expected result, first this:

 old value of arr 
------------------
 {3,4,5}

and then this:

 new value of arr 
------------------
 {1,2,3,4,5,6}

every time.

The same workaround is effective for array_remove():

create table t(k int primary key, arr int[]);
insert into t(k, arr)
values (1, '{1, 2, 2, 2, 5, 6}'::int[]);

select arr as "old value of arr" from t where k = 1;

with v as (
  select array_remove(arr, 2) as new_arr from t where k = 1)
update t
set arr = (select new_arr from v)
where k = 1;

select arr as "new value of arr" from t where k = 1;

Notice that, here too, table t does have a primary key constraint. I ran this ten times in rapid succession. It produced the expected result, first this:

 old value of arr 
------------------
 {1,2,2,2,5,6}

and then this:

 new value of arr 
------------------
 {1,5,6}

every time.

It works, too, for array_replace:

drop table if exists t cascade;
drop type if exists rt cascade;
create type rt as (f1 int, f2 text);
create table t(k int primary key, arr rt[]);
insert into t(k, arr)
values (1, '{"(1,rabbit)","(2,hare)","(3,squirrel)","(4,horse)"}'::rt[]);

select arr as "old value of arr" from t where k = 1;

with v as (
  select array_replace(arr, '(3,squirrel)', '(3,bobcat)')
  as new_arr from t where k = 1)
update t
set arr = (select new_arr from v)
where k = 1;

select arr as "new value of arr" from t where k = 1;

Notice that, here too, table t does have a primary key constraint. I ran this ten times in rapid succession. It produced the expected result, first this:

                   old value of arr                   
------------------------------------------------------
 {"(1,rabbit)","(2,hare)","(3,squirrel)","(4,horse)"}

and then this:

                  new value of arr                  
----------------------------------------------------
 {"(1,rabbit)","(2,hare)","(3,bobcat)","(4,horse)"}

every time.

m-iancu commented 4 years ago

@bllewell Should be fixed by 7ec4a77. I tested all examples above and they all work now.