yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.88k stars 1.05k forks source link

[YSQL][Bulk load] Cancelled non-transactional Copy and shutdown: missing rows #12684

Open def- opened 2 years ago

def- commented 2 years ago

Jira Link: DB-578

Description

Not sure if this is important enough to warrant an issue, but this kind of missing flush could cause other missing data. I'm running current yugabyte-db master state (55c2d15d351d4e8315da2509ac5d0d827ffbfa0c). I created a large CSV file:

#!/usr/bin/env python3
print('a,b,c,d')
for i in range(100000000): # 3.6 GB
    print(f'{i*4},{i*4+1},{i*4+2},{i*4+3}')

And tried reading it into a local RF3 database (macOS, M1):

set yb_disable_transactional_writes=true;
create table t (a integer, b serial, c varchar, d int);
copy t from '/Users/deen/foo.csv' with (format csv, header);

After ~10 minutes I canceled the copy:

yugabyte=# copy t from '/Users/deen/foo.csv' with (format csv, header);
^CCancel request sent
ERROR:  canceling statement due to user request
CONTEXT:  COPY t, line 24622499: "98489988,98489989,98489990,98489991"

I would now expect 24622498 rows in t. When running select count(*) from t; immediately after the count is lower, after about a minute it reaches this number of rows. But if I restart the database before it reaches it, the last 498 rows seem to get lost permanently:

$ bin/yb-ctl --replication_factor 3 stop
$ bin/yb-ctl --replication_factor 3 start
$ bin/ysqlsh
yugabyte=# select count(*) from t;
  count
----------
 24622000
(1 row)

I couldn't always reproduce this, probably depends on how much has to be flushed and timing, but I got it 2 times separately.

def- commented 2 years ago

@pkj415 Could this be related to https://github.com/yugabyte/yugabyte-db/issues/11628 ?

sushantrmishra commented 2 years ago

@def- This can happen with transactions disabled.

When transaction is enabled then transaction control the number of rows persisted.

Though with disabled transaction, each row gets inserted as single row transaction. If the copy gets cancelled abruptly then inflight writes which are already sent to docDB will get persisted. Another layer is that there is buffering in YSQL layer as well, if the buffer is not sent yet to docDB yet then that might get cleaned up as well and will be reflected as lost rows.

def- commented 2 years ago

@sushantrmishra Alright, so this is expected behavior? Can close the bug in that case.