yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.93k stars 1.06k forks source link

[YSQL] Batching for bulk loads is not occurring in read committed isolation #18566

Open tvesely opened 1 year ago

tvesely commented 1 year ago

Jira Link: DB-7504

Description

Running a bulk load like the following takes as much as 100x longer when yb_enable_read_committed_isolation=true:

CREATE TABLE temp(id int, data text, PRIMARY KEY(id ASC));
DO $do$ BEGIN FOR i IN 1..1000000 LOOP INSERT INTO temp(id, data) VALUES (i, '!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~');END LOOP;END $do$;

This appears to be because batching is not occurring.

With yb_enable_read_committed_isolation=true:

I0804 14:43:48.715556 1153536 pg_session.cc:646] Flushing buffered operations, using transactional session (num ops: 1)
I0804 14:43:48.715968 1153536 pg_session.cc:646] Flushing buffered operations, using transactional session (num ops: 1)
I0804 14:43:48.716580 1153536 pg_session.cc:646] Flushing buffered operations, using transactional session (num ops: 1)

With yb_enable_read_committed_isolation=false:

I0804 14:41:31.394170 1152463 pg_session.cc:646] Flushing buffered operations, using transactional session (num ops: 3072)
I0804 14:41:31.469552 1152463 pg_session.cc:646] Flushing buffered operations, using transactional session (num ops: 3072)
I0804 14:41:31.542363 1152463 pg_session.cc:646] Flushing buffered operations, using transactional session (num ops: 3072)

Warning: Please confirm that this issue does not contain any sensitive information

pkj415 commented 1 year ago

This is because we flush buffered ops and pick a new snapshot for each INSERT in the stored procedure -

/*
     * YB: We have to RESET read point in YSQL for READ COMMITTED isolation level.
     * A read point is analogous to the snapshot in PostgreSQL.
     *
     * We also need to flush all earlier operations so that they complete on the
     * previous snapshot.
     *
     * READ COMMITTED semantics don't apply to DDLs.
     */
    if (IsYBReadCommitted() && YBGetDdlNestingLevel() == 0)
    {
        elog(DEBUG2, "Resetting read point for statement in Read Committed txn");
        HandleYBStatus(YBCPgFlushBufferedOperations());
        HandleYBStatus(YBCPgResetTransactionReadPoint());
    }

This is no easy way to solve this as of now. We will have to brainstorm on possible solutions, but this is longer term.