laurenz / oracle_fdw

PostgreSQL Foreign Data Wrapper for Oracle
http://laurenz.github.io/oracle_fdw/
Other
489 stars 156 forks source link

Slow Performance pulling CLOB from Oracle and writing to Postgres #620

Closed SteveDirschelTR closed 9 months ago

SteveDirschelTR commented 11 months ago
shudslong01a=> select oracle_diag();
                         oracle_diag
-------------------------------------------------------------
 oracle_fdw 2.5.0, PostgreSQL 14.7, Oracle client 21.9.0.0.0
(1 row)

According to documentation:

lob_prefetch (optional, defaults to "1048576")

The size of the CLOB's I am copying are smaller than 1048576 so I have not changed lob_prefetch (below shows max bytes for the CLOB rows I'm copying is 40,657 bytes). I have set these 2 options for table co_session_binding:

ALTER FOREIGN TABLE uds_long_link.co_session_binding OPTIONS ( readonly 'off', prefetch '1000' );

This shows LOB info from the Oracle side for the rows I will test copy over.

SELECT sum(dbms_lob.getlength(session_bindings_clob)) total_bytes, 
       count(*), 
       min(dbms_lob.getlength(session_bindings_clob)) min_bytes, 
       max(dbms_lob.getlength(session_bindings_clob)) max_bytes, 
       round(avg(dbms_lob.getlength(session_bindings_clob))) avg_bytes
  FROM uds_long.co_session_binding
 WHERE CREATED_TIME >= to_date('2020-11-06 00:00:00','YYYY-MM-DD HH24:MI:SS') and CREATED_TIME < to_date('2020-11-06 00:01:00','YYYY-MM-DD HH24:MI:SS');

TOTAL_BYTES   COUNT(*)  MIN_BYTES  MAX_BYTES  AVG_BYTES
----------- ---------- ---------- ---------- ----------
   43179283       2060       3783      40657      20961

Here I run the insert- 5m 24s

shudslong01a=>     INSERT into uds_long.co_session_binding (SESSION_ID, SESSION_BINDINGS_CLOB, CREATED_TIME, LAST_UPDATED_TIME, SESSION_STORAGE_PROFILE)
shudslong01a->             SELECT SESSION_ID, SESSION_BINDINGS_CLOB, CREATED_TIME, LAST_UPDATED_TIME, SESSION_STORAGE_PROFILE
shudslong01a->               FROM uds_long_link.co_session_binding
shudslong01a->              WHERE CREATED_TIME >= '2020-11-06 00:00:00' and CREATED_TIME < '2020-11-06 00:01:00';

INSERT 0 2060
Time: 324750.745 ms (05:24.751)

2020-11-06 data goes into the co_session_binding_created_time_20201107 partition

SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS index
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS table
  FROM (
  SELECT *, total_bytes-index_bytes-coalesce(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS table_name
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r'
  ) a
) a
where table_name = 'co_session_binding_created_time_20201107'
order by 5 desc;

This shows the size of the partition on the Postgres side I just copied the data into. It is interesting that the table, index, toast is only 19.5 MB on the Postgres size when the CLOB data is 41.1 MB. There must be some compression going on.

   oid    | table_schema |                table_name                | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total | index  | toast |  table
----------+--------------+------------------------------------------+--------------+-------------+-------------+-------------+-------------+-------+--------+-------+---------
 60218577 | uds_long     | co_session_binding_created_time_20201107 |         2060 |    20537344 |      188416 |    18046976 |     2301952 | 20 MB | 184 kB | 17 MB | 2248 kB
(1 row)

select sum(length(session_bindings_clob)) from uds_long.co_session_binding_created_time_20201107 ;

   sum
----------
 43179283 

Here is another table I am copying data for. The options for the table are the same as the other table. But this table does not have any CLOBS. Here is a test copy of 1 day of data- 2m 25s

ALTER FOREIGN TABLE uds_long_link.co_session OPTIONS ( readonly 'on', prefetch '1000' );

            INSERT into uds_long.co_session
           (SESSION_ID, ONE_PASS_USER_NAME, FIRST_NAME, LAST_NAME, EMAIL_ADDRESS, EXPIRES_TIME,SITE,
           CREATED_TIME, SESSION_EXPIRES_TIME, PRISM_GUID, SATOKEN, SESSION_ENDED_DATE, SESSION_STATUS,
           LAST_UPDATED_TIME, USER_CLASSIFICATION, ORPHAN_EXPIRES_TIME, EXPIRES_TIME_REASON,
           SESSION_ENDED_REASON, PRODUCT_NAME, EMULATEE_PRISM_GUID, EMULATEE_PRISM_REN_AUTH_TOKEN,
           PRISM_RENEWED_AUTH_TOKEN, TIER, SESSION_BASED_PREFS, SESSION_SOURCE,SERVICE_TYPE,
           PMD_DATA_VERSION, CREATED_TIME_AND_TZ, PAYMENT_TYPE, PRISM_REGISTRATION_KEY, CREATED_TIME_LONG,
           ONE_PASS_PRODUCT, SESSION_STORAGE_PROFILE, PRODUCT_VIEW, IP_ADDRESS,USER_CATEGORY,
           BILLING_METHOD, ROAMING_STATUS, ONE_PASS_USER_ID)
            SELECT SESSION_ID, ONE_PASS_USER_NAME, FIRST_NAME, LAST_NAME, EMAIL_ADDRESS, EXPIRES_TIME,SITE,
           CREATED_TIME, SESSION_EXPIRES_TIME, PRISM_GUID, SATOKEN, SESSION_ENDED_DATE, SESSION_STATUS,
           LAST_UPDATED_TIME, USER_CLASSIFICATION, ORPHAN_EXPIRES_TIME, EXPIRES_TIME_REASON,
           SESSION_ENDED_REASON, PRODUCT_NAME, EMULATEE_PRISM_GUID, EMULATEE_PRISM_REN_AUTH_TOKEN,
           PRISM_RENEWED_AUTH_TOKEN, TIER, SESSION_BASED_PREFS, SESSION_SOURCE,SERVICE_TYPE,
           PMD_DATA_VERSION, CREATED_TIME_AND_TZ, PAYMENT_TYPE, PRISM_REGISTRATION_KEY, CREATED_TIME_LONG,
           ONE_PASS_PRODUCT, SESSION_STORAGE_PROFILE, PRODUCT_VIEW, IP_ADDRESS,USER_CATEGORY,
           BILLING_METHOD, ROAMING_STATUS, ONE_PASS_USER_ID
              FROM uds_long_link.co_session
             WHERE CREATED_TIME >= '2020-12-24 00:00:00' and CREATED_TIME < '2020-12-25 00:00:00';

INSERT 0 1253752
Time: 145716.359 ms (02:25.716)

2020-12-24 data goes into the co_session_created_time_20201225 partition

SELECT *, pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS index
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS table
  FROM (
  SELECT *, total_bytes-index_bytes-coalesce(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS table_name
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r'
  ) a
) a
where table_name = 'co_session_created_time_20201225'
order by 5 desc;

   oid    | table_schema |            table_name            | row_estimate | total_bytes | index_bytes | toast_bytes | table_bytes | total  | index  |   toast    | table
----------+--------------+----------------------------------+--------------+-------------+-------------+-------------+-------------+--------+--------+------------+--------
 60208103 | uds_long     | co_session_created_time_20201225 | 1.253998e+06 |  1043333120 |   381894656 |        8192 |   661430272 | 995 MB | 364 MB | 8192 bytes | 631 MB

So for table CO_SESSION (no CLOB) I can copy over 631 MB of table data in 2m 25s. But for CO_SESSION_BINDING (1 CLOB) it copied over 17 MB of table data + 2.2 MB of toast data = 19.2 MB in 5m 24s.

CO_SESSION = 4,563,113 bytes/second CO_SESSION_BINDING = 62,137 bytes/second

CO_SESSION is 73x faster than CO_SESSION_BINDING. I noted above that the LOB size is larger than the table/toast for this table so I'm not sure how to take that into account when calculating out bytes/second but CO_SESSION_BINDING is significantly slower no matter what numbers you use. What can be done to speed this up? At this pace we will not be able to keep up with copying data for this table- it will be generated at a faster pace than it can be copied (at least single threaded).

Table definitions in Oracle:

SQL> desc uds_long.co_session
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SESSION_ID                                         VARCHAR2(36)
 ONE_PASS_USER_NAME                                 VARCHAR2(100)
 FIRST_NAME                                         VARCHAR2(100)
 LAST_NAME                                          VARCHAR2(100)
 EMAIL_ADDRESS                                      VARCHAR2(250)
 EXPIRES_TIME                                       TIMESTAMP(6)
 SITE                                               VARCHAR2(20)
 CREATED_TIME                                       TIMESTAMP(6)
 SESSION_EXPIRES_TIME                               TIMESTAMP(6)
 PRISM_GUID                                         VARCHAR2(40)
 SATOKEN                                            VARCHAR2(34)
 SESSION_ENDED_DATE                                 TIMESTAMP(6)
 SESSION_STATUS                                     VARCHAR2(20)
 LAST_UPDATED_TIME                                  TIMESTAMP(6)
 USER_CLASSIFICATION                                VARCHAR2(50)
 ORPHAN_EXPIRES_TIME                                TIMESTAMP(6)
 EXPIRES_TIME_REASON                                VARCHAR2(20)
 SESSION_ENDED_REASON                               VARCHAR2(20)
 PRODUCT_NAME                                       VARCHAR2(50)
 EMULATEE_PRISM_GUID                                VARCHAR2(50)
 EMULATEE_PRISM_REN_AUTH_TOKEN                      VARCHAR2(100)
 PRISM_RENEWED_AUTH_TOKEN                           VARCHAR2(100)
 TIER                                               NUMBER(3)
 SESSION_BASED_PREFS                                VARCHAR2(1 CHAR)
 SESSION_SOURCE                                     VARCHAR2(50)
 SERVICE_TYPE                                       VARCHAR2(30)
 PMD_DATA_VERSION                                   VARCHAR2(32)
 CREATED_TIME_AND_TZ                                TIMESTAMP(6) WITH TIME ZONE
 PAYMENT_TYPE                                       VARCHAR2(40 CHAR)
 PRISM_REGISTRATION_KEY                             VARCHAR2(500 CHAR)
 CREATED_TIME_LONG                                  NUMBER(38)
 ONE_PASS_PRODUCT                                   VARCHAR2(30 CHAR)
 SESSION_STORAGE_PROFILE                            VARCHAR2(15 CHAR)
 PRODUCT_VIEW                                       VARCHAR2(50)
 IP_ADDRESS                                         VARCHAR2(50 CHAR)
 USER_CATEGORY                                      VARCHAR2(50 CHAR)
 BILLING_METHOD                                     VARCHAR2(1 CHAR)
 ROAMING_STATUS                                     VARCHAR2(100 CHAR)
 ONE_PASS_USER_ID                                   VARCHAR2(32 CHAR)
 HTO_OFFSET                                         VARCHAR2(10 CHAR)

SQL> desc uds_long.co_session_binding
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SESSION_ID                                         VARCHAR2(36)
 SESSION_BINDINGS_CLOB                              CLOB
 CREATED_TIME                                       TIMESTAMP(6)
 LAST_UPDATED_TIME                                  TIMESTAMP(6)
 SESSION_STORAGE_PROFILE                            VARCHAR2(15 CHAR)

Table definitions in Postgres

shudslong01a=> \d uds_long.co_session
                             Partitioned table "uds_long.co_session"
            Column             |              Type              | Collation | Nullable | Default
-------------------------------+--------------------------------+-----------+----------+---------
 session_id                    | character varying(36)          |           | not null |
 one_pass_user_name            | character varying(100)         |           |          |
 first_name                    | character varying(100)         |           |          |
 last_name                     | character varying(100)         |           |          |
 email_address                 | character varying(250)         |           |          |
 expires_time                  | timestamp(6) without time zone |           | not null |
 site                          | character varying(20)          |           | not null |
 created_time                  | timestamp(6) without time zone |           | not null |
 session_expires_time          | timestamp(6) without time zone |           | not null |
 prism_guid                    | character varying(40)          |           | not null |
 satoken                       | character varying(34)          |           |          |
 session_ended_date            | timestamp(6) without time zone |           |          |
 session_status                | character varying(20)          |           |          |
 last_updated_time             | timestamp(6) without time zone |           |          |
 user_classification           | character varying(50)          |           |          |
 orphan_expires_time           | timestamp(6) without time zone |           |          |
 expires_time_reason           | character varying(20)          |           |          |
 session_ended_reason          | character varying(20)          |           |          |
 product_name                  | character varying(50)          |           |          |
 emulatee_prism_guid           | character varying(50)          |           |          |
 emulatee_prism_ren_auth_token | character varying(100)         |           |          |
 prism_renewed_auth_token      | character varying(100)         |           |          |
 tier                          | numeric(3,0)                   |           |          |
 session_based_prefs           | character varying(1)           |           |          |
 session_source                | character varying(50)          |           |          |
 service_type                  | character varying(30)          |           |          |
 pmd_data_version              | character varying(32)          |           |          |
 created_time_and_tz           | timestamp(6) with time zone    |           |          |
 payment_type                  | character varying(40)          |           |          |
 prism_registration_key        | character varying(500)         |           |          |
 created_time_long             | numeric(38,0)                  |           |          |
 one_pass_product              | character varying(30)          |           |          |
 session_storage_profile       | character varying(15)          |           |          |
 product_view                  | character varying(50)          |           |          |
 ip_address                    | character varying(50)          |           |          |
 user_category                 | character varying(50)          |           |          |
 billing_method                | character varying(1)           |           |          |
 roaming_status                | character varying(100)         |           |          |
 one_pass_user_id              | character varying(32)          |           |          |
 hto_offset                    | character varying(10)          |           |          |
Partition key: RANGE (created_time)
Indexes:
    "pk_session" PRIMARY KEY, btree (session_id, created_time)
    "co_session_ix1" btree (session_status, site, orphan_expires_time, expires_time_reason)
    "co_session_ix2" btree (session_status, prism_guid)
    "co_session_ix3" UNIQUE, btree (session_id, created_time, created_time_long)
Number of partitions: 1097 (Use \d+ to list them.)

shudslong01a=> \d uds_long.co_session_binding
                           Partitioned table "uds_long.co_session_binding"
         Column          |              Type              | Collation | Nullable |      Default
-------------------------+--------------------------------+-----------+----------+-------------------
 session_id              | character varying(36)          |           | not null |
 session_bindings_clob   | text                           |           | not null |
 created_time            | timestamp(6) without time zone |           | not null | clock_timestamp()
 last_updated_time       | timestamp(6) without time zone |           |          |
 session_storage_profile | character varying(15)          |           |          |
Partition key: RANGE (created_time)
Indexes:
    "pk_session_binding" PRIMARY KEY, btree (session_id, created_time)
Number of partitions: 1097 (Use \d+ to list them.)

Let me know if you need additional details on this. Thanks

laurenz commented 11 months ago

You are trying to confuse me by talking about lob_prefetch, which was introduced in 2.6.0. But oracle_diag() shows that you are using version 2.5.0.

Before commit d679a73d02057fd674fc78fe5f25183e5c9a172b, oracle_fdw had a limitation that affected performance for LOBs. This is described in the README of version 2.5.0:

Note that due to an Oracle limitation, row prefetching won't work if the Oracle query involves a BLOB, CLOB or BFILE column. As a consequence, queries on such columns on a foreign table will perform badly if you fetch many rows.

The problem was that Oracle performed a client-server round trip for each table row fetched. That would also explain why the table with smaller LOBs seems to perform worse. If you count the time per table row, they are probably similar.

Use oracle_fdw version 2.6.0, that should improve the performance.

philflorent commented 10 months ago

Hi,

I can confirm that oracle-fdw 2.6+ dramatically improves speed retrieval of lines that include CLOB/LOB columns. It's amazing to see it is now much faster to query an Oracle database from a PostgreSQL database than it is from another Oracle database (simple queries only of course !). It's even easier to query Oracle from PostgreSQL than to query Oracle directly in some cases. A LONG column is a TEXT in the foreign table. Hence you can use text (varchar) functions, something that is impossible directly from Oracle. Funny fact, Oracle knows it is a pain to manipulate LONG columns and they tried to add workarounds in their own dictionary.

Example with "all_constraints" view:

SEARCH_CONDITION LONG

SEARCH_CONDITION_VC VARCHAR2(4000)

You have 2 columns with the same information : a column that includes the complete information but it is difficult to query it OR a column that is easy to query but perhaps it is truncated. With oracle-fdw, search_condition is easy to query AND it is not truncated.

Use 2.6+ version if you do have LOB/CLOBS :) !!!!

Best regards, Phil

philflorent commented 10 months ago

Something I didn't mention but Laurenz you had also observed that I think. oracle_fdw 2.6+ improves read performance of Oracle lines with lob/clob columns. But if you want to store those lines and to improve write speed at Postgres side it's also interesting to try lz4 instead of pglz for toast compression. I see "PostgreSQL 14.7" in the first post and the ability to choose lz4 for toast compression was added via default_toast_compression in this version. If PostgreSQL is compiled with --lz4 feature is available. Best regards, Phil

laurenz commented 9 months ago

Are there any questions left?

SteveDirschelTR commented 9 months ago

No additional questions. Thanks