zodb / relstorage

A backend for ZODB that stores pickles in a relational database.
Other
54 stars 46 forks source link

[oracle] zodbconvert incredibly slow. Here is a fix. #450

Open NicolasGoeddel opened 3 years ago

NicolasGoeddel commented 3 years ago

Hi there,

for reference you can find the whole issue here: https://community.plone.org/t/zodbconvert-incredibly-slow-solved/13606

While converting a filestorage/blobstorage to a relstorage using the oracle backend it takes a lot if time even if the database is very small.

I found the culprit in this SQL statement:

SELECT
    zoid,
    tid,
    prev_tid,
    current_object_state.state
FROM
    temp_store
        JOIN current_object_state
            USING (zoid)
WHERE (tid <> prev_tid)

It joins with the view current_object_state which in turn joins the two tables current_object and object_state using the columns ZOID and `TID':

CREATE OR REPLACE FORCE EDITIONABLE VIEW "CURRENT_OBJECT_STATE" (
    "ZOID",
    "TID",
    "STATE"
) AS 
    SELECT
        zoid,
        tid,
        state
    FROM object_state
        JOIN current_object
            USING (zoid, tid);

The table object_state already has an unique index over (ZOID, TID) but current_object has not. So I created the index by myself:

CREATE UNIQUE INDEX "CURRENT_OBJECT_UI" ON "CURRENT_OBJECT" ("ZOID", "TID")

After that the SELECT statement I mentioned above only needs 50 ms instead of 35 s to evaluate. And also the zodbconvert script got 36 times faster than before. I was able to convert the data with 1.14 transactions per second:

2021-03-22 15:48:38,689 [relstorage.storage.copy] INFO   Copied transactions: 955/955,100.00%, 63.36 KB/s   1.14 True/s, 51.62 MB

And before it was more like 0.03 transactions per second.

Please add that index to the code base so everyone can feel the difference.

Version:

NicolasGoeddel commented 3 years ago

I think one could add the index right after this line if I am right: https://github.com/zodb/relstorage/blob/9e7a63801a550eada0b70b20a0ae5007f48ccd1c/src/relstorage/adapters/schema.py#L482

jamadden commented 3 years ago

Oracle is not tested or supported. I will not be adding an index that only benefits oracle.

Now, that could all change if there was some way to test oracle on ci. Given that there’s not, I am more likely to just remove the code.

jamadden commented 3 years ago

More generally, as every DBA knows, indices are a trade off in several dimensions, chiefly of interest here is online performance versus batch performance. Since RelStorage is focused on online performance, and converting with zodbconvert is extremely rare, relatively speaking, the bar is very high to add features that only benefit such a rare use.

NicolasGoeddel commented 3 years ago

There is also a guy who saw a similar culprit with Postgres. I did not test it but I wonder if it is the same reason but for restore. https://community.plone.org/t/zodbconvert-filestorage-relstorage-much-slower-than-other-direction/13172

jamadden commented 3 years ago

The version of RelStorage being used there was extremely old.

My group just converted a bunch of large-ish databases to Postgres using the current version of RelStorage and it actually went much faster than expected.

NicolasGoeddel commented 3 years ago

Thanks for your answer. May I ask you one more thing? What is the last version of RelStorage that was tested against an Oracle database server? I saw something in the Changelog to version 3.0a13. Does that mean version 3.0 is safe to use?

jamadden commented 3 years ago

3.0a13 was probably the last time I tested it specifically. Oracle has never been regularly tested to my knowledge.

NicolasGoeddel commented 3 years ago

Okay. And that all depends on the lack of an oracle database server where tests could be ran against? I never ran tests on Github but shouldn't this docker image help you with that? https://github.com/oracle/docker-images/tree/main/OracleDatabase Before I can really help I would need to learn about writing a CI pipeline for Github and I guess I need to simulate it first on my computer to be able to debug it.

tobiasherp commented 3 years ago

There is also a guy who saw a similar culprit with Postgres. I did not test it but I wonder if it is the same reason but for restore. https://community.plone.org/t/zodbconvert-filestorage-relstorage-much-slower-than-other-direction/13172

I'm that guy ;-) It's true that both my PostgreSQL and RelStorage versions are quite old, but the performance difference between the RelDB-to-Data.fs and the Data.fs-to-RelDB conversion is a factor of 200, which makes the latter almost completely useless for backup/restore purposes. It might be an option to add the index just for this purpose, before performing the conversion; so, nice to know.

But I'm not convinced by that "... only benefits oracle." argument. I'd be interested in some little tests, performing the conversion in both directions, with and without the index, with newer versions of the package. Is there really a considerable performance penalty while writing to the database? (Of course it will take time to maintain the additional index, but this might be over-compensated by performance gains while reading, or it might be neglectably small.)

mamico commented 3 years ago

What is the last version of RelStorage that was tested against an Oracle database server?

@NicolasGoeddel I've been using Relstorage 3.4.0 and Oracle in the wild for months with no particular problems. But actually moving to Postgresql is part of my medium-term plans.