zodb / relstorage

A backend for ZODB that stores pickles in a relational database.

Large PostgreSQL size after packing and vacuuming #395

Closed: skurfer closed this issue 4 years ago

skurfer commented 4 years ago

We have a client on RelStorage 3.0.1 (backed by PostgreSQL 9.6). They were concerned about the size of their database. It seemed too large, even though they’re packing the data weekly. I’ve been looking into it and it seems like RelStorage is leaving a huge amount of unused data behind somewhere.
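
(For reference, a pack like this can be run either with RelStorage’s zodbpack script or directly through the ZODB API. The sketch below assumes RelStorage 3.x with a history-free PostgreSQL backend; the DSN and keep_history setting are placeholders, not the client’s actual configuration.)

```python
# Sketch only: a 0-day pack against a RelStorage-backed PostgreSQL database.
from relstorage.adapters.postgresql import PostgreSQLAdapter
from relstorage.options import Options
from relstorage.storage import RelStorage
from ZODB.DB import DB

options = Options(keep_history=False)  # placeholder configuration
adapter = PostgreSQLAdapter(dsn="dbname=zodb user=zodb", options=options)
db = DB(RelStorage(adapter, options=options))
db.pack(days=0)  # remove everything unreachable as of "now"
db.close()
```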

| Data | Size |
| --- | --- |
| PostgreSQL database | 12 GB |
| Data.fs after conversion | 12 GB |
| Data.fs after packing (0 days) | 2.1 GB |
| PostgreSQL database after conversion back to RelStorage | 2421 MB |
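
(The conversions above would normally be done with RelStorage’s zodbconvert script. The sketch below shows roughly what that copy amounts to in the RelStorage-to-FileStorage direction, assuming a history-free source; the DSN and output path are placeholders.)

```python
# Sketch only: copy every surviving transaction out of RelStorage into a fresh Data.fs.
from relstorage.adapters.postgresql import PostgreSQLAdapter
from relstorage.options import Options
from relstorage.storage import RelStorage
from ZODB.FileStorage import FileStorage

options = Options(keep_history=False)
source = RelStorage(
    PostgreSQLAdapter(dsn="dbname=zodb user=zodb", options=options),
    options=options,
)
dest = FileStorage("Data.fs", create=True)
dest.copyTransactionsFrom(source)  # the same copy zodbconvert performs
source.close()
dest.close()
```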

Here’s an even stranger example (which reinforces my thinking that something is going wrong in RelStorage):

| Data | Size |
| --- | --- |
| PostgreSQL database | 12 GB |
| PostgreSQL database after packing (0 days) | 24 GB |
| PostgreSQL database after vacuum & full vacuum | 10109 MB |
| Data.fs after conversion | 2.1 GB |
| PostgreSQL database after conversion back to RelStorage | 2421 MB |

I’m not too concerned about the doubling of data immediately after packing, but notice that converting the 10 GB database to file storage results in only 2.1 GB of data. Packing removes the same data whether it’s done against file storage or RelStorage, yet we never see a corresponding drop in the size of the PostgreSQL database. Also note that converting back to RelStorage doesn’t result in anything near 10 GB.

Is this a bug, or am I missing something? (I should mention that I haven’t done this dance with any other ZODBs, so maybe you see this everywhere, but a 76% reduction just from converting to file storage and back seems like a problem.)

Thanks.

jamadden commented 4 years ago

When you pack RelStorage, it collects the necessary information about references between objects, and stores that in database tables object_ref and object_refs_added. For objects that are removed because of the pack, the data in those tables is also removed, but for all objects that survive the pack, the data remains. This makes subsequent packs (substantially) faster because they don't have to rediscover references for objects that haven't changed.

In a history-free database, it's not unusual for the data needed here to rival the size of the objects themselves.

| relname | pg_size_pretty |
| --- | --- |
| object_state | 13 GB |
| object_ref | 4025 MB |
| object_ref_pkey | 1516 MB |
| object_refs_added | 1898 MB |
| object_refs_added_pkey | 639 MB |
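
(For reference, a report like the one above can be produced with a catalog query against pg_class. The sketch below assumes the psycopg2 driver and a placeholder DSN; the relation names are RelStorage’s standard ones.)

```python
# Sketch only: show the on-disk size of the object table, the pack bookkeeping
# tables, and their primary-key indexes.
import psycopg2

SIZE_QUERY = """
    SELECT relname, pg_size_pretty(pg_relation_size(c.oid))
    FROM pg_class c
    WHERE relname IN ('object_state', 'object_ref', 'object_ref_pkey',
                      'object_refs_added', 'object_refs_added_pkey')
    ORDER BY pg_relation_size(c.oid) DESC
"""

conn = psycopg2.connect("dbname=zodb user=zodb")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(SIZE_QUERY)
    for relname, size in cur.fetchall():
        print(relname, size)
conn.close()
```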

I would guess (but haven't checked) that history-preserving databases result in larger auxiliary tables, simply because they have more states to examine.

You can truncate those tables when the pack is done, at the cost of making the next pack take longer. If the database is largely objects that change, with few unchanging objects, then saving the data doesn’t help much anyway.
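
(A sketch of that cleanup, assuming the psycopg2 driver and a placeholder DSN; the tables are the ones named above, and the next pack will rebuild them as needed.)

```python
# Sketch only: drop the pack bookkeeping after a pack completes.
import psycopg2

conn = psycopg2.connect("dbname=zodb user=zodb")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("TRUNCATE TABLE object_ref, object_refs_added")
conn.close()
```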

I could see adding options to either zodbpack or the RelStorage configuration to cause a pack operation to truncate those tables automatically.