zopefoundation / ZODB

Python object-oriented database
https://zodb-docs.readthedocs.io/

On ZODB's scalability #374

Closed ToebiasHT closed 1 year ago

ToebiasHT commented 1 year ago

There's a part in the documentation which says that ZODB doesn't scale as well as it should. I'd like to know the reasons why, or whether there's a report on this topic that I could read through.

d-maurer commented 1 year ago

TobiasHT wrote at 2022-12-11 20:03 -0800:

There's a part in the documentation which says that ZODB doesn't scale as well as it should. I'd like to know the reasons why, or whether there's a report on this topic that I could read through.

Unlike (e.g.) relational databases, ZODB does not distinguish between a schema (describing the types) and the data (describing the values); instead, each stored value is individually typed. This means higher redundancy and reduced performance in scenarios where a large share of the data is highly structured.
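
A small illustration of this point, using the plain `pickle` module that ZODB's serialization builds on (the `Point` class is made up for demonstration):

```python
import pickle

class Point:
    """A toy class standing in for application data."""
    def __init__(self, x, y):
        self.x, self.y = x, y

# Every pickled record embeds a reference to the class alongside the
# values, so type information is repeated per object rather than being
# stored once in a shared schema:
record = pickle.dumps(Point(1, 2))
print(b"Point" in record)  # True -- each value carries its own type
```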

As far as I know, there is currently no storage that supports distributed writes, and transaction commits are globally serialized, even if they operate on disjoint object sets. Other database systems use finer-grained concurrency controls and can provide better write scalability.

There are solutions for replicating storages for reading purposes. Thus, if your application does not need highly frequent writes, you may be satisfied with ZODB's scalability.
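
As a rough sketch of what such a read-mostly setup can look like with a ZEO client (the host name is a placeholder, the replica itself is assumed to be maintained by a separate replication product, and `read_only` merely refuses writes on this client):

```python
import ZEO
import ZODB

# Placeholder address of an assumed read-only replica of the primary.
ro_storage = ZEO.client(('replica.example.com', 8100), read_only=True)
db = ZODB.DB(ro_storage)
conn = db.open()
# ... serve read traffic from this connection's object cache ...
```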

jimfulton commented 1 year ago

There's a part in the documentation which says that ZODB doesn't scale as well as it should.

Where did you see that?

ToebiasHT commented 1 year ago

Where did you see that?

@jimfulton here

jimfulton commented 1 year ago

There's a part in the documentation which says that ZODB doesn't scale as well as it should.

So, that documentation doesn't use the word "should" to describe ZODB's scalability, or lack thereof.

Database designs and implementations make tradeoffs based on intended uses and on the effort and rewards of their developers.

For example, big-data analytical systems tend to perform poorly for typical transactional database tasks.

ZODB was designed mainly to make transactional persistence as transparent as possible. This is a goal typical of object-oriented databases, typically expressed as reducing the impedance mismatch between databases and programming languages.
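
A minimal sketch of that transparency with the standard API (the file name is arbitrary): an ordinary attribute assignment is all the persistence machinery needs to see.

```python
import transaction
from persistent import Persistent
from ZODB import DB
from ZODB.FileStorage import FileStorage

class Account(Persistent):
    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        # A plain attribute write; the object is marked as changed
        # and written out at the next transaction commit.
        self.balance += amount

db = DB(FileStorage('data.fs'))  # 'data.fs' is an arbitrary file name
root = db.open().root()
root['account'] = Account(100)
root['account'].deposit(42)
transaction.commit()  # both the new object and the change are persisted
```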

This has had a happy side effect of providing transactional cache invalidation, solving one of the two hard problems in computer science. If you have read-heavy applications, where individual clients have working sets that mostly fit in memory, then ZODB can perform very, very well.
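
That invalidation can be seen with two connections using separate transaction managers (an in-memory MappingStorage is used here only to keep the sketch self-contained; any storage behaves the same way):

```python
import transaction
from ZODB import DB
from ZODB.MappingStorage import MappingStorage

db = DB(MappingStorage())
tm1 = transaction.TransactionManager()
tm2 = transaction.TransactionManager()
c1 = db.open(transaction_manager=tm1)
c2 = db.open(transaction_manager=tm2)

c1.root()['n'] = 1
tm1.commit()

# c2 keeps its consistent snapshot until its own transaction boundary;
# beginning a new transaction applies the invalidations it received.
tm2.begin()
print(c2.root()['n'])  # 1
```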

This in turn imposes a challenge for write throughput. Every transaction is assigned a monotonically increasing ID, and database servers have to serialize the assignment of these ids. Current servers may serialize much more of the commit process, but all that has to be serialized is the assignment of transaction ids. This is a core architectural tradeoff in ZODB. Having said that, ZODB can still commit thousands of transactions per second, depending on server configuration.
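
To make the serialization point concrete, here is a hypothetical sketch (not ZODB's actual implementation) of why id assignment is a single-file bottleneck:

```python
import threading
import time

class TidAllocator:
    """Hypothetical sketch: the server must hand out strictly increasing
    transaction ids, so every commit passes through this one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def new_tid(self):
        with self._lock:  # the serialization point for all committers
            tid = max(self._last + 1, time.time_ns())
            self._last = tid
            return tid
```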

Another challenge for ZODB is search. Typically (effectively always), search is done in client code. There are indexing data structures (notably BTrees), but to be used they must be loaded into the client, which can balloon the size of the working set and require more round trips to the database server. There have been attempts to move more search to the server, most notably newtdb, and there may be more in the future.
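
For example, a range scan over a BTree index executes in the client process; with a real storage, each index bucket the iteration touches is faulted in from the server on first access (the index contents below are made up):

```python
from BTrees.OOBTree import OOBTree

index = OOBTree()          # in practice this would hang off the database root
index['alice'] = 'oid-1'   # made-up values standing in for object references
index['bob'] = 'oid-2'
index['carol'] = 'oid-3'

# The scan runs client-side; only the buckets it touches are loaded.
for name, ref in index.items(min='b'):
    print(name, ref)       # bob oid-2, then carol oid-3
```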

jimfulton commented 1 year ago

BTW, my response doesn't contradict @d-maurer's excellent response. It's just another take.

ToebiasHT commented 1 year ago

Thanks, I think I've learnt a few things from this.