paritytech / parity-db

Experimental blockchain database
Apache License 2.0

Consider switching away from mmap #224

Closed nazar-pc closed 9 months ago

nazar-pc commented 9 months ago

I recently read "Are You Sure You Want to Use MMAP in Your Database Management System?" and since ParityDb seems to use mmap actively, it is probably worth reconsidering its usage here as well, given how few benefits it brings and how many drawbacks it has.

arkpar commented 9 months ago

I'm familiar with this article. The mmap-related issues listed there have a comparatively lower impact for ParityDb, mostly because of the use case. ParityDb does not strive to be an all-purpose key-value store. Its main application is the blockchain.

mmap problems listed in the article:

Problem #1: Transactional Safety: Not that important because writes are serialized. In a blockchain node there's usually just one thread that needs to write to the database, and observing partial transactions is OK (or resolved on the user level).

Problem #2: I/O Stalls: The library has a sync API anyway. I benchmarked io_uring a while ago and it showed no benefit when used synchronously.

Problem #3: Error Handling: This is handled by the WAL.

Problem #4: Performance Issues: This one describes OS-related issues with page tables/TLB. It could be relevant in the future, but with the number of threads and the amount of data that we typically have, it does not seem to be an issue.

An early version of ParityDb used pwrite/pread-based I/O for everything. Switching to mmap resulted in a considerable boost in our benchmarks.
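
For readers unfamiliar with the two approaches being compared, here is a minimal sketch of the difference. It is written in Python for brevity and is not ParityDb's actual code (ParityDb is written in Rust). The `PAGE` slot size, the helper names, and the file layout are all made up for illustration. It only shows the shape of the two access paths: `pread`/`pwrite` issue one syscall per access, while a shared mapping is read and written like ordinary memory. It says nothing about relative performance.

```python
import mmap
import os
import tempfile

PAGE = 4096  # hypothetical slot size, not ParityDb's real layout


def write_pwrite(fd, index, payload):
    """Explicit I/O: one pwrite syscall per write at a given offset."""
    os.pwrite(fd, payload, index * PAGE)


def read_pread(fd, index, length):
    """Explicit I/O: one pread syscall per read at a given offset."""
    return os.pread(fd, length, index * PAGE)


def write_mmap(mapping, index, payload):
    """Mapped I/O: a plain memory store into the shared mapping."""
    start = index * PAGE
    mapping[start:start + len(payload)] = payload


def read_mmap(mapping, index, length):
    """Mapped I/O: a plain memory load from the shared mapping."""
    start = index * PAGE
    return bytes(mapping[start:start + length])


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 4 * PAGE)  # size the file before mapping it
        write_pwrite(fd, 1, b"via pwrite")
        # On Unix, mmap.mmap defaults to a shared read/write mapping.
        with mmap.mmap(fd, 4 * PAGE) as mapping:
            write_mmap(mapping, 2, b"via mmap")
            mapping.flush()  # msync: ask the OS to write dirty pages back
            seen_by_mmap = read_mmap(mapping, 1, 10)
            seen_by_pread = read_pread(fd, 2, 8)
        return seen_by_mmap, seen_by_pread
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `demo()` writes one slot through each path and reads it back through the other, which works because both paths go through the same page cache on Unix-like systems.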