paritytech / parity-db

Experimental blockchain database
Apache License 2.0

Memory usage on Windows #239

Open nazar-pc opened 3 months ago

nazar-pc commented 3 months ago

Hi, we've been battling memory usage on Windows at Subspace for a while. The gist of it is that Windows uses huge amounts of memory for memory-mapped files, and for any other files that don't disable buffering completely (I don't think https://github.com/paritytech/parity-db/pull/235 has a major impact).

I tried to convince Windows developers that it makes no sense, but they remain unshakable, see discussion at https://github.com/rust-lang/rust/issues/122473

TL;DR: To get controllable memory usage on Windows, the only solution is to throw the Windows kernel's file system "smartness" out of the window to the maximum extent possible with FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING and do direct reads manually. I wrote a wrapper for this purpose that I think may deserve to become a library.
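
For readers unfamiliar with unbuffered I/O, here is a minimal sketch of what "direct reads" involve. It is not the wrapper mentioned above. The names open_unbuffered, read_at_aligned and Sector are hypothetical, and the 4096-byte sector size is an assumption (real code should query the volume). With FILE_FLAG_NO_BUFFERING, the file offset, the read length and the buffer address must all be sector-aligned, so an arbitrary read has to be widened to whole sectors and then trimmed.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Assumed sector size; real code should query the volume instead.
const SECTOR_SIZE: usize = 4096;

/// One sector-sized buffer whose address is aligned to the assumed sector size.
#[repr(C, align(4096))]
struct Sector([u8; SECTOR_SIZE]);

/// Opens a file for reading; on Windows, asks the kernel to skip its file cache.
fn open_unbuffered(path: &Path) -> io::Result<File> {
    let mut options = OpenOptions::new();
    options.read(true);
    #[cfg(windows)]
    {
        use std::os::windows::fs::OpenOptionsExt;
        // Values of the Win32 constants named in the comment above.
        const FILE_FLAG_WRITE_THROUGH: u32 = 0x8000_0000;
        const FILE_FLAG_NO_BUFFERING: u32 = 0x2000_0000;
        options.custom_flags(FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING);
    }
    options.open(path)
}

/// Reads up to `len` bytes starting at `offset`, issuing only whole-sector
/// reads at sector-aligned offsets into a sector-aligned buffer.
/// Returns fewer bytes (possibly none) if the file ends first.
fn read_at_aligned<F: Read + Seek>(file: &mut F, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let sector = SECTOR_SIZE as u64;
    // Round the start down to a sector boundary and remember how much to skip.
    let start = offset - offset % sector;
    let mut skip = (offset - start) as usize;
    file.seek(SeekFrom::Start(start))?;

    let mut out = Vec::with_capacity(len);
    let mut buf = Box::new(Sector([0u8; SECTOR_SIZE]));
    while out.len() < len {
        // Assumption: a short read only happens at end of file.
        let n = file.read(&mut buf.0)?;
        if n <= skip {
            break; // nothing readable at or after the requested offset
        }
        let take = (n - skip).min(len - out.len());
        out.extend_from_slice(&buf.0[skip..skip + take]);
        skip = 0;
        if n < SECTOR_SIZE {
            break; // reached end of file
        }
    }
    Ok(out)
}
```

A caller would open the file once with open_unbuffered and then call read_at_aligned(&mut file, offset, len) for each read. On non-Windows targets the flags are simply not applied, so the same code goes through the regular page cache there.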

arkpar commented 3 months ago

Is this really an issue? Did you observe the software actually running out of memory? To me it seems the difference between Linux and Windows is just in the way memory usage is reported to the user. Task Manager in Windows shows mapped memory as being "used". But this memory is not locked. As soon as there's memory pressure, it can be unmapped by the OS, reducing the working set.

nazar-pc commented 3 months ago

This is an issue. Yes, it is possible to run out of memory, which is why we switched away from memory-mapped I/O at Subspace. In fact, Windows will use memory-mapped I/O internally even for regular file system operations unless buffering is explicitly disabled.

The core issue is that while that memory is hypothetically reclaimable by the kernel, Windows is spectacularly bad at that. So on one hand, it is an issue for users that they see high (often close to 100%, but not quite 100%) memory usage. On the other hand, since Windows placed a bunch of useless crap into memory, apps that actually need memory for something useful start to lag and, when Windows is not able to reclaim memory fast enough, simply crash outright.

It is a somewhat smaller issue with a database because of its physical size; with multi-TB files we had a really horrible user experience with memory-mapped I/O. But it is still worth fixing by moving the logic into the application domain, with proper control over what is stored in memory and for how long.
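
To illustrate what "control in the application domain" could look like, here is a minimal sketch of a fixed-capacity page cache. PageCache and get_or_load are hypothetical names, not part of parity-db, and FIFO eviction is chosen only for brevity. The point is that the application, not the kernel, sets the upper bound on cached data.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical page cache that never holds more than `capacity` pages.
struct PageCache {
    capacity: usize,
    pages: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>, // insertion order, oldest page first
}

impl PageCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least one page");
        Self { capacity, pages: HashMap::new(), order: VecDeque::new() }
    }

    /// Returns the cached page, calling `load` (e.g. a direct read) on a miss.
    /// When the cache is full, the oldest page is evicted first.
    fn get_or_load<E>(
        &mut self,
        page: u64,
        load: impl FnOnce(u64) -> Result<Vec<u8>, E>,
    ) -> Result<&[u8], E> {
        if !self.pages.contains_key(&page) {
            let bytes = load(page)?;
            if self.pages.len() >= self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.pages.remove(&oldest);
                }
            }
            self.order.push_back(page);
            self.pages.insert(page, bytes);
        }
        Ok(self.pages[&page].as_slice())
    }

    /// Number of pages currently held in memory.
    fn len(&self) -> usize {
        self.pages.len()
    }
}
```

With a capacity of N pages of a known size, the memory held by the cache is bounded by roughly N times the page size, regardless of how large the underlying files are.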