techfort / LokiJS

javascript embeddable / in-memory database
http://techfort.github.io/LokiJS
MIT License

IncrementalIDB - MegaChunking #874

Closed radex closed 3 years ago

radex commented 3 years ago

(WIP, please don't review the code yet)

Yes, it's me again, with yet another pull request full of strange, complicated code — and another promise that it's worth it for performance 🙃

I think I'm approaching the limits of what IndexedDB can do performance-wise, but it's important for my use case to squeeze all that's possible out of it ;)

TL;DR: It loads the database 22% faster ;)

I made a picture to explain the problem that this PR is trying to solve:

[Screenshot: 2020-12-10 at 10:14:35]

IndexedDB is implemented (in all browsers as far as I can tell, but certainly in Chrome and Safari) with a multi-process architecture, and the cross-process communication is not very efficient. You can see this above: waiting for IDB to fetch data from disk takes relatively little time, and most of the time is spent waiting for the XPC dance to finish transferring data -- and clearly it's not very well tuned, as CPU usage in the browser process is very low.

So the goal is to cut down the time spent waiting on that cross-process transfer and keep both the browser and IDB processes busy at the same time.

This is what I achieved:

[Screenshot: 2020-12-10 at 9:59:41]

This achieves a 22% improvement on my benchmark, and likely even more free performance for apps that didn't opt to manually tune IncrementalIDB by supplying serializeChunk/deserializeChunk.

Instead of a single IDBObjectStore.getAll() call, I'm fetching multiple megachunks (chunks of chunks 🙃) -- currently 20 requests using adjacent IDBKeyRanges. AFAICT, the IDB process in both Safari and Chrome does the first phase (the actual disk/db work) sequentially, so there's no win there, but the XPC is more efficient for some reason. My guess is that since the IDB process sends more messages to the browser process, there are fewer gaps in processing them on the browser side, so CPU utilization stays higher.
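A rough sketch of the idea (illustrative names, not the actual adapter code, and assuming the sorted list of chunk keys is known up front):

```javascript
// Pure helper: split the sorted chunk keys into n adjacent groups and
// return [lower, upper] key bounds for each group.
function splitIntoRanges(sortedKeys, n) {
  const ranges = [];
  const per = Math.ceil(sortedKeys.length / n);
  for (let i = 0; i < sortedKeys.length; i += per) {
    const group = sortedKeys.slice(i, i + per);
    ranges.push([group[0], group[group.length - 1]]);
  }
  return ranges;
}

// Hedged IDB usage: one getAll() per adjacent key range instead of a
// single store-wide getAll(). `store` is an IDBObjectStore of chunks.
function fetchMegachunks(store, sortedKeys, n, onMegachunk) {
  for (const [lower, upper] of splitIntoRanges(sortedKeys, n)) {
    const req = store.getAll(IDBKeyRange.bound(lower, upper));
    req.onsuccess = () => onMegachunk(req.result);
  }
}
```

The key point is that all the requests are issued in one go on the same transaction, so the IDB process can stream results back as a series of smaller messages rather than one giant one.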

In a further improvement (I call this megachunk interleaving), I only request the first half of the megachunks initially, and then in the onsuccess of each one I request the (i+n/2)th megachunk. This reduces the initial wait for IDB to almost nothing and improves concurrency, as the IDB process is kept busy while JS is processing the first half of its work. (I also moved most of the chunk processing -- JSON.parse and the optional deserializeChunk -- from the end of the process to much earlier, into each megachunk's onsuccess, so that the main and IDB processes can be kept busy at the same time… I think this should also reduce GC pressure a little, but I haven't yet figured out a good technique for measuring that, since it's very noisy.)
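The interleaving scheme could be sketched like this (all names here are illustrative, not the PR's code; `fetch` stands in for a wrapper around `store.getAll(IDBKeyRange.bound(lower, upper))`):

```javascript
// Megachunk interleaving sketch: issue only the first half of the
// requests up front; when megachunk i completes, immediately request
// megachunk i + n/2, then parse this megachunk's rows right away.
function fetchInterleaved(ranges, fetch, onResult) {
  const n = ranges.length;
  const half = Math.ceil(n / 2);

  function request(i) {
    const [lower, upper] = ranges[i];
    fetch(lower, upper, (rows) => {
      // Kick off the paired second-half request before any CPU work,
      // so the IDB process stays busy while JS parses this result.
      if (i + half < n) request(i + half);
      // Parse per megachunk instead of once at the very end.
      onResult(i, rows.map((row) => JSON.parse(row)));
    });
  }

  // Only the first half is requested up front; the rest follow as
  // their first-half partners complete.
  for (let i = 0; i < half; i++) request(i);
}
```

With a synchronous stand-in for `fetch`, megachunks complete in the order 0, n/2, 1, n/2+1, …, which is exactly the "keep IDB busy while JS parses" overlap described above.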

I'm almost out of ideas for further improvements, and the law of diminishing returns is catching up with me, so this will probably be the last PR in the series for a while...

PS. In case you were wondering about using IDBCursor to maximize concurrency -- I tried that multiple times, and it doesn't work. Interleaving multiple IDBCursors got me to nearly the same performance as interleaved megachunking, but still slower. There are just too many useless pauses on the main thread...
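For reference, this is roughly what the cursor approach looks like (an illustrative sketch, not the code that was tried): each cursor walks one key range, but every single record costs a round trip to the IDB process via `cursor.continue()`, which is where those pauses come from.

```javascript
// Walk one key range with a cursor: one onsuccess callback (and one
// cross-process round trip) per record, vs. one per megachunk above.
function readRangeWithCursor(store, lower, upper, onRow, onDone) {
  const req = store.openCursor(IDBKeyRange.bound(lower, upper));
  req.onsuccess = () => {
    const cursor = req.result;
    if (cursor) {
      onRow(cursor.value);
      cursor.continue(); // schedules the next onsuccess
    } else {
      onDone(); // cursor exhausted
    }
  };
}
```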

radex commented 3 years ago

@techfort We've been running this internally for a while, found no issues so far

techfort commented 3 years ago

@radex this looks fantastic, i'm merging and sometime today i'll get round to doing a new release. I should really automate this release crap