invisible-college / statebus

All aboard the STATEBUS!!!

file_store doesn't scale #28

Closed: tkriplean closed this issue 6 years ago

tkriplean commented 7 years ago

Statebus writes its database to file after the cache is dirtied. This write operation blocks the server until it completes. Writes can take 1-2 seconds with a 10 MB database (nested prototype), and 8-14 seconds with an 80 MB database (slideboard). This causes very noticeable lag in, for example, initial page load.

About 40% of the time is spent in stringifying the cache and 60% in writing the string to file. I experimented with using json_stream for a streaming stringify directly to file, but only saw modest performance improvement.

I know there are some ideas for bigger plans for addressing data management, as well as ideas for de-duping the database file...but I wanted to log this issue because it is having a big impact on the apps I'm making :)

Note: I have mitigated this problem a bit by writing the database to disk no more than once per 10s, instead of 100ms (https://github.com/invisible-college/statebus/commit/38381ff35bd37c8c1703ea71863a30200d3ca377).
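A minimal sketch of that kind of throttling, for illustration only. It is not the code from the linked commit; `make_throttled_saver`, `save`, and `delay_ms` are hypothetical names. Any number of "dirty" signals inside one window collapse into a single save.

```javascript
// Hypothetical helper, not statebus's actual API.
// `save` is whatever function writes the database to disk.
// `delay_ms` is the minimum time between two writes (e.g. 10000 for 10 s).
function make_throttled_saver(save, delay_ms) {
    let timer = null;
    return function mark_dirty() {
        if (timer) return;          // a save is already scheduled for this window
        timer = setTimeout(() => {
            timer = null;           // allow the next dirty signal to schedule again
            save();
        }, delay_ms);
    };
}
```

Usage would be something like `const mark_dirty = make_throttled_saver(write_db, 10000)`, calling `mark_dirty()` whenever the cache changes. The write itself still blocks for as long as before; it just happens less often.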

toomim commented 7 years ago

I made that write frequency a parameter in https://github.com/invisible-college/statebus/commit/1ae6ecc8d966d3134ab14d53383730abaf14166d.

The next steps:

  1. When serializing JSON, first convert nested objects into pointers like {_key: 'bar'}. For example, instead of {key: 'foo', bar: {key: 'bar', baz: 3}}, we would write {key: 'foo', bar: {_key: 'bar'}}. The bar state will get written out separately anyway, and so we want the nested references to other state to just write out a pointer.
  2. Implement a write-ahead log, aka journal. Then we can have really long delays on the full writes, because all changes are getting logged and stored anyway.
  3. Write out the JSON in a custom format that allows random access, like:
    {
    {key: 'foo', bar: {}},                                             // Leave lots of empty space
    {key: 'bar', 3: [1,2,3]}                                           // at ends of lines
    }

When statebus reads in this db, it'd remember the offsets in the file that each piece of state begins and ends at. Then it can mutate the state in place, without rewriting the whole file. By including extra space at the end of each item, they can even expand in-place up to a certain amount.
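Here is a rough sketch of how the padded, random-access idea in (3) might work. It makes some simplifying assumptions that are not in the proposal: one JSON record per line with no enclosing braces or commas, padding rounded up to a multiple of a hypothetical `slot` size, and made-up function names `write_padded` and `update_in_place`.

```javascript
const fs = require('fs');

// Write each record on its own line, padded with spaces, and return an index
// mapping key -> {offset, capacity} (both in bytes) for later in-place updates.
function write_padded(path, records, slot) {
    const index = {};
    const lines = [];
    let offset = 0;
    for (const rec of records) {
        const json = JSON.stringify(rec);
        const size = Buffer.byteLength(json);
        // Round up so that every record has at least one byte of slack
        const capacity = Math.ceil((size + 1) / slot) * slot;
        lines.push(json + ' '.repeat(capacity - size) + '\n');
        index[rec.key] = {offset, capacity};
        offset += capacity + 1;     // +1 for the newline
    }
    fs.writeFileSync(path, lines.join(''));
    return index;
}

// Overwrite one record where it sits. Returns false if the record is unknown
// or no longer fits, in which case the caller has to rewrite the whole file.
function update_in_place(path, index, rec) {
    const entry = index[rec.key];
    const json = JSON.stringify(rec);
    const size = Buffer.byteLength(json);
    if (!entry || size > entry.capacity) return false;
    const buf = Buffer.from(json + ' '.repeat(entry.capacity - size));
    const fd = fs.openSync(path, 'r+');
    try {
        fs.writeSync(fd, buf, 0, buf.length, entry.offset);
    } finally {
        fs.closeSync(fd);
    }
    return true;
}
```

Trailing spaces are legal JSON whitespace, so each line still parses with `JSON.parse`. The cost of an update is then proportional to the size of one record rather than the whole database.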

Also, right now we are pretty-printing the JSON before writing it to file. This makes the file bigger than it needs to be.
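Going back to (2), a write-ahead log could be sketched roughly as below. This is only an illustration under assumptions: the snapshot is taken to be a single JSON object mapping keys to state, the journal is one JSON object per line, and `log_change`, `recover`, and `checkpoint` are hypothetical names.

```javascript
const fs = require('fs');

// Append one changed piece of state to the journal, one JSON object per line.
// Note: appendFileSync does not fsync, so durability across power loss would
// need extra work; that is left out of this sketch.
function log_change(journal_path, obj) {
    fs.appendFileSync(journal_path, JSON.stringify(obj) + '\n');
}

// Rebuild the cache: load the last full snapshot, then replay the journal.
function recover(snapshot_path, journal_path) {
    const cache = fs.existsSync(snapshot_path)
        ? JSON.parse(fs.readFileSync(snapshot_path, 'utf8'))
        : {};
    if (fs.existsSync(journal_path)) {
        for (const line of fs.readFileSync(journal_path, 'utf8').split('\n')) {
            if (!line) continue;
            let obj;
            try {
                obj = JSON.parse(line);
            } catch (e) {
                break;              // torn last line from a crash: stop replaying
            }
            cache[obj.key] = obj;   // later entries win over earlier ones
        }
    }
    return cache;
}

// The slow full write: snapshot the whole cache, then empty the journal.
function checkpoint(snapshot_path, journal_path, cache) {
    const tmp = snapshot_path + '.tmp';
    fs.writeFileSync(tmp, JSON.stringify(cache));
    fs.renameSync(tmp, snapshot_path);   // swap in the new snapshot
    fs.writeFileSync(journal_path, '');
}
```

With something like this, each change costs one small append, and `checkpoint` can run as rarely as the journal size allows.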

toomim commented 7 years ago

I implemented (1) in a prototype, but it didn't improve performance very much. I'm thinking of just implementing a postgres_store alternative to file_store, which will have better performance.

edit: actually, a sqlite_store would be a better first step.
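For reference, the pointer conversion described in (1) might look something like the sketch below. This is not the prototype's code; `with_pointers` is a hypothetical name, and it assumes that any nested object carrying a `key` field is a separate piece of state.

```javascript
// Return a copy of `state` in which every nested object that has its own
// `key` is replaced by a pointer of the form {_key: ...}.
function with_pointers(state) {
    function walk(value) {
        if (Array.isArray(value)) return value.map(walk);
        if (value && typeof value === 'object') {
            if ('key' in value) return {_key: value.key};   // separate state: point to it
            const out = {};
            for (const k of Object.keys(value)) out[k] = walk(value[k]);
            return out;
        }
        return value;               // strings, numbers, booleans, null
    }
    // The top-level object keeps its own fields; only its children are replaced
    const out = {};
    for (const k of Object.keys(state)) out[k] = walk(state[k]);
    return out;
}
```

Applied to the example from (1), `with_pointers({key: 'foo', bar: {key: 'bar', baz: 3}})` gives `{key: 'foo', bar: {_key: 'bar'}}`.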

toomim commented 6 years ago

We made a sqlite_store