@papakai I've been out of the loop for a while (lokijs is currently maintained by the valiant effort of a number of developers, namely @obeliskos, @Viatorus et al.), however, as I understand it, LokiJS throttles automatic saves when they would overlap with a save currently in progress. This prevents data loss at the cost of less frequent saves. It's up to you to adjust the save interval depending on how much data you need to store. The other devs might chime in to clarify further.
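For reference, the relevant persistence options look roughly like this (a minimal sketch; the filename and interval are placeholders, so verify the option names against the docs for the version you run):

```js
const loki = require('lokijs');

const db = new loki('cache.db', {
  autoload: true,            // load the persisted file (if any) on startup
  autoloadCallback: init,    // run once loading has finished
  autosave: true,            // persist the in-memory database on an interval
  autosaveInterval: 4000,    // ms between autosaves; tune to your freshness needs
  throttledSaves: true       // queue overlapping saves instead of letting them collide
});

function init() {
  // create the collection on first run, reuse it afterwards
  let config = db.getCollection('config');
  if (config === null) {
    config = db.addCollection('config');
  }
}
```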
@techfort Many thanks, looking forward to further clarification. My feedback up to now: LokiJS is really amazing and a great piece of work. Just the documentation is a little painful: things are spread across multiple references; at least I had to google around a lot to find what I needed and to figure out where all the useful material is published. I wasn't even aware of rawgit.com, which was often very helpful.
Hey @papakai, and thanks @techfort.
You weren't too specific about your environment architecture, so maybe you can offer more info; I am not clear on where the multithreading is occurring and how those threads would call into a node vm where loki would be hosted.
Spawning multiple node processes and spinning up a loki database (copy) in each would not be advisable... you wouldn't want them competing for the same persistence files. If that is your scenario you might be able to layer a rest api on top of a single lokijs instance. You spoke of multithreading and 'front-end', so I do not know if the concept of a rest server fits in, but you would need a single orchestrator 'service' to serve all threads (at least with our out-of-the-box functionality). There are a few experimental projects out there we might be able to point you to if so.
So the current out-of-the-box ideal configuration would be:
@techfort is right, we do support load and save throttling to improve situations where either manual loading and saving could overlap, or your autosave interval is so short that saves overlap.
@obeliskos Thanks for this detailed explanation. To provide more details on our needs: We have a service running which caches configuration data of our backend. These rows have a TTL and get refreshed periodically through a node.js script running in the background. We have a node.js http server running which can be accessed via specific routes, and these routes make db accesses. Still, it might happen that a request comes in and stumbles over an expired data row, which then gets refreshed and returned. So that's basically where we could run into multiple concurrent write accesses to the db file. In addition we use PM2 as process manager, which starts our http server script on all cores and again leads to some concurrency.
So from a first reading I expect that your out-of-the-box config could fit our needs there. To go into more detail: How can a single lokijs db instance be shared with all consumers? Am I right that it would then need to be wrapped by http / tcp, or is it enough to simply start a node.js script which instantiates the db and provides an http server that allows data processing through appropriate routes? What I mean: would we have to wrap the lokijs db with its own http / tcp wrapper (I just read that such a wrapper is somehow available as a plugin?), or is the described node.js http server script which instantiates a lokijs db at start enough? What is the best practice for horizontal and vertical scaling? How do we cover the concurrency which results from our pm2 setup?
I am also open to completely rewriting our architecture, if required. If you have suggestions other than LokiJS for our purpose, no problem - just point us in another direction.
We have a Service running which caches configuration data of our backend. In addition we use PM2 as process manager which starts our http server script on all cores which again leads to some concurrency
So I still have no idea what you are doing and barely know what to suggest 😄
LokiJS is primarily an in-memory database, so when you 'add a record' we don't necessarily save right away, usually on intervals. Unlike SQLite, which makes many smaller disk reads and writes, lokijs would have everything it needs in memory and just swap it in (and possibly out) in large chunks less frequently, at an interval that makes sense for your solution and persisted 'freshness' needs. If you expect multiple consumers of persisted data you may want to use mongo, or for loki you might force everyone to use a common rest api running in a node/(express?) vm which has sole ownership of the (in-memory) database and its persisted files (basically backups of the memory db).
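As a rough illustration of that 'sole owner' idea, a minimal sketch of an express service that is the only process touching the loki instance and its persistence file might look like this (express, the port, and the route names are assumptions for illustration, not anything LokiJS prescribes):

```js
const loki = require('lokijs');
const express = require('express');   // assumption: any HTTP framework would do

const db = new loki('config-cache.db', { autosave: true, autosaveInterval: 5000 });
const config = db.addCollection('config', { unique: ['key'] });

const app = express();
app.use(express.json());

// read a cached entry by its unique key
app.get('/config/:key', (req, res) => {
  const doc = config.by('key', req.params.key);
  if (!doc) return res.status(404).end();
  res.json(doc);
});

// insert or refresh an entry
app.put('/config/:key', (req, res) => {
  const existing = config.by('key', req.params.key);
  if (existing) {
    Object.assign(existing, req.body, { key: req.params.key });
    config.update(existing);
  } else {
    config.insert(Object.assign({ key: req.params.key }, req.body));
  }
  res.status(204).end();
});

// bind to loopback only: all PM2 workers talk to this single owner process
app.listen(7000, '127.0.0.1');
```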
Are you expecting to use lokijs as a read-only cache, or do you expect to write to it with changes other than 'refreshing' from the back-end db? Do you even need to save to disk at all, or can you just leave it as a high-perf in-memory cache?
You might also use lokijs as an (in-memory only) local cache within each web server instance. The hits would be different across each server, but you might be able to use that in addition to a centralized cache service. In full js stack solutions this is where it might get weird, as you could have a client-side browser caching data from a webserver lokijs cache, which in turn might be caching from a dedicated 'official' lokijs rest server (or other protocol) cache.
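To make the per-web-server, in-memory-only cache idea concrete, here is a small sketch using the collection TTL options (the read-through helper and the backend fetch are hypothetical; no persistence is configured, so nothing touches disk):

```js
const loki = require('lokijs');

// purely in-memory: no autosave/autoload, so nothing is written to disk
const db = new loki('local-cache');

// documents older than ttl ms are purged by a sweep that runs every ttlInterval ms
const config = db.addCollection('config', {
  unique: ['key'],
  ttl: 60 * 1000,
  ttlInterval: 30 * 1000
});

// hypothetical read-through helper in front of the backend
async function getConfig(key, fetchFromBackend) {
  let doc = config.by('key', key);
  if (!doc) {
    const value = await fetchFromBackend(key);  // hit the "official" source on a miss
    doc = config.insert({ key, value });
  }
  return doc.value;
}
```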
OK, I see. I think I understand the general possibilities of LokiJS.
Could you please advise / give more details on:
In general I could also think of a mixture of LokiJS in the browser and MongoDB on the server side; we would just need to transform the data eventually.
Concerning your question:
LokiJS doesn't really have a change 'contention' algorithm. We do have Changes API 'notifications', though. What you do with those changes is currently your responsibility, i.e. re-integrating them into another lokijs or server db instance. You might also just use a query on some lastUpdated column to determine changes.
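For reference, the Changes API surface being referred to looks roughly like this (enable it per collection, then pull and clear the change log; double-check the exact semantics against the docs):

```js
const loki = require('lokijs');

const db = new loki('master.db');

// change tracking is off by default; enable it when creating the collection
const config = db.addCollection('config', { disableChangesApi: false });

config.insert({ key: 'feature-x', enabled: true });
const doc = config.findOne({ key: 'feature-x' });
doc.enabled = false;
config.update(doc);

// each entry describes one operation: { name, operation: 'I' | 'U' | 'R', obj }
const changes = db.generateChangesNotification(['config']);
console.log(changes);

// or grab them as a JSON string, then reset the log once consumers have them
const serialized = db.serializeChanges(['config']);
db.clearChanges();
```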
My initial thought, based on my understanding, is that for a pure loki solution you would probably need a 'master' lokijs. You can either have all of your web servers interface with this main lokijs cache service (via http on a loopback address?), or you can keep clones of the database in each vm and have a cache service on both sides keeping the clone refreshed, signaling the server to update expired data, etc. Cache ejection might be done in all instances/clones independently, since a TTL is a TTL and should need no coordination.
So the ChangesAPI at the moment appears to be 'single-set'; perhaps this could easily be modified to support multiple named sets (@techfort might advise if you are interested in pursuing that route). In theory (in a cloned instance architecture) each cloned instance might register a changeset name (like a server id) with the main instance and periodically (in-process / on interval) get the changes for that server. So as the delta between the main instance and each cloned node increases, the main instance might keep those individual delta-sets in memory, and when a clone asks to be refreshed you just provide its changeset and clear those changes for next time.
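Continuing that thought, a clone might periodically pull the serialized changes from the main instance and replay them locally, roughly as sketched below (how the clone fetches the changeset, and matching updates on an application-level `key` field, are assumptions for illustration):

```js
// Hypothetical sketch: replay a changeset (as produced by db.serializeChanges())
// against a clone's collections. Obtaining `serialized` from the master, e.g.
// via an HTTP call, is outside LokiJS itself.
function applyChanges(cloneDb, serialized) {
  const changes = JSON.parse(serialized);
  changes.forEach((change) => {
    const coll = cloneDb.getCollection(change.name);
    if (!coll) return;
    switch (change.operation) {
      case 'I': {  // insert: strip loki-internal fields so the clone assigns its own
        delete change.obj.$loki;
        delete change.obj.meta;
        coll.insert(change.obj);
        break;
      }
      case 'U': {  // update: match on an application-level key
        const existing = coll.findOne({ key: change.obj.key });
        if (existing) {
          Object.assign(existing, change.obj, { $loki: existing.$loki, meta: existing.meta });
          coll.update(existing);
        }
        break;
      }
      case 'R': {  // remove
        coll.findAndRemove({ key: change.obj.key });
        break;
      }
    }
  });
}
```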
I would personally go with a single loki rest service on a loopback address, at least to profile/benchmark. If it turns out that http requests are too slow you might shift to cloned instances. Lokijs is mostly a synchronous javascript library running on a single javascript thread, so any blocking will happen on i/o based on access to that thread. If all your http servers would end up doing is waiting on that main thread, you might go the clone route. Without benchmarking the 'single instance' solution (which you may have already done) under your expected load this is conjecture, so let me know if you get to real benchmarks and we might course correct where appropriate.
Thanks @obeliskos for these details. I forgot to mention that we have our service running locally on each web-server node of our app (as an http service). We have no synchronization realized with sqlite3 for now, as we have that background job which polls our backend for changes based on the mentioned TTL. This works, but my target is to improve here in the future, and keeping the service's db in sync leads to fewer backend requests, which is always better. I've added this discussion to our project documentation for future reuse. I think migrating to LokiJS will be of interest for us, time will come. I'll come back to this issue later - but closing it for now. Thanks again for your time - very appreciated!
In one of our frontend projects we are using SQLite3, which I personally dislike in favor of LokiJS. But I need to clarify things first before switching.
Our main issue with SQLite3 for now is that multiple concurrent write accesses (by multiple threads) lead to db crashes on our side. How does LokiJS handle this? Could there also be a case of multiple concurrent db saves where some of these requests don't get processed (thus potential data loss)? Or do you have suggestions for a LokiJS setup / db configuration which wouldn't run into these issues?
TIA for any help here.