techfort / skrymir

LokiJS v2.0 experimental repository
MIT License

Persistence and Sync Adapters #1

Open obeliskos opened 9 years ago

obeliskos commented 9 years ago

If sync adapters are added, I can imagine them being used in addition to and separately from persistence adapters.

Persistence adapters (and sync adapters) could be modified to accept and return a db reference rather than just a string. We could continue to support both by implementing a property such as 'adapterType' or 'adapterLevel' to indicate whether loki should pass the adapter a db reference or a serialized db.
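A rough sketch of what such a dual-mode adapter could look like (the 'mode' flag and exportDatabase name below are purely illustrative, not existing API):

```js
// Illustrative only: an adapter that declares it wants the live db reference
// instead of a serialized string. Property and method names are hypothetical.
var fs = require('fs');
var path = require('path');

function FileReferenceAdapter(basePath) {
  this.basePath = basePath;
  this.mode = 'reference';   // vs 'normal', where loki passes db.serialize()
}

// loki would call this with the db object itself rather than a string
FileReferenceAdapter.prototype.exportDatabase = function (dbname, dbref, callback) {
  // the adapter decides how to serialize; here we simply JSON.stringify the reference
  fs.writeFile(path.join(this.basePath, dbname), JSON.stringify(dbref), callback);
};

// inside loki's persistence layer (pseudocode):
//   if (adapter.mode === 'reference') adapter.exportDatabase(name, this, cb);
//   else adapter.saveDatabase(name, this.serialize(), cb);
```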

It might be nice to support an arbitrary array of adapters, each configurable with its own timeout. In the past I have handled such timing issues with a base-resolution timer and multiples of it for each individual adapter. Our current 'autosave timer' might be enhanced into a rough scheduler to orchestrate this; just an idea in the absence of, and as a step toward, a real scheduler.
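As a minimal sketch of the base-resolution idea (names invented; assumes adapters keep today's saveDatabase(name, serializedDb, callback) shape):

```js
// Sketch of a base-resolution scheduler: one interval timer; each registered
// adapter fires on its own multiple of the base tick.
function AdapterScheduler(baseIntervalMs) {
  this.base = baseIntervalMs;
  this.entries = [];          // { adapter, everyTicks }
  this.tick = 0;
  this.timer = null;
}

AdapterScheduler.prototype.register = function (adapter, everyTicks) {
  this.entries.push({ adapter: adapter, everyTicks: everyTicks });
};

AdapterScheduler.prototype.start = function (db) {
  var self = this;
  this.timer = setInterval(function () {
    self.tick++;
    self.entries.forEach(function (entry) {
      if (self.tick % entry.everyTicks === 0) {
        // assumes today's adapter shape: saveDatabase(name, serializedDb, callback)
        entry.adapter.saveDatabase(db.filename, db.serialize(), function () {});
      }
    });
  }, this.base);
};

AdapterScheduler.prototype.stop = function () {
  clearInterval(this.timer);
};
```

So register(localAdapter, 1) would flush every tick and register(remoteAdapter, 10) only every tenth tick, all driven by the one base timer.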

Some have expressed concern that as our database gets large (say 2 GB of RAM), persistence might become slow. I wouldn't think it would take that long, but in case it does, a storage adapter might be written which saves collections independently to separate files and reconstitutes them on inflation. Not sure we should get into that, but something of that granularity would be possible with a db-reference adapter interface.
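If we ever went that route, a reference-mode adapter could decompose the db per collection roughly like this (the file naming and the 'mode' flag are made up for illustration):

```js
var fs = require('fs');

// Sketch of a partitioning adapter: each collection goes to its own file,
// so a multi-gigabyte database never has to be serialized in one pass.
function PartitioningFileAdapter(dir) {
  this.dir = dir;
  this.mode = 'reference';   // hypothetical flag: loki hands over the db object
}

PartitioningFileAdapter.prototype.exportDatabase = function (dbname, dbref, callback) {
  var dir = this.dir;
  var pending = dbref.collections.length;
  var failed = false;
  if (pending === 0) return callback(null);

  dbref.collections.forEach(function (coll) {
    var file = dir + '/' + dbname + '.' + coll.name + '.json';
    fs.writeFile(file, JSON.stringify(coll.data), function (err) {
      if (failed) return;
      if (err) { failed = true; return callback(err); }
      if (--pending === 0) callback(null);   // all collections flushed
    });
  });
};

// Loading would read the per-collection files and reconstitute them into a fresh db.
```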

The Changes API might similarly flush the whole database only 1 out of, say, 10 saves and use the change log to flush smaller changesets the rest of the time. If needed this thread can be used for discussion of that.
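Something like this, as a hypothetical autosave heuristic (appendChanges is an invented adapter method; serializeChanges/clearChanges are assumed to come from the Changes API):

```js
// Hypothetical hybrid autosave: full snapshot every 10th save, changeset otherwise.
var saveCount = 0;

function hybridSave(db, adapter, callback) {
  saveCount++;
  if (saveCount % 10 === 0) {
    // full flush: serialize the entire database
    adapter.saveDatabase(db.filename, db.serialize(), callback);
  } else {
    // partial flush: append only the changes accumulated since the last flush
    adapter.appendChanges(db.filename, db.serializeChanges(), function (err) {
      if (!err) db.clearChanges();
      callback(err);
    });
  }
}
```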

techfort commented 9 years ago

Lots of good points. I'm in agreement with the idea that's shaping up about the Persistence Adapters, and I like the idea of a scheduler; we may try to go for that from day 1 if possible? You also raise the point of the structure of the saved files. Two major models are emerging at the moment: single file and collection files. In the case of really large data sets we could also supply some kind of vertical partitioning mechanism (records 1-1M in collection A1, 1M-2M in A2) and have those saved in different files.

The other consideration about saving files is the serialization method. msgpack is faster and reduces space; it may not be a standard, but if you want to reuse your data all you have to do is load it again with msgpack. JSON is a standard, but quite verbose. A few months ago I developed a serializer called jumble, which is only a proof of concept, but it used 50% of the space taken by JSON and was quite fast. The problem is that it only works with predefined "maps" (in other words, you supply the serialization method for each field) and it doesn't support schema-free data (unexpected fields in the objects will throw the serializer off).
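For a quick feel of the size difference, something like this could be run against a real collection dump (using the msgpack-lite npm package here, but any msgpack implementation with encode/decode would do):

```js
var msgpack = require('msgpack-lite');   // npm install msgpack-lite

var doc = { name: 'user1', age: 27, tags: ['a', 'b', 'c'], created: 1427748000000 };

var asJson = Buffer.from(JSON.stringify(doc));
var asMsgpack = msgpack.encode(doc);

console.log('JSON bytes:   ', asJson.length);
console.log('msgpack bytes:', asMsgpack.length);

// round trip: decoding returns an equivalent object
var restored = msgpack.decode(asMsgpack);
console.log(restored.name === doc.name);   // true
```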

seriousme commented 9 years ago

(Hope this is the right issue for this.) As for the format on disk: log-structured storage offers speed and robustness (see http://redis.io/topics/persistence for details). See also #6.
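For illustration, an append-only log in the Redis AOF spirit can be as simple as one JSON line per mutation, replayed on load (a sketch only, the record format is made up):

```js
var fs = require('fs');

// Append-only log sketch: every create/update/delete is appended as one JSON line.
function appendOp(logFile, op) {
  // op e.g. { type: 'insert', coll: 'users', doc: { name: 'joe', age: 30 } }
  fs.appendFileSync(logFile, JSON.stringify(op) + '\n');
}

// On load, the state is rebuilt by replaying the log from the start.
function replay(logFile, applyOp) {
  if (!fs.existsSync(logFile)) return;
  fs.readFileSync(logFile, 'utf8')
    .split('\n')
    .filter(function (line) { return line.length > 0; })
    .forEach(function (line) { applyOp(JSON.parse(line)); });
}
```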

seriousme commented 9 years ago

Wrt sync adapters: as soon as multiple actors come into play you have to be careful about race conditions (especially when using push and splice on multiple arrays). You want to avoid locking if possible, so the system must be designed to cater for that.
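A contrived sketch of the hazard: if the read of the array length and the writes are separated by anything async, two concurrent inserts can claim the same slot:

```js
// Both inserts read the array length, hit an async gap (e.g. waiting on a
// remote sync adapter), then write to the same slot.
var data = [];
var idIndex = [];

function somethingAsync() {
  return new Promise(function (resolve) { setTimeout(resolve, 0); });
}

async function insertUnsafe(doc) {
  var pos = data.length;    // both concurrent calls may read the same value here
  await somethingAsync();
  data[pos] = doc;          // the second call overwrites the first
  idIndex[pos] = doc.id;
}

// insertUnsafe({ id: 1 }); insertUnsafe({ id: 2 });
// -> one document is silently lost. Keeping the read and both writes in one
//    synchronous step (no await in between) avoids the problem.
```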

techfort commented 9 years ago

Sorry if this sounds incredibly stupid, but with JS being single-threaded the only way to create race conditions would be with async operations, or do I misunderstand? If that is the case, we could just enforce synchronicity on any array push/splice.

seriousme commented 9 years ago

That is my understanding as well: as long as you avoid any async in the critical path, you should be good. The downside is that if the critical path takes too much time it will block any other activity in the app, so it will take some balancing.

techfort commented 9 years ago

Also on persistence: I remember thinking of the Changes API as a way to implement something along the lines of the InnoDB binary log, which basically records every single query run on the db in order to replicate and sync across new nodes. Out-of-date nodes just need a point-in-time reference to know where to pick up the master's log and re-sync. We might consider storing the changes as a form of log? Maybe I'm digressing.
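A sketch of the point-in-time idea: tag every change with a monotonically increasing sequence number, so an out-of-date node can ask only for what came after the last sequence it applied (the record shape below is invented, loosely modelled on the Changes API output):

```js
// Invented change-log shape with 'I' insert, 'U' update, 'R' remove operations
// and a sequence number per entry.
var changeLog = [];
var nextSeq = 1;

function recordChange(collName, operation, obj) {
  changeLog.push({ seq: nextSeq++, name: collName, operation: operation, obj: obj });
}

// An out-of-date node supplies the last sequence number it applied and
// receives only the changes that came after it.
function changesSince(lastSeenSeq) {
  return changeLog.filter(function (c) { return c.seq > lastSeenSeq; });
}

// recordChange('users', 'I', { name: 'joe' });
// recordChange('users', 'U', { name: 'joe', age: 31 });
// changesSince(1)  -> only the update
```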

seriousme commented 9 years ago

What you describe from InnoDB sounds to me like the log-append approach used by Redis, CouchDB and all the others (as long as InnoDB only stores create/update/delete queries, not reads ;-)).

techfort commented 9 years ago

Ah cool - then we're all on the same page, I suppose? The cool thing about this is that our Changes API can serve both purposes.


obeliskos commented 9 years ago

Yeah, I may have read your messages out of order and you might be doing the same, but this may be another argument for a transform/render phased approach to serialization.

The argument I made in another thread was that seriousme's node encryption adapter might be better suited as a 'transform' phase so it can be used by all storage methods (even though, in retrospect, only node-webkit could possibly use IndexedDB or localStorage).

The parallel example relevant here might be a 'bson' or other transform. I'm still not certain how that would work with a decomposition strategy... it would seem to require a more complex adapter interface so that the adapter could invoke the transform itself.

techfort commented 9 years ago

Yes, I get you now :) This is definitely a step we need to think about. Persistence Adapters will take care of saving data either to disk, IndexedDB, localStorage or another storage form. Transform Adapters will take care of taking the data and turning it into JSON, BSON, encrypted JSON or whatever other format we can think of.

So the save process can be JSONTransformAdapter -> LocalStorageAdapter, or EncryptedJSONTransformAdapter -> DiskStorageAdapter (these are only two possible examples). Sounds good?
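To illustrate the composition (the adapter objects below are invented, not a real API), the save path would just thread the output of the transform adapter into the storage adapter:

```js
// Invented adapter objects, just to show the shape of the chain.
var JSONTransformAdapter = {
  transform: function (dbref) { return JSON.stringify(dbref); }
};

var LocalStorageAdapter = {
  saveDatabase: function (name, serialized, callback) {
    localStorage.setItem(name, serialized);   // browser / node-webkit only
    callback(null);
  }
};

// The persistence layer just threads one into the other.
function save(db, transformAdapter, storageAdapter, callback) {
  storageAdapter.saveDatabase(db.filename, transformAdapter.transform(db), callback);
}

// save(db, JSONTransformAdapter, LocalStorageAdapter, cb)
// save(db, EncryptedJSONTransformAdapter, DiskStorageAdapter, cb)
```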

obeliskos commented 9 years ago

Yea that sounds great

seriousme commented 9 years ago

At the risk of explaining the obvious..

If you want to do ES6, generators could be used here.

If I understand the concepts correctly, they enable streaming and lazy evaluation. E.g., say we have 30 users with age > 25:

users.find({'age':{'$gt': 25}}).limit(5).data();

In the current setup that would mean digging up all 30 IDs and processing them. Using lazy evaluation, find would spit out one record at a time and stop at 5.

Transformations could be part of the chain, with storage adapters at the tail.
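A toy sketch of what I mean, assuming ES6 generators and reducing the query to a plain predicate:

```js
// Lazy evaluation: the generator produces matches one at a time,
// and limit() stops pulling after n of them, so the rest are never scanned.
function* find(docs, predicate) {
  for (const doc of docs) {
    if (predicate(doc)) yield doc;
  }
}

function limit(iterator, n) {
  const out = [];
  for (const doc of iterator) {
    out.push(doc);
    if (out.length === n) break;   // stop pulling: remaining matches stay unvisited
  }
  return out;
}

// limit(find(users, u => u.age > 25), 5)
// -> walks users only until 5 matches have been found.
```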

A bit of inspiration might be found at https://github.com/mojo-js/crudlet.js, although crudlet does not use generators or ES6. http://www.2ality.com/2015/03/es6-generators.html does a better job of explaining how to do this in ES6.

techfort commented 9 years ago

Ok, I like the clever use of generators a lot. Now my question is: would you agree that the limit() part of the process lies in the Query engine, while data consumption is best modularised in a Dispatcher as per the suggestion I was making in #6?

techfort commented 9 years ago

@seriousme @obeliskos as for crudlet, I contacted the guy and he was kind enough to churn this thing out: https://github.com/mojo-js/crudlet-loki

Looks cool!

seriousme commented 9 years ago

Smart move :-) Impressive result in a short time frame!

techfort commented 9 years ago

well - it's just a stub, there's no code in it yet, but he said he'd get on it tonight.

techfort commented 9 years ago

That crudlet-loki project is a go. Awesome stuff.