seriousme opened 9 years ago
took me a while to digest it all but this seems like a very good start! My first question is whether we should have (similarly to the Flux architecture) a Dispatcher component that digests internal collection events and publishes them. In other words, I'd like to decouple consumers from the Collection; possibly the DVs are also going to be consumers?
Secondly, when you say that Collections answer simple queries, do you mean the query engine does a "pass-through" to the Collection to return a particular record?
Another element I would like discussed is caching: a component that memoizes the results of frequent queries. In JS it is very easy to do caching:
function process(arg) {
  // some complex stuff
}

function memoize(fn) {
  var cache = {};
  return function (arg) {
    if (!(arg in cache)) {
      cache[arg] = fn(arg); // compute once, reuse on every later call
    }
    return cache[arg];
  };
}

var processCached = memoize(process);
The only adaptation would be to have a dirty flag on the collection data to flush the whole cache. Opinions, suggestions and lapidation all accepted :)
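A minimal sketch of that flush, assuming a hypothetical collection.version counter that every write bumps (a counter avoids several caches fighting over a single shared boolean flag):

function cachedQuery(collection, queryFn) {
  var cache = {};
  var seenVersion = collection.version; // hypothetical counter, bumped on every write
  return function (arg) {
    if (collection.version !== seenVersion) {
      cache = {}; // data changed since we last looked: flush everything
      seenVersion = collection.version;
    }
    if (!(arg in cache)) {
      cache[arg] = queryFn(arg);
    }
    return cache[arg];
  };
}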
I refrain from lapidation contests with Irishmen, too much quarry :)
Is a 'database' or orchestration-level entity implied above concerning logs/adapters/syncs? Each collection has typically been an isolated island with few I/O obligations other than serialization. Since this is still early I'm not certain if this is intentional or not.
An event dispatcher would be valuable imho, as it basically glues the logic together and it's a well-known paradigm. OTOH you could also code the event dispatch by hand and get even more perf. E.g. say you have an update. You could then: a) have an event dispatcher which distributes the events to the various adapters, or b) just write some code to do the update, take the result, call the adapter, etc.
option a) is cleaner (config over code) and easier to understand for non-insiders; option b) however is imho typically faster (no dispatcher overhead ;-))
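For concreteness, option a) could look roughly like this with node's built-in EventEmitter (the adapter names are invented):

var EventEmitter = require('events').EventEmitter;
var dispatcher = new EventEmitter();

// adapters subscribe to the events they care about
dispatcher.on('update', function (record) {
  logAppendAdapter.append('update', record); // hypothetical adapter
});
dispatcher.on('update', function (record) {
  syncAdapter.push(record); // hypothetical adapter
});

// the collection only publishes and knows nothing about its consumers
collection.update = function (doc) {
  // ...apply the change to the in-memory data...
  dispatcher.emit('update', doc);
};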
I would only use cache if I really had to as it tends to complicate stuff.
Nearly forgot ;-).
with simple queries I meant:
The query engine would resolve more complex logic like "(&(name==joe)(age>25))". This is where the smarts sit that decide how such a query gets executed.
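For illustration, the kind of decision that engine would make for that filter (collection.data is the raw document array; byIndex is a hypothetical index lookup):

// the filter (&(name==joe)(age>25)) as a plain predicate
function matches(doc) {
  return doc.name === 'joe' && doc.age > 25;
}

// naive plan: full scan
var results = collection.data.filter(matches);

// smarter plan: resolve the most selective clause through an index first,
// then post-filter the much smaller candidate set
var candidates = collection.byIndex('name', 'joe'); // hypothetical
var results2 = candidates.filter(function (doc) { return doc.age > 25; });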
Btw: It might also be convenient to define a list of principles/tools to use: e.g.:
Principles:
Tools:
(I've just used my imagination, you might want to list something totally different here ;-))
I agree with all of it (including the choice of framework, I'm a gulp/karma/istanbul fan) except for (partly) ES6. I find the whole situation of node.js using unsupported versions of v8 and needing flags for ES6 support completely stupid. In that respect I'm happy io.js was forked. But aside from the politics of that situation:
The wisest thing at this point in time is probably an internal "policy", along the lines of:
The jury is still out on class, import and export (I like the require syntax, and if we're using browserify I'm not sure we need import/export) and on Symbol (support is meh). All thoughts on this are welcome! For tidiness though I'll open another issue #7 to make sure we can keep the conversation strictly on an architectural/design level in this thread. @seriousme thanks for the clarification on the query.
I agree ES6 support is spotty. Therefore I would try to use Babel to tackle that one. Anything that needs polyfills is probably not suitable for now.
Wrt low-level code: I would focus on getting a working prototype first and then see if additional optimization is worth it. My guess: no ;-).
A smart algo always outperforms any code optimization :-) E.g. try beating a hashmap lookup with asm.js ;-)
But then again you might surprise me :-)
@seriousme yeah on point about asm.js - I should have clarified I meant for optimizations not for a very nerdy form of masochism. Will comment on Babel in #7
Another bit of inspiration :-) http://www.slideshare.net/wiredtiger/
Which is the engine under MongoDB 3.0
Some notable points:
And since its open source, its possible to peek under the hood to figure out how they do it: https://github.com/wiredtiger/wiredtiger
First glance:
(and a whole lot of other stuff ;-))
The team that created this also created BerkeleyDB.
@seriousme This is quite cool, I'd never heard of bloom filters so I'm off reading about them! At some point we're going to have to decide how simple (or complex) we want things to be, but it's good to gather all ideas before typing a single character of code!
@techfort not going to participate in discussing details but dropping another link for your idea gathering process ;)
http://blog.confluent.io/2015/03/04/turning-the-database-inside-out-with-apache-samza/
As the "LogAppend style" was mentioned here already, I guess you guys already know about the link/concepts anyway but with a little chance there's still something of value.
Actually I wonder - and this can be considered a question to you - how tools such as Kafka and Samza fit into the picture. Would it make sense to combine them with Loki or probably just borrow ideas from them?
@ArnoBuschmann thanks for the suggestion, and feel free to participate - this is an open discussion for everybody. I admit I had always dismissed Kafka as a slower alternative to ZeroMQ, but now that I look at some docs I see this very interesting statement: "Apache Kafka is publish-subscribe messaging rethought as a distributed commit log." This certainly sounds like something that Loki could very much benefit from. In full Loki v2 philosophy it would be exceptional if there were a messaging adapter supporting various systems, but I know too little of Kafka at the moment, so I guess the first step is to educate myself on the subject.
The style of drawing in the article reminded me of another post on the same blog: http://blog.confluent.io/2015/01/29/making-sense-of-stream-processing/
A database is just an aggregated view of your event stream, which fits nicely with the idea of the collection as an aggregate of change events.
@seriousme yep, both presentations were made by Martin Kleppmann indeed.
In this video Kleppmann describes how Samza takes the Kafka output and acts as a stream processor to create such collections of change events, and also how they get merged/enriched:
https://www.youtube.com/watch?v=yO3SBU6vVKA
Am I right to guess that what Samza is doing, could be accomplished by Loki as well?
@ArnoBuschmann comparing Loki to Samza might be a bit ambitious, but the idea is indeed that, as with any database, change events lead to a persistent aggregate. Now, whether you copy that aggregate by reprocessing events or by another, more batch-oriented, db replication mechanism is conceptually the same. The difference is only in timing.
@ArnoBuschmann @seriousme I watched Kleppmann's presentation and I am very impressed with the concepts, so thanks for sharing. In my opinion the current state of Loki defines it as a client-side database (despite my ambitions). My ultimate goal would be for Loki to be used in client applications that keep a completely distributed database going. In fairness, on the server side, there is no point in trying to get a JS database to compete with MongoDB for performance, stability and maturity.
Theoretically this distributed architecture would allow the existence of a database without a single storage point (an extremely fragile scenario btw); it would also make it impossible for someone to "join" the network, because they would have no log to pick up to sync their local data.
So: we need a Log. But I wonder, should this not be a standalone server-side application, that may well be using Kafka/Samza, effectively a product that only needs to exist in a distributed network of Loki clients, and doesn't necessarily need to be written from scratch; it could just be a plain Hadoop/Samza type of application.
Having said all that, a couple of interesting ideas are emerging: Loki clients should be topic producers and consumers, with each collection's changes being a stream produced and sent to a stream-processing application, and each client receiving a processed stream (why processed? because if a single object in a collection changed 3 times since the last time you checked, you only need the latest version of that object, which could be an aggregate of several events on the stream).
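A sketch of that compaction, assuming an ordered stream of change events that each carry the object's id and its full latest state:

function compact(events) {
  var latest = {};
  events.forEach(function (ev) {
    latest[ev.id] = ev; // later events overwrite earlier ones; deletes survive as tombstones
  });
  return Object.keys(latest).map(function (id) {
    return latest[id];
  });
}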
So we could make Loki stream-format agnostic and implement different StreamProcessingAdapters (just to cater for different replication mechanisms, with the default being a plain stream of ordered changes to be applied to the local version of the db to sync with the rest of the network).
If anybody wants to jump aboard / create this Loki Stream Processing server component I'm all ears :) But it should probably be separate from LokiJS, what do you think @seriousme @ArnoBuschmann @obeliskos?
Having a memory-only DB might sound scary, but as long as you have enough copies of the data running it's just as safe as having a DB persisted to disk, and as a bonus you get "always on" as well :-) A number of in-memory DBs have this model where disk-based persistence is optional or even absent (e.g. https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud ).
I agree with your statement about competing server side with MongoDB for performance, stability and maturity.
The challenge with event-based replication is with consumers that start listening mid-flight. E.g. you already have an event stream running for months and suddenly you decide to add an extra consumer. Now there are 2 options: a) the new consumer is initialized by replaying all events since the epoch, or b) the consumer is initialized with a recent copy of the aggregate (= consolidated events) and then fed with a stream of events that occurred after the creation of the aggregate (the state of the aggregate could be transmitted as a stream of update events as well :-)).
If there is any significant percentage of update-on-update events then option b) will reduce the number of events to be processed. (btw: this is how most databases I know replicate, if they have continuous replication ;-))
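Option b) in pseudo-API form (all names hypothetical; a real implementation has to hand over from replay to live tail without dropping or duplicating events):

function initConsumer(stream, consumer) {
  var snapshot = stream.latestSnapshot(); // consolidated aggregate plus its log offset
  consumer.load(snapshot.state);
  stream.replayFrom(snapshot.offset, function (event) {
    consumer.apply(event); // only the events that occurred after the snapshot
  });
  stream.subscribe(function (event) {
    consumer.apply(event); // live tail from here on
  });
}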
The tricky part of the "processed stream" mentioned by @techfort above is that in a stream based world you don't come back to "check". You just swallow events.
So if Loki were an event database, then as soon as you subscribe to an event stream Loki should start to stream you the aggregate (like the DV queries the current state) and after that Loki should just continue to pass you the (filtered) events (unlike the DV, which updates its own aggregate). The consumer (which might be another Loki event database, locally or on the other side of the planet) would then use this event stream to do whatever it needs to do with it (update the DOM, update its internal aggregate, feed other consumers behind it, switch lights, make coffee, whatever ;-))
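In sketch form (the API is invented), the nice property being that the consumer code is identical for both phases:

loki.subscribe(filter, function (event) {
  // phase 1: Loki streams the current aggregate as a series of inserts
  // phase 2: Loki keeps streaming the filtered live changes
  switch (event.operation) {
    case 'insert': view.add(event.data); break;
    case 'update': view.replace(event.id, event.data); break;
    case 'delete': view.remove(event.id); break;
  }
});

Here view stands for whatever the consumer maintains: DOM, internal aggregate, coffee machine...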
@seriousme how about this (not saying I'm innovating anything here, it has probably all been done already, and done better): each event stored in the stream (on the server) would be

{ id, timestamp, data }

with data being some representation of the change, like the current changes API. Is this crazy?
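For concreteness, such an event might look like this (values invented, data shaped like the current changes API output):

{
  id: 42,                   // position in the stream
  timestamp: 1428393600000, // ms since epoch
  data: {
    name: 'users',          // collection the change belongs to
    operation: 'U',         // 'I' / 'U' / 'R', as in the changes API
    obj: { name: 'joe', age: 26, $loki: 7 }
  }
}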
Another crazy idea that I'm thinking of: Loki could work as an in-memory interface for an underlying MongoDB database. That way, you could have DynamicViews, the Changes API and all other Loki goodies (including in-memory speed) with the resilience and on-disk performance of MongoDB. I suppose the in-memory data would only be MRU data or similar (you can't realistically load a MongoDB in memory unless it's tiny, in which case you don't need MongoDB :D). Is this actually madness?
@techfort: the first 4 points sound logical to me :-) For the second part of your post: I can imagine Loki as a javascript-based "filtered replica" of a MongoDB, e.g.:
1) the browser app does a query on Loki
2) Loki passes on the query to MongoDB
3) Loki stores the result in a local Loki collection
4) Loki watches the MongoDB oplog for updates
5) as soon as updates appear, Loki filters them (like currently done with DV) and updates its local datastore
From there on, Loki could:
or
The mechanism could be setup in such a way that one could make this work for Mongo, Couch, Redis etc by supplying a relevant adapter.
This way Loki gets the role where it can shine: it's lightweight, easy to use and plays nicely with the big kids :-)
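A rough sketch of steps 1-5, using the real Loki collection API but eliding the MongoDB connection and the oplog cursor setup (the op codes 'i'/'u'/'d' are what the oplog actually uses):

var loki = require('lokijs');
var db = new loki('replica.db');
var users = db.addCollection('users');

// steps 1-3: run the query against MongoDB once and seed the local collection
mongoUsers.find({ active: true }).toArray(function (err, docs) {
  docs.forEach(function (doc) { users.insert(doc); });
});

// steps 4-5: tail the oplog and apply only the updates that match the filter
oplogStream.on('data', function (op) {
  if (op.ns !== 'mydb.users') return; // not our collection
  if (op.op === 'i') users.insert(op.o);
  // 'u' and 'd' would locate the local doc by _id and update/remove it
});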
Ok, I'm loving this last part :) 3 questions.
The discussion is really interesting but I'm not losing sight of what Loki is and does, so I want to make sure to have a minimal core that fully complies with the current spirit of LokiJS, and ship everything else as optional modules.
My PoV:
1) yes, however you could argue that for a 2.0 the DVs could be called through an adapter (making Loki even more lightweight/faster for those who do not use DVs ;-)), but that's a choice.
2) yes, as long as Loki offers a way to pick up change events (e.g. the changes API) and a way to process updates (the current create/update/delete methods) it would be perfectly possible to make this work.
3) that is up to the writer of the adapter :-) As long as Loki stores an Id and a Rev it's always possible to reconcile with any backend database. (btw: Id and Rev generation might need to be pluggable as well for that to work, as different DBs seem to have different algorithms for that.)
Btw: one could also argue that the whole backend comms should be part of the app using Loki, and Loki itself only facilitates the CRUD and query stuff. It all depends on the ambitions for Loki ;-)
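A sketch of why Id + Rev is enough to reconcile (helpers hypothetical; how revs compare is exactly the backend-specific part that makes pluggable generation necessary):

function reconcile(local, remote) {
  if (local.rev === remote.rev) return local;              // already in sync
  if (isAncestorOf(local.rev, remote.rev)) return remote;  // remote is strictly newer
  return resolveConflict(local, remote);                   // diverged: needs a strategy
}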
Regarding your last point: I believe providing Loki adapters to other architectural components is what will make people fire up a VM and give it a go. They'll see it works and go "holy _!" with _ being a variety of English 4-letter swear words :) Friendly API, lightweight, fast: those are the mantras.
@ArnoBuschmann thanks again for that link about Apache Samza. It is uncanny how LokiJS adopted so many of the patterns explained, and without prior knowledge of this. I believe Loki may well develop into a node.js-ecosystem equivalent, or at the very least a similar product, acting as a fast access layer of materialized views on top of a replicated node of a traditional db.
Hey guys, you were busy and I just read my way through. I like the design and I see so much potential for Loki to develop with the ideas discussed.
Developing from what is already there in well defined steps towards an enhanced system and keeping dependencies (databases, stream processors etc.) decoupled really is the way. Improve functionality by adding whatever system (Mongo, Kafka/Samza...) with adapters but be able to use Loki also without.
Concerning adapters I think two things would be beneficial:
My point is to implement an "easy to get started" strategy and help new users with proper documentation, tutorials and blogposts (these can be made by others, but having a place to link them helps). It's common that developers want to write code and "don't have time" for the documentation, especially as good documentation requires a lot of additional thought and work. It's important to have people on board who actually like caring about explaining things. @jrhicks already did a great job with his blogposts :)
@techfort Yes, it's uncanny but it's "in the air" I guess :) points at the multiple discovery hypothesis -> http://en.wikipedia.org/wiki/Multiple_discovery
this discussion is great. :smile_cat:
I believe Loki may well develop into a node.js-ecosystem equivalent, or at the very least a similar product, acting as a fast access layer of materialized views on top of a replicated node of a traditional db.
:+1:, even better if it integrates well with the level ecosystem.
And another source of inspiration: https://github.com/bevry/query-engine It does not seem to have indexes and is written in coffeescript, but the demos and the docs look quite OK.
@techfort Multiple discovery hypothesis, part two -> You opened this issue https://github.com/techfort/LokiJS/issues/109#issuecomment-88466572 with the words
Create a flux store that utilises LokiJS.
After we discussed design ideas for Loki 2 in this thread, today I remembered that Pete Hunt speaks in this video about what he calls "full stack flux" for React: https://www.youtube.com/watch?v=KtmjkCuV-EU&list=PLb0IAmt7-GS1cbw4qonlQztYV1TAW0sCr&index=8 And guess what? Everything is EXACTLY the same again as we discussed it here.
What Pete Hunt didn't actually mention is the availability of a replicated local DB, and that is exactly the niche for Loki to step into. Previously you told me that my idea of saying Loki can make isomorphism easy, so one could render no matter if on the server or the client, sounds futuristic, but the more I think about it, I'd say it should work elegantly like this:
Sounds like a huge step to me.
@ArnoBuschmann yep this sounds great, thanks for the link which i'm going to check out immediately.
I am thinking (but it's only a thought) to either force the user to declare the environment (at the moment there is a config option env which takes NODEJS | CORDOVA | BROWSER) or to do different builds for node.js-based environments (eg. node.js, NW.js, cordova) and the browser. And yes, from there on, it would either fetch or render.
Automatic environment detection gave us a few headaches and I'd rather have a more robust approach in v2, even if it means 1 more line of code for the developers.
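With the config option mentioned above, that one extra line would be something like:

var loki = require('lokijs');
var db = new loki('app.db', { env: 'BROWSER' }); // or 'NODEJS' | 'CORDOVA'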
@ahdinosaur could you elaborate a bit more on the leveldb ecosystem, how do you see Loki and level integrating?
This thread (and the preceding one) is genius~
My use case is simply a central db (lokijs on nodejs - there ya go @techfort) that I want to realtime-sync to the client (reactjs). I was just pondering the sync issue (and yeah, I read Kleppmann's stuff recently too), which made a lot of sense. Currently I have it crudely working by serializing the data into the page and then doing updates on channel events via websocket. Even this is not foolproof, as there is a small window before connection where you could miss messages. A log approach sounds tempting~
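A sketch of closing that window with a log offset (the seq field and the replay channel are invented):

var lastSeq = bootstrapData.seq; // serialized into the page together with the data

socket.on('connect', function () {
  socket.emit('replay', { since: lastSeq }); // server resends anything missed
});

socket.on('change', function (ev) {
  if (ev.seq <= lastSeq) return; // already applied (replay may overlap)
  applyChange(ev);
  lastSeq = ev.seq;
});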
Couple of random links
https://www.firebase.com - I presume everybody is aware of these guys. I haven't seen an open source clone of this yet, I guess lokiJS might be the first :)
https://news.ycombinator.com/item?id=9328006 - mesh.js (previously crudlet.js)
Has an adapter for lokiJS. Seems more like it's suited for a remote api / local data cache store scenario rather than for realtime sync, but this comment is interesting:
"It would be awesome if it persisted all the operations to a log. That way, when I attach a new endpoint I could get it "caught up"."
hehe
@hampsterx Nice, mesh.js looks interesting!
@techfort As the project name changed from crudlet to mesh, you might want to change the naming for the Loki adapter? I created a pull request for the readme, but be careful ;) and check it as this is the first Github pull request I ever did, tehe.
https://github.com/ArnoBuschmann/mesh-loki/compare/master...ArnoBuschmann-patch-1?quick_pull=1
hey @ArnoBuschmann thanks for that PR - everything looks good - however I'm not the owner of mesh-loki, mojo-js is :) that's to say, it's up to mojo-js to merge it. @hampsterx thanks for mentioning mesh and mesh-loki. I found mesh very early on (as in, I was stargazer number 25 or thereabouts) and asked Craig (@crcn) to create a mesh-loki adapter; the guy is so fast I'm not sure I had finished asking before it was done!
And as for your use-case: that's precisely what I'm trying to address, so any ideas on the subject are going to be more than welcome! So far, here's what is forming in my mind:
I don't like lists with more than 7 elements so any element from this on will be number 7.
Nice article on the subject can be found here: http://www.benstopford.com/2015/04/07/upside-down-databases-bridging-the-operational-and-analytic-worlds-with-streams/
TL;DR: Looking at the pros and cons of externalising caches, indexes, materialised views and asynchronous streams of state.
The most promising solution seems to be: a synchronous writeable view at the front; a range of different read-only views at the back, running asynchronously to one another; an event stream tying it all together with a single journal of state; side-effect-free functions that (re)generate different views from the stream; a spout for programs to listen and interact. All wrapped up in a single data platform, a single joined-up unit.
I am very intrigued by this approach, and it seems to me that a single product covering the entire stack is entirely feasible. Whether that's Loki or something much bigger and more ambitious is open to discussion, but as far as I'm concerned, I'm even up for that challenge. At least I'd get to use a strongly-typed language for that (with my preference being C++ :D)
I think your last summary explains it all; it is pretty much what we agreed upon so far, so a design has definitely emerged in my view. Again, picking up from where you left off, I'd re-iterate the importance of functional programming, not only in the sense of giving importance to higher-order functions, but also in stressing the side-effect-free philosophy.
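The side-effect-free philosophy in one line, so to speak: a view is just a pure fold over the event log (applyEvent being any (state, event) -> newState function):

function materialize(events) {
  return events.reduce(applyEvent, {});
}

// replaying the same log always yields the same view, so views can be
// regenerated, cached or thrown away at will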
Based on these, a Loki(JS) can be designed and developed now, with the details of the implementation of each module to be discussed in separate threads. Not that there is any rush, obviously. @seriousme @obeliskos what do you think?
Dropping in late here, but if ES6 features are deemed desirable but ill-supported, there's always coffeescript.
First try on a design:
The idea is that:
The Collections
The query engine
Emitted updates can be used by an adapter to:
Updates can be initiated by:
To avoid long loads during LogAppend restores, an adapter could LogAppend to disk, start a new log, do a save of the collection and remove the old log (on success). This will ensure maximum robustness of disk-based persistence.
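In sketch form (adapter API invented):

function checkpoint(collection, adapter) {
  adapter.startNewLog(); // new appends go to a fresh log from here on
  adapter.saveSnapshot(collection.serialize(), function (err) {
    if (!err) adapter.removeOldLog(); // drop the old log only after a successful save
  });
}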
Fire away ;-)