tonsky / datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS
Eclipse Public License 1.0

Asynchronous Sources in DataScript #190

Status: Open. wbrown opened this issue 7 years ago.

wbrown commented 7 years ago

Currently, DataScript queries are synchronous. When a query is executed, search is called for each pattern, in sequential order, against a record that implements datascript.db.ISearch. These search calls are synchronous because they operate on an in-memory immutable BTSet record. This makes sense for a single-threaded environment such as JavaScript/ClojureScript, and it works in Clojure as well.
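The current synchronous model can be sketched roughly as follows. This is a simplified illustration, not DataScript's code: Datom, Pattern, SyncSearch and MemorySource are hypothetical stand-ins, a plain array replaces the BTSet index, and variable binding and joins between patterns are left out.

```typescript
// Hypothetical stand-ins for DataScript's datoms and ISearch; not the real API.
type Datom = [number, string, unknown]; // [entity, attribute, value]
interface Pattern { e?: number; a?: string; v?: unknown } // undefined = unbound

interface SyncSearch {
  search(p: Pattern): Datom[];
}

// A plain array stands in for the in-memory immutable BTSet index.
class MemorySource implements SyncSearch {
  datoms: Datom[];
  constructor(datoms: Datom[]) {
    this.datoms = datoms;
  }
  search(p: Pattern): Datom[] {
    return this.datoms.filter(
      ([e, a, v]) =>
        (p.e === undefined || e === p.e) &&
        (p.a === undefined || a === p.a) &&
        (p.v === undefined || v === p.v)
    );
  }
}

// Patterns are searched one after another; each call returns its result immediately.
function runPatterns(source: SyncSearch, patterns: Pattern[]): Datom[][] {
  const results: Datom[][] = [];
  for (const p of patterns) {
    results.push(source.search(p));
  }
  return results;
}
```

Because every search call returns a value directly, the caller of runPatterns never has to wait on anything; that is the property an asynchronous source would break.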

This proposes adding support for asynchronous data sources to DataScript. This can be done as a branch or fork of DataScript in the interim, but I would like to see it merged into DataScript if possible.

This opens the following possibilities:

In the case of LevelDB and IndexedDB, there are complications when it comes to guaranteeing immutability, but that is a separate subject from asynchronous queries and out of scope for this proposal.

At a high level, we would implement an IAsyncSearch protocol whose methods are expected to return promises or channels. The intermediate query functions would check whether the source object provided implements the IAsyncSearch methods. On a call to datascript.core/q, the sources would be checked: if any of the sources provided implement IAsyncSearch, we would block synchronously on the call if the platform supports it; otherwise, we would return a promise or channel.
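The dispatch described above could look roughly like this on the JavaScript side, where blocking is not possible and the async case must return a promise. This is a minimal sketch under stated assumptions: AsyncSearch and searchAsync are hypothetical TypeScript analogues of the proposed IAsyncSearch protocol, q takes a single source for brevity, DelayedSource merely simulates a storage-backed source with a timer, and joins between patterns are omitted.

```typescript
type Datom = [number, string, unknown]; // [entity, attribute, value]
interface Pattern { e?: number; a?: string; v?: unknown } // undefined = unbound

interface SyncSearch { search(p: Pattern): Datom[] }
// Hypothetical analogue of the proposed IAsyncSearch protocol.
interface AsyncSearch { searchAsync(p: Pattern): Promise<Datom[]> }
type Source = SyncSearch | AsyncSearch;

const matches = (d: Datom, p: Pattern): boolean =>
  (p.e === undefined || d[0] === p.e) &&
  (p.a === undefined || d[1] === p.a) &&
  (p.v === undefined || d[2] === p.v);

class MemorySource implements SyncSearch {
  datoms: Datom[];
  constructor(datoms: Datom[]) { this.datoms = datoms; }
  search(p: Pattern): Datom[] { return this.datoms.filter((d) => matches(d, p)); }
}

// Simulates a storage-backed source (e.g. IndexedDB) by answering after a timer tick.
class DelayedSource implements AsyncSearch {
  inner: MemorySource;
  constructor(datoms: Datom[]) { this.inner = new MemorySource(datoms); }
  searchAsync(p: Pattern): Promise<Datom[]> {
    return new Promise((resolve) => {
      setTimeout(() => resolve(this.inner.search(p)), 0);
    });
  }
}

// Checks at runtime whether the source implements the async method.
function isAsync(s: Source): s is AsyncSearch {
  return typeof (s as AsyncSearch).searchAsync === "function";
}

// Sync sources keep today's behaviour and return results directly.
// Async sources force a Promise, since JavaScript cannot block waiting on one.
function q(source: Source, patterns: Pattern[]): Datom[][] | Promise<Datom[][]> {
  if (!isAsync(source)) {
    const sync: SyncSearch = source;
    return patterns.map((p) => sync.search(p));
  }
  const asyncSrc: AsyncSearch = source;
  return (async () => {
    const results: Datom[][] = [];
    for (const p of patterns) {
      results.push(await asyncSrc.searchAsync(p)); // still one pattern at a time
    }
    return results;
  })();
}
```

The union return type is the crux of the design question: any caller that may receive a promise must itself be written asynchronously, which is the "asynchronous all the way up" concern raised for several of the options discussed in this issue.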

It would be useful to note the requirements, implied or explicit:

There are a few ways to get asynchrony:

Native

Clojure already supports promises via promise, with synchronous blocking when a promise is dereferenced. ECMAScript 6 also supports promises, but using them would require the JavaScript environment to support ECMAScript 6.

Implementing native support at this level would require a lot of platform-specific code, but it would work in a JavaScript-targeted environment.

core.async

core.async is well-supported on both Clojure and ClojureScript, but:

core.async does have the advantage that it is supported on earlier JVM and JavaScript platforms.

cljs-promises

cljs-promises is built on core.async and provides a promise facility that addresses one of the concerns above. It still does not solve the problem that the code has to be asynchronous all the way up. It is also ClojureScript-specific, so it does not satisfy the cross-platform requirement.

redlobster

redlobster is a ClojureScript promise facility with strong ties to Node.js, but it has been shown to work in browsers as well if one ignores some of the Node-specific functionality. However, the query call itself would need to be asynchronous and return a promise. It is also ClojureScript-specific.

promesa

promesa provides a cross-platform abstraction layer for both Clojure and ClojureScript.

A potentially significant drawback is that it requires JDK 8, which would transitively impose that requirement on DataScript. However, it fulfills most of the other requirements and would be my choice.

refset commented 7 years ago

For everyone's benefit, the previous issue on this topic is here: https://github.com/tonsky/datascript/issues/22

whilo commented 7 years ago

Just to note, I am interested in this as well and have looked into it several times over the last few years. While I use DataScript and would love to see a durable index, my primary focus is on building a distributed data management system with replikativ, which can also be used for Dat* replication. Since I have dealt with similar problems in keeping code cljs-compatible, I decided on core.async and defined IO protocols, e.g. for storage with konserve, on top of supervised async.

There was quite a bit of boring yak-shaving involved, but I would suggest breaking the problem down and building a set of robust cross-platform abstractions on which things like a durable DataScript can be built. In general, cross-platform cljs does not have many composable building blocks yet, and the asynchronous nature of JavaScript sadly requires non-blocking interfaces on the JVM as well, which barely any Clojure library considers. While promises and other async solutions partly have the benefit of not transforming all your code with core.async, the very concise async code that core.async enables should be seriously considered, and core.async is fairly solid now. For concrete implementations of limited asynchronous functionality, callbacks are generally preferred in libraries, because they do not push core.async onto the user. But once we talk about exposed protocols and interfaces, I would argue that concise blocking semantics allow better composition with less glue code; this is why core.async is considered the async library for the Clojure ecosystem. For example, error handling is hard to get right in an async setting in cljs, and modelling rendezvous points (where the sender is only unblocked once the reader consumes the value) across different async libraries is also non-trivial.
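To make the rendezvous semantics concrete outside core.async (whose unbuffered channels behave this way), here is a minimal TypeScript sketch built on plain promises. RendezvousChannel is a hypothetical name; this is only an illustration of the semantics, not how core.async implements its channels, and it ignores closing and error propagation.

```typescript
// Unbuffered (rendezvous) channel: put() resolves only once a take() has consumed the value.
class RendezvousChannel<T> {
  takers: Array<(v: T) => void> = [];
  putters: Array<{ value: T; done: () => void }> = [];

  put(value: T): Promise<void> {
    const taker = this.takers.shift();
    if (taker) {
      taker(value); // a reader is already waiting: hand the value over directly
      return Promise.resolve();
    }
    // No reader yet: the sender stays pending until take() picks the value up.
    return new Promise<void>((resolve) => {
      this.putters.push({ value, done: () => resolve() });
    });
  }

  take(): Promise<T> {
    const putter = this.putters.shift();
    if (putter) {
      putter.done(); // unblock the sender now that the value is consumed
      return Promise.resolve(putter.value);
    }
    return new Promise<T>((resolve) => {
      this.takers.push(resolve);
    });
  }
}
```

Getting this handshake to interoperate between two different promise or channel libraries is exactly where the extra glue code comes in, since each side has its own notion of "pending".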

In terms of building blocks, I am currently primarily interested in a persistent durable index data structure (only a first experiment) implemented against such portable protocols (e.g. konserve). This would give me nearly optimal delta compression for CRDT metadata in replikativ, and in general would allow building fairly sophisticated snapshottable IO infrastructure, including an async query engine on Datom indices for DataScript. In particular, I am looking at the hitchhiker tree at the moment, to port it to konserve (instead of Redis). Do you think common infrastructure undertakings like this are reasonable, or do you see obstacles?

theronic commented 6 years ago

@whilo just shouting some encouragement - would love to see a strong story for distributed state handling based in DataScript :)

whilo commented 6 years ago

@theronic How do you envision write coordination? Like in Datomic, with a single transactor?