wandersoncferreira / mamulengo

Lightweight database based on datascript with durable store and time travel for Clojure(Script)
Eclipse Public License 2.0

Handling multiple mamulengo DBs? #49

Open theronic opened 4 years ago

theronic commented 4 years ago

My question is intended as constructive criticism of the API, which does not take a conn or db, e.g. (m/as-of! #inst "2020..."). Without passing the connection or database as a value, I don't know what state I'm dealing with, and that makes it hard to write tests.
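To make the testing concern concrete, here is a rough sketch of the contrast; the explicit-db forms below are hypothetical, not the current mamulengo API:

;; implicit: acts on a hidden global database, so tests share invisible state
(m/as-of! #inst "2020-05-01")

;; explicit (hypothetical signature): the restricted db is a plain value that
;; each test can hold on to and query on its own
(let [past (m/as-of db #inst "2020-05-01")]
  (m/query! '[:find ?n :where [?e :name ?n]] past))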

wandersoncferreira commented 4 years ago

Hello @theronic, thanks for taking the time to comment here. This is something I was thinking about from a "sqlite user" perspective.

I'm used to the idea that (m/as-of! db #inst ...) means "restrict this specific db under this constraint", rather than (m/as-of! #inst ...), which means "give me the database restricted to this constraint".

If you look at the history of the README, I was thinking about the API for a while and made this omission intentionally, assuming most users would mean the second option above. But from the query perspective it makes a lot more sense to take the database as input and query that database if requested (so you can do (m/query! '[:find ?e ..] db)).

I decided to leave it this way and hear the criticism of other people too. Does this make sense? Maybe I had better provide both options, as I did with the m/query! fn!
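For reference, the m/query! pattern mentioned above in both shapes (the exact signatures may have shifted since this comment was written):

(m/query! '[:find ?n :where [?e :name ?n]])      ; implicit: queries the current durable database
(m/query! '[:find ?n :where [?e :name ?n]] db)   ; explicit: queries the db value passed in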

theronic commented 4 years ago

It's pretty easy to define a set of partial function helpers that take a known conn or db, e.g.:

(ns helpers
  (:require [mamulengo.core :as m]))

(def conn (m/create-conn))
(defn q [qry & inputs] (apply m/query qry @conn inputs))
(def transact! (partial m/transact! conn))
(def entity #(m/entity @conn %))
;; etc.

I usually do this in a myproject.data.core namespace for convenient profiling.

Or you could have a set of functions that deref a default db that is set via (m/set-default-db! db).
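A minimal sketch of that default-db idea; set-default-db! and the db-taking m/as-of are assumptions for illustration, not the current API:

(defonce ^:private default-db (atom nil))

(defn set-default-db!
  "Registers the db value that the implicit-arity helpers fall back to."
  [db]
  (reset! default-db db))

(defn as-of
  "With one argument, restricts the default db; with two, the given db."
  ([instant] (as-of @default-db instant))
  ([db instant] (m/as-of db instant)))   ; hypothetical explicit-db call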

In the early stages of the API it's easy to start passing conn or db going forward, but it's going to hurt later if all existing code has to be adjusted to pass in conn or db.

On the API naming front, I notice that many of the API functions have trailing exclamation marks, e.g. query!. These imply that they have side effects, but reading from disk (like as-of!) doesn't have to be a side effect. It is only side-effectful because it mutates an invisible db value, which is like flying blind and is hard to test against.

wandersoncferreira commented 4 years ago

Hi, thanks for the comments. I am going to release a version with an explicit database argument to the functions. I did a small experiment towards that goal and I think it will not be too complicated to implement; it's worth having!

wandersoncferreira commented 4 years ago

@theronic I got into a tricky situation while implementing this feature, and I tried datahike to understand how they manage this issue. Could you share some thoughts about it too?

Look at the example below: what is the correct state of conn3, and what happened to the data from the transaction made through conn2?

(require '[datahike.api :as d])

(def uri "datahike:file:///tmp/file_example")
(def schema-tx [{:db/ident :name
                 :db/valueType :db.type/string
                 :db/unique :db.unique/identity
                 :db/index true
                 :db/cardinality :db.cardinality/one}
                {:db/ident :age
                 :db/valueType :db.type/long
                 :db/cardinality :db.cardinality/one}])
(d/delete-database uri)
(d/create-database uri :initial-tx schema-tx)
(def query '[:find ?n ?a :where [?e :name ?n] [?e :age ?a]])
;;; connection one
(def conn (d/connect uri))
(d/transact conn [{:name "Alice" :age 25} {:name "Bob" :age 30}])
(d/q query @conn)
;; => #{["Alice" 25] ["Bob" 30]}
;;; new connection with the new database state
(def conn2 (d/connect uri))
(d/transact conn2 [{:name "Alice Mother" :age 60} {:name "Bob Father" :age 52}])
(d/q query @conn2)
;; => #{["Alice" 25] ["Alice Mother" 60] ["Bob Father" 52] ["Bob" 30]}
;;; transact again with connection one
(d/transact conn [{:name "Alice Grandmother" :age 90}])
(d/q query @conn)
;; => #{["Alice" 25] ["Alice Grandmother" 90] ["Bob" 30]}
;;; new connection 3... what should be the database state?
(def conn3 (d/connect uri))
(d/q query @conn3)
;; => #{["Alice" 25] ["Alice Grandmother" 90] ["Bob" 30]}
(d/q query (d/history @conn3))          ;even in the history db
;; => #{["Alice" 25] ["Alice Grandmother" 90] ["Bob" 30]}
theronic commented 4 years ago

Is d/transact asynchronous? You may need to deref the call to d/transact - at least that is how Datomic works. Try @(d/transact conn ...)

Also, the datahike:file: URI scheme might only allow one writer at a time.

wandersoncferreira commented 4 years ago

They are synchronous. I was thinking exactly the same about datahike:file, because when I try to perform the same operation with datascript it raises an exception. The transaction ids created by each of the datascript instances are independent of each other and they repeat, so if I try to persist both of them I would have to perform an in-place update or figure out some other way to control how datascript creates these transaction ids (which I don't think is possible).
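A small sketch of that collision, assuming datascript.core is aliased as d; the exact tx numbers are illustrative:

(require '[datascript.core :as d])

(def conn-a (d/create-conn))
(def conn-b (d/create-conn))

;; Each connection numbers its transactions independently from the same base,
;; so the first transaction on either connection gets the same tx id.
(def report-a (d/transact! conn-a [{:name "Alice"}]))
(def report-b (d/transact! conn-b [{:name "Bob"}]))

(distinct (map :tx (:tx-data report-a)))
;; => (536870913)
(distinct (map :tx (:tx-data report-b)))
;; => (536870913)   ; same id: persisting both naively would clash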

So the solution would be either to allow a single writer and perform in-place updates to mimic the behavior you saw above, or to throw an exception and not perform the updates.

wandersoncferreira commented 4 years ago

The problem with the datahike version, for me, is that since no exception is raised I can't be sure about the correct behavior of the transactions. I was expecting it to save both sets of data, but I could get a surprise later on and discover that only the last writer connection took over and "updated" the whole database with its own state.

theronic commented 4 years ago

I would suggest opening an issue on the Datahike project for this: https://github.com/replikativ/datahike/issues. Maybe @whilo or @kordano can advise.

You might need to call a "flush" somewhere, but I couldn't find anything about it in the Datahike API docs.

whilo commented 4 years ago

Our transact is synchronous at the moment; this is an unfortunate inconsistency with Datomic that we will fix. All write operations are serialized inside one VM context, so there will be no data corruption. If you run multiple JVMs on the same file store, you can get data corruption, as this is not supported.

The problem here is that multiple connections are created and we synchronize around each connection object. This is indeed a bug that we had overlooked, since so far we have only created one connection per runtime. Thanks for pointing it out! The problem is here: https://github.com/replikativ/datahike/blob/master/src/datahike/store.cljc#L72. A better solution will be to use our new transactor, though, as it will explicitly serialize the operations in one place of the full distributed system. This is currently WIP.
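Conceptually (this is not Datahike's code, just an illustration of "serialize around the store instead of the connection"), the fix amounts to sharing one lock per store path:

(defonce ^:private store-locks (atom {}))

(defn- lock-for
  "Returns the single lock object shared by every connection to store-path."
  [store-path]
  (get (swap! store-locks update store-path #(or % (Object.))) store-path))

(defn write-serialized!
  "Runs write-fn while holding the lock for store-path, so writes coming from
   different connections to the same file cannot interleave."
  [store-path write-fn]
  (locking (lock-for store-path)
    (write-fn)))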

whilo commented 4 years ago

Also, as a side note: Datahike very much focuses on the SQLite-style use case and on staying as lightweight as DataScript, so it seems we have fairly common goals with mamulengo. The hitchhiker-tree composes directly with the DataScript codebase as a sorted map, in the two places of the codebase that do writing and iterating, so if the DataScript codebase makes sense to you, so should Datahike (we have done some refactorings, though). One limitation we are currently trying to figure out how to address is regaining ClojureScript support with durability in the browser.

Edit: One of our developers is in Brazil actually :) Pablo Mestre Drake, he is in Foz do Iguaçu.

wandersoncferreira commented 4 years ago

Thanks @whilo for the detailed answer. I didn't know about datahike before; I was very lazy in my search for such a tool and only found a partially done implementation of what mamulengo is right now. I started to study datahike's solution and it's very complete. The current implementation of mamulengo relies on datascript quite a lot, and I don't think I can overcome this problem of serializing the operations without some very weird workaround around datascript databases.

This is an interesting problem. I don't know whether I should implement the functionality and raise an exception when someone tries to reproduce this scenario, while I keep thinking about the problem itself. haha