tonsky / datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS
Eclipse Public License 1.0
5.46k stars 304 forks

can transaction reports be used to incrementally serialize the db #415

Closed uriva closed 2 years ago

uriva commented 2 years ago

can listen be used to incrementally serialize the db persistently?

I thought it could, but it appears not to handle retractions correctly: when using conn_from_datoms on concatenated datoms from transaction reports, the retractions are treated as if they were assertions.

I noticed that the only difference between retractions and assertions in the transaction report is a negative tx, but according to the docs there should be a dedicated boolean field that says whether something is a retraction or not. I could not find it.

Right now I'm using JSON.stringify to serialize a single datom. I assume serialize only works on an entire db, so I'm not using that.

(JS api)

tonsky commented 2 years ago

I don’t think conn-from-datoms can handle retractions. You can try transacting the concatenated log; that should work. Or even better, don't concatenate: transact multiple times, once for each transaction from the log.

uriva commented 2 years ago

is this what you mean?

const d = require("datascript");
const reports = [];
const conn = d.conn_from_db(d.empty_db());
d.listen(conn, (report) => reports.push(report));
d.transact(conn, [{ ":db/id": -1, name: "alice" }]);
d.transact(conn, [[":db/retract", 1, "name", "alice"]]);
d.transact(conn, [{ ":db/id": -1, name: "alice" }]);
const conn2 = d.conn_from_db(d.empty_db());
reports.forEach((report) => d.transact(conn2, report.tx_data));
console.log(d.serializable(d.db(conn)).eavt, d.serializable(d.db(conn2)).eavt);

doesn't seem to work; the dbs differ:

[ [ 2, 0, 'alice', 3 ] ] []

uriva commented 2 years ago

a more realistic example with JSON.stringify and JSON.parse in the middle:

const d = require("datascript");
const reports = [];
const conn = d.conn_from_db(d.empty_db());
d.listen(conn, ({ tx_data }) =>
  reports.push(JSON.parse(JSON.stringify(tx_data)))
);
d.transact(conn, [{ ":db/id": -1, name: "alice" }]);
d.transact(conn, [[":db/retract", 1, "name", "alice"]]);
d.transact(conn, [{ ":db/id": -1, name: "alice" }]);
console.log(d.serializable(d.db(conn)).eavt)
const conn2 = d.conn_from_db(d.empty_db());
reports.forEach((tx_data) => d.transact(conn2, tx_data));
console.log(d.serializable(d.db(conn)).eavt, d.serializable(d.db(conn2)).eavt);

has even weirder results:

[ [ 2, 0, 'alice', 3 ] ] 

[
  [ 1, 0, 0, 1 ],          [ 1, 1, 'name', 1 ],
  [ 1, 2, 1, 1 ],          [ 1, 3, 0, 1 ],
  [ 1, 4, 2162164496, 1 ], [ 1, 5, 536870913, 1 ],
  [ 1, 6, 'alice', 1 ],    [ 1, 7, 0, 1 ],
  [ 2, 0, 0, 2 ],          [ 2, 1, 'name', 2 ],
  [ 2, 2, 1, 2 ],          [ 2, 3, 0, 2 ],
  [ 2, 4, 2162164496, 2 ], [ 2, 5, -536870914, 2 ],
  [ 2, 6, 'alice', 2 ],    [ 2, 7, 0, 2 ],
  [ 3, 0, 0, 3 ],          [ 3, 1, 'name', 3 ],
  [ 3, 2, 2, 3 ],          [ 3, 3, 0, 3 ],
  [ 3, 4, 2162164496, 3 ], [ 3, 5, 536870915, 3 ],
  [ 3, 6, 'alice', 3 ],    [ 3, 7, 0, 3 ]
]
tonsky commented 2 years ago

I think you need to convert from tx_data (which is datoms, I think) to transaction format (":db/add" / ":db/retract")
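A dependency-free version of that conversion might look like this (a sketch assuming, as observed above, that a JSON round-trip turns a report datom into an { e, a, v, tx } object with a negative tx marking a retraction; datomToTx is an illustrative name, not part of the DataScript API):

```javascript
// Convert a reported datom back into transaction format.
// Assumption from this thread: after JSON.parse(JSON.stringify(...)),
// a datom is an object { e, a, v, tx }, and a negative tx marks a retraction.
const datomToTx = ({ e, a, v, tx }) =>
  tx > 0 ? [":db/add", e, a, v] : [":db/retract", e, a, v];
```

Replaying is then `reports.forEach((txData) => d.transact(conn2, txData.map(datomToTx)))`, matching the ramda-based solution below.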

uriva commented 2 years ago

ah you're right. this works:

const { ifElse, map, pipe } = require("ramda");
const addDatom = (datom) => [":db/add", ...datom];
const removeDatom = (datom) => [":db/retract", ...datom];
const objToArray = ({ e, a, v }) => [e, a, v];
const datomToTransaction = ifElse(
  ({ tx }) => tx > 0,
  pipe(objToArray, addDatom),
  pipe(objToArray, removeDatom)
);

const d = require("datascript");
const reports = [];
const conn = d.conn_from_db(d.empty_db());
d.listen(conn, ({ tx_data }) =>
  reports.push(JSON.parse(JSON.stringify(tx_data)))
);
d.transact(conn, [{ ":db/id": -1, name: "alice" }]);
d.transact(conn, [[":db/retract", 1, "name", "alice"]]);
d.transact(conn, [{ ":db/id": -1, name: "alice" }]);
console.log(d.serializable(d.db(conn)).eavt);
const conn2 = d.conn_from_db(d.empty_db());
const transactions = reports.map(map(datomToTransaction));
console.log(transactions);
transactions.forEach((tr) => d.transact(conn2, tr));
console.log(d.serializable(d.db(conn)).eavt, d.serializable(d.db(conn2)).eavt);

although it raises the question of why use conn and listen, when it's easier to get the data before it goes into the db...

anyway, thank you!

uriva commented 2 years ago

Maybe one more question - although this works I lose the transaction time. Is there a way to inject the right time back in? I think it's the s property of the datom in the report, but not sure how to make it transact with time.

tonsky commented 2 years ago

Transaction ids should match if you start tracking from the very beginning. Otherwise, there’s no way to override them.

P.S. Although it’s not incremental, there’s a good way to make full snapshots: serializable/from_serializable. Wrap it with JSON.stringify/JSON.parse to get to/from string. You can even combine it with incremental snapshots: persist a few transactions, then take a full snapshot once the transaction list gets too big.
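The snapshot-plus-log idea can be sketched as a small policy object. In a real setup snapshotFn would be something like `() => JSON.stringify(d.serializable(d.db(conn)))`; here it is injected so the sketch stays runnable without DataScript, and all names (makePersister, record, maxLogSize) are illustrative, not part of the DataScript API:

```javascript
// Sketch: hybrid persistence. Keep a rolling log of serialized transactions,
// and compact into a full snapshot once the log grows past a threshold.
const makePersister = (snapshotFn, maxLogSize) => {
  let state = { snapshot: snapshotFn(), txLog: [] };
  return {
    // Called from a listen callback with the (already converted) transaction data.
    record(txData) {
      state.txLog.push(JSON.parse(JSON.stringify(txData)));
      if (state.txLog.length > maxLogSize) {
        // Log too big: take a fresh full snapshot and drop the log.
        state = { snapshot: snapshotFn(), txLog: [] };
      }
    },
    current: () => state,
  };
};
```

On restore, you would from_serializable the snapshot and replay the remaining txLog entries with d.transact.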

uriva commented 2 years ago

for now I made time an attribute for when it's important.