tonsky / datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS
Eclipse Public License 1.0

Storing raw JS objects and querying according to pointer equality `===` #248

Open CMCDragonkai opened 6 years ago

CMCDragonkai commented 6 years ago

I want to use datascript to store "pointers" to raw JS objects, and be able to query for them.

So I did a small test to see if it could work.

First with a class which works:

const d = require('datascript');
class DummyObj {}
const obj1 = new DummyObj;
const obj2 = new DummyObj;
const db = d.empty_db();
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 2, 'obj', obj2]]);

d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj" ?obj]]', db1, obj1); // only entity 1
d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj" ?obj]]', db1, obj2); // only entity 2

Second with just a normal object:

const d = require('datascript');
const obj1 = { x: 1 };
const obj2 = { x: 1};
const db = d.empty_db();
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 2, 'obj', obj2]]);

d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj" ?obj]]', db1, obj1); // []
d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj" ?obj]]', db1, obj2); // []

But the second one doesn't, it just returns nothing.

Also are certain objects ever directly serialised? What determines whether to compare objects by their serialised form vs by their object pointer?

tonsky commented 6 years ago

Objects are not serialized, but they are compared using CLJS compare: https://github.com/tonsky/datascript/blob/eaa83844efda2e8d83e80521dd06a15a154c710b/src/datascript/db.cljc#L290. That means that if you ever look up by value, you can only look up by primitives: strings, numbers, CLJS keywords. If it works for some objects, it’s most probably just by accident.

CMCDragonkai commented 6 years ago

What does CLJS compare do with objects that are instantiated from classes? The above code shows that it works for new DummyObj.

This would be a very useful function for me, because I need a table that I can query that stores pointers. And some rows may store the same pointer. And then I would update all rows with the same pointer to point to something new. It would support an immutable keyless B+tree.

tonsky commented 6 years ago

Here’s the code:

https://github.com/clojure/clojurescript/blob/9ddd356d344aa1ebf9bd9443dd36a1911c92d32f/src/main/cljs/cljs/core.cljs#L2345-L2369

I guess maybe first case falls under (identical? (type x) (type y))? Not sure

rauhs commented 6 years ago

Extending the type to be IComparable is probably tough if you use a mangled JS build, though it'd be easy if you built datascript yourself (see shadow-cljs for a webpack-compatible build of datascript). Right now the simplest (hacky) workaround would be to always store an array whose first element is a unique, primitive "ID" (string, number, bool) and whose second element is the actual payload (your object):

[12 {x: 12, other: "bar"}]
[15 {x: 15, other: "foo"}]

Those values will be comparable to CLJS and will also be sorted properly (important for initializing the DB, which uses arr.sort()).

FWIW, I think CLJS could be more lax about this and allow all objects to cljs.core/compare as long as both have a valueOf function, which is required to return a primitive value by the JS standard. Though I'm not sure such a change would be accepted (feel free to open a ticket about it).
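To make the comparison rules discussed here concrete, the following is a rough approximation in plain JS. It is only a sketch based on the behaviour described in this thread, not the actual `cljs.core/compare` source; `isPrimitive`, `compareLike` and `compareWithValueOf` are hypothetical names, and the `valueOf` fallback is the relaxed rule suggested above, which CLJS does not implement.

```javascript
// Hypothetical approximation of the comparison rules described in this thread.
function isPrimitive(v) {
  return v === null || (typeof v !== 'object' && typeof v !== 'function');
}

// Strict variant: only identical values or same-typed primitives are comparable.
function compareLike(x, y) {
  if (x === y) return 0;            // identical values always compare equal
  if (x == null) return -1;         // null/undefined sort first
  if (y == null) return 1;
  const t = typeof x;
  if (t === typeof y && (t === 'number' || t === 'string' || t === 'boolean')) {
    return x < y ? -1 : x > y ? 1 : 0;
  }
  throw new Error('Cannot compare ' + x + ' to ' + y);
}

// Relaxed variant (assumption): fall back to valueOf() when it yields primitives.
function compareWithValueOf(x, y) {
  try {
    return compareLike(x, y);
  } catch (e) {
    const vx = x.valueOf();
    const vy = y.valueOf();
    if (isPrimitive(vx) && isPrimitive(vy)) return compareLike(vx, vy);
    throw e; // e.g. plain objects, whose valueOf() returns the object itself
  }
}
```

Under this sketch two `Date` objects become comparable through their numeric `valueOf()`, while two plain objects still throw, because `Object.prototype.valueOf` returns the object itself rather than a primitive.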

CMCDragonkai commented 6 years ago

I really need it to compare based on pointer equality of the object itself. Are you saying to tag each object created with a special unique id before inserting into datascript?

Also I'm sure there are certain value types that cannot be ordered, I wouldn't think of pointers to objects as being ordered. So I'm not sure what kind of benefit sorting is for this situation.

CMCDragonkai commented 6 years ago

I found that identical? in CLJS maps directly to ===: https://stackoverflow.com/a/13005218/582917

rauhs commented 6 years ago

If you store values in Datascript and query by them (as above) you absolutely need to make your values comparable. If you really just care for identity and don't have a natural ordering for your values then I'd do the following:

  1. Generate a unique ID for each object, (1, 2, 3....), add this ID to your JS object which also has pointers.
  2. Attach this unique ID to an indexed datascript attribute. Something like object/pointers-id
  3. Attach the actual payload (your JS object with pointers) to some other datascript attribute. Something like object/pointers.

Then only ever query by object/pointers-id and get the value from the entity on the pull. That'd be less hacky and scale well.
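In JS, the three steps above might look roughly like the sketch below. It makes some assumptions: `tagObject` and `rowsFor` are hypothetical helpers, and the ID is kept in a `WeakMap` on the side instead of being written onto the object itself. The output is plain transaction data in the `[':db/add', e, a, v]` shape used with `d.db_with` earlier in the thread; nothing here calls DataScript.

```javascript
// Step 1: give every object a unique, primitive ID (hypothetical helper).
const ids = new WeakMap(); // object -> numeric ID; does not keep objects alive
let nextId = 1;

function tagObject(obj) {
  if (!ids.has(obj)) ids.set(obj, nextId++);
  return ids.get(obj);
}

// Steps 2 and 3: one datom for the comparable ID, one for the payload.
function rowsFor(entityId, obj) {
  return [
    [':db/add', entityId, 'object/pointers-id', tagObject(obj)],
    [':db/add', entityId, 'object/pointers', obj],
  ];
}

const shared = { x: 1 };
const tx = [
  ...rowsFor(1, shared),   // entities 1 and 2 point at the same object,
  ...rowsFor(2, shared),   // so they receive the same ID
  ...rowsFor(3, { x: 1 }), // structurally equal but distinct object: new ID
];
```

Queries would then bind `tagObject(shared)` against `object/pointers-id` rather than passing the object itself, so only numbers are ever compared.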

CMCDragonkai commented 6 years ago

Thanks for the advice, however I'm not familiar with clojure. What would those 3 steps look like in JS?

CMCDragonkai commented 6 years ago

Still, I'm confused about why the behaviour would differ between {} and new DummyObj. The code samples pointed out by @tonsky don't appear to deal with the difference. In JS, typeof returns "object" for both, and both are instanceof Object. The only difference is that obj1.constructor === DummyObj and ({}).constructor === Object.

CMCDragonkai commented 6 years ago

Another test:

const d = require('datascript');
const obj1 = { x: 1 };
const obj2 = { x: 1};
const db = d.empty_db();
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 2, 'obj', obj2]]);

d.pull(db1, '[*]', 1).obj === obj1; // false (there was a parentheses typo here)

It shows that these are no longer the same object. That must mean datascript must be doing a shallow or deep copy of the normal object that is being inserted. (Later I found out that it was in fact a deep copy.)

I think the docs should make clear that when inserting JS objects, if they are literal objects, they get copied, while if they are class instantiated objects, they are inserted by reference. This occurs even when the class instantiated objects are deeply nested.

tonsky commented 6 years ago

That must mean datascript must be doing a shallow or deep copy of the normal object that is being inserted.

DataScript certainly does not do that. Check your tests

CMCDragonkai commented 6 years ago

@tonsky Have you tried running this?

const d = require('datascript');
const obj1 = { x: 1 };
const obj2 = { x: 1};
const db = d.empty_db();
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 2, 'obj', obj2]]);

d.pull(db1, '[*]', 1).obj === obj1; // false

Here obj1, which is just a plain {x: 1}, is added into the DB. When I pull it out and compare it with obj1 using ===, it returns false. I'm running on Node v8.7.0. I copied it verbatim and ran it, and that's what happens. If it's not copying it, then what is d.pull(db1, '[*]', 1).obj?

I've tested again with new Object({a:1}), it is the same result as a normal literal object. But as soon as it is a class instantiation, then it does return true when doing ===. It even happens for deep objects.

rauhs commented 6 years ago

I don't know the JS side API of datascript so I can't help you there. Forget about the difference between Obj vs Class instance. Both won't work, you just can't query by values which are not comparable. Try implementing my idea above. Pseudo code:

// Assumes obj1 and obj2 each carry a unique primitive `id` property.
const db = d.empty_db({'obj-id': {':db/index': true}});
const db1 = d.db_with(db, [[':db/add', 1, 'obj', obj1], [':db/add', 1, 'obj-id', obj1.id],
                           [':db/add', 2, 'obj', obj2], [':db/add', 2, 'obj-id', obj2.id]]);

// Now query by obj id:
d.q('[:find (pull ?e [*]) :in $ ?obj :where [?e "obj-id" ?obj]]', db1, obj1.id);

Use a factory method to get you a new object with a newly generated id.

tonsky commented 6 years ago

I’m sorry, you’re right. DS does try to convert entities to CLJS values and back:

https://github.com/tonsky/datascript/blob/eaa83844efda2e8d83e80521dd06a15a154c710b/src/datascript/js.cljs#L36
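The results reported earlier in the thread are consistent with a conversion that only rewrites objects whose constructor is exactly `Object`. The following is a hypothetical model of such a round trip in plain JS, not DataScript's code: `toInternal` and `toExternal` are made-up names, and a `Map` stands in for a CLJS map.

```javascript
// Hypothetical model: plain objects are converted (and therefore copied),
// everything else passes through by reference.
function toInternal(v) {
  if (Array.isArray(v)) return v.map(toInternal);
  if (v !== null && typeof v === 'object' && v.constructor === Object) {
    return new Map(Object.keys(v).map((k) => [k, toInternal(v[k])]));
  }
  return v; // class instances, Object.create(null), primitives
}

function toExternal(v) {
  if (Array.isArray(v)) return v.map(toExternal);
  if (v instanceof Map) {
    const out = {};
    for (const [k, val] of v) out[k] = toExternal(val);
    return out;
  }
  return v;
}

class DummyObj {}
const plain = { x: 1 };
const inst = new DummyObj();
const bare = Object.create(null);

console.log(toExternal(toInternal(plain)) === plain); // false: a fresh copy comes back
console.log(toExternal(toInternal(inst)) === inst);   // true: same reference
console.log(toExternal(toInternal(bare)) === bare);   // true: constructor is undefined
```

In this model a class instance nested inside a plain object also survives by reference, since only the enclosing plain object is rebuilt.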

CMCDragonkai commented 6 years ago

@rauhs Just a clarification, does this mean datascript cannot index things that are not ordered (like using hash indexing)? I just tried it:

Error: Cannot compare [object Object] to [object Object]

tonsky commented 6 years ago

Usually it can store incomparable values. You can’t store them in cardinality-many attributes, and you can’t make them indexed or unique. Otherwise it should be fine.

CMCDragonkai commented 6 years ago

I'm making an adapter to make sure all my object keys are given unique numbers so they can be indexed by datascript.

But I had a thought experiment as to whether datascript in the future could index JS objects. Well I found that other than ES6 Map and WeakMap, there's no other easy way to index object keys in JS. But I looked at Facebook's immutable.js codebase, and here's their implementation for "hashing" JS objects that can be used as keys in their Immutable Map and Ordered Map. https://github.com/facebook/immutable-js/blob/7f4e61601d92fc874c99ccf7734d6f33239cec8c/src/Hash.js#L85-L153

Maybe a feature request for the future?

There's also a discussion about this feature: https://github.com/facebook/immutable-js/issues/84 Previously immutable.js also couldn't store objects as keys, but after that commit, objects could be stored as keys for immutable sets, maps and orderedmap.
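For comparison, identity-based indexing with an ES6 `Map` might look like the sketch below. `PointerIndex` is a hypothetical class, unrelated to the internals of DataScript or immutable.js, and unlike DataScript it is mutable. It covers the use case from earlier in the thread, where several rows share one pointer and all of them need to be repointed at once.

```javascript
// Hypothetical reverse index keyed by object identity (Map uses SameValueZero,
// which for objects means the same reference).
class PointerIndex {
  constructor() {
    this.rowsByPointer = new Map(); // object -> Set of row IDs
    this.pointerByRow = new Map();  // row ID -> object
  }

  set(rowId, obj) {
    const old = this.pointerByRow.get(rowId);
    if (old !== undefined) {
      const rows = this.rowsByPointer.get(old);
      rows.delete(rowId);
      if (rows.size === 0) this.rowsByPointer.delete(old);
    }
    this.pointerByRow.set(rowId, obj);
    if (!this.rowsByPointer.has(obj)) this.rowsByPointer.set(obj, new Set());
    this.rowsByPointer.get(obj).add(rowId);
  }

  rowsFor(obj) {
    return [...(this.rowsByPointer.get(obj) || [])];
  }

  // Point every row that currently holds oldObj at newObj instead.
  repoint(oldObj, newObj) {
    for (const rowId of this.rowsFor(oldObj)) this.set(rowId, newObj);
  }
}

const index = new PointerIndex();
const a = { x: 1 };
const b = { x: 1 }; // structurally equal to a, but a different pointer
index.set(1, a);
index.set(2, a);
index.set(3, b);
```

Here `index.rowsFor(a)` yields rows 1 and 2 only, and `index.repoint(a, someNewObject)` moves both of them in one call while leaving row 3 untouched.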

tonsky commented 6 years ago

cool, thanks

CMCDragonkai commented 6 years ago

Here we go: https://github.com/MatrixAI/js-object-tagger

CMCDragonkai commented 6 years ago

BTW @rauhs, even if I use object tagging to allow object keys to be indexed by proxy of the numeric tag, I still need to make sure my objects are class instantiated (not new Object(), as it doesn't work), because as demonstrated before, datascript copies literal objects on insertion. I just tried with the pull API, and it did this again. However, the entity API is strange: instead of giving back my object, it gives back some different kind of object (seems like another entity itself).

I think the docs should make clear that when inserting JS objects, if they are literal objects, they get copied, while if they are class instantiated objects, they are inserted by reference. This occurs even when the class instantiated objects are deeply nested. https://github.com/tonsky/datascript/issues/248#issuecomment-360121042

I hope one day this feature will be made explicit, the ability to make sure even literal objects are stored by reference and not copied.


Found another hack to get referenced objects: Object.create(null) creates an object whose constructor is undefined.