tonsky / datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS
Eclipse Public License 1.0

Idents? #52

Open mdhaney opened 9 years ago

mdhaney commented 9 years ago

I'm so excited that you added lookup refs and upsert support! Trying them out right now.

I noticed in the changelog you also added entid support - so does that mean I can now specify :db/ident for entities?

sparkofreason commented 9 years ago

+1. Lookup refs are definitely a big improvement when working with data from Datomic, but :db/ident support for ref attributes would be extra tasty.

tonsky commented 9 years ago

Well, idents are trickier because they affect not only input (you specify a keyword, it gets resolved to an eid) but also output (e.g. in Datomic results you won’t get an entity id or attribute id if that eid has an ident, you’ll get the keyword instead).

Datomic sure needs idents to save bandwidth/storage (so they can store integer attribute ids instead of strings/keywords which take a lot of space). They explicitly don’t recommend using idents for anything else but attrs and enum values (see http://docs.datomic.com/identity.html#sec-3).

But for an in-memory implementation like DataScript you don’t need to save space (you operate with pointers anyway, so all keywords point to the same location in memory). You can also use keywords directly to name attributes and enum values. I find this model easier to understand and easier to implement too.

Idents are also not free: they have to be resolved twice (on input and on output). That’s an extra hashmap or index lookup per, say, returned entity.

So I’m still deciding if idents are needed. If you guys can name a couple of use cases, it would be helpful.

Meanwhile, ident resolution can be emulated with [:db/ident :special-key] kind of lookup refs.
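A minimal sketch of that emulation, assuming the current `datascript.core` namespace and a schema that declares `:db/ident` as a unique attribute (lookup refs only resolve against unique attributes). The `:app/mode` attribute and its values are made up for illustration.

```clojure
(require '[datascript.core :as d])

;; Assumption: :db/ident is declared unique so it can be used in lookup refs.
(def conn (d/create-conn {:db/ident {:db/unique :db.unique/identity}}))

;; Create the entity once, naming it with a keyword under :db/ident.
;; :app/mode is a hypothetical attribute used only for this example.
(d/transact! conn [{:db/id -1, :db/ident :special-key, :app/mode :edit}])

;; Input side: the lookup ref stands in for the entity id.
(def eid (d/entid @conn [:db/ident :special-key]))
(def mode-before (:app/mode (d/entity @conn [:db/ident :special-key])))

;; The same lookup ref works in the entity position of a transaction.
(d/transact! conn [[:db/add [:db/ident :special-key] :app/mode :view]])
(def mode-after (:app/mode (d/pull @conn [:app/mode] [:db/ident :special-key])))
```

This covers only the input side: query and pull results still contain the entity id, not the keyword, so it is not a full replacement for Datomic-style idents.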

sparkofreason commented 9 years ago

Our main use-case is the ability to share query definitions between Datomic and Datascript. We're pushing the "Datomic-as-protocol" idea as far as we can, and it's working out very well, but API differences like this add friction. For instance, we're able to develop disconnected from Datomic using Datascript along with some CLJX. This makes for rapid client development, but then we want to hook up to a server and use the same queries built in the disconnected case.

tonsky commented 9 years ago

I understand that. My question is how are you using idents in Datomic then.

sparkofreason commented 9 years ago

We have various type "enums" which we mix and match, mainly as filters for list views. The types are also used as metadata to specify which attributes go with which entity types, for the purpose of auto-building editor forms, etc.

mdhaney commented 9 years ago

My use case is that I create an "app" entity for various housekeeping stuff, and it's inconvenient to have to store the entity id to refer to after creating this entity. I've been cheating by using 0 for the eid, but that felt kludgy to me.

Idents would work here, but lookup refs should work just as well, i.e. I can add something like :entity.type :app and use that as a lookup ref to easily refer to it.

So yeah, if there are performance implications, I would rather not have you pursue it.

sparkofreason commented 9 years ago

For my case, I think it would suffice to handle only the input side of things. That avoids the necessity to pull from Datomic all of the enum entities in addition to the data which references them.

sparkofreason commented 9 years ago

I've found a way to work around this and keep things consistent on the Datomic/Datascript sides, so I'll retract my request for this feature.

cigitia commented 9 years ago

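I'd like to affirm that I find idents useful in both use cases mentioned by others above:

  • I use idents to easily use singleton objects, such as an “app” entity that contains things like whether a certain UI mode is on or off. If one needed to store the text currently entered into a box in an application’s UI, there is then no other place to store it other than some single entity, and that single entity will need to be modified in transactions in the future.
  • I also use idents to refer easily to enumerated objects, such as states or categories of other entities, in code that needs to refer to specific categories. Those categories might also have attributes of their own, such as end-user display names, but each one changes the application's logic in a different way, so the code needs to be able to directly refer to them, especially in transactions.

Without idents, as far as I know, in order to modify any of those singleton and enumerated entities in transactions, their IDs must be stored beforehand separately from the database, either manually hardcoded into code using fragile magic numbers (https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants) or manually retrieved sometime after the database is initialized with them. Those IDs must then be manually passed into every transaction that involves them. Storing all of these ID constants and using them in transactions aren’t a big deal when there is only one singleton object, but they become an issue when there are many singleton or enumerable objects for which to store IDs, and lookup keys don't really help with this.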

The Entity and Pull APIs also would benefit from idents, giving the ability to easily retrieve all attributes of singleton entities at once, without having to manually store their IDs elsewhere—I use this while serializing a singleton object's state into a non-EDN text format.

Idents would be very useful.

Thanks again for DataScript, though; I'm thankful for how amazing it already is.

tonsky commented 9 years ago

Ok, I’ll look into it later

On Sat, Mar 7, 2015 at 11:01 AM cigitia notifications@github.com wrote:

I'd like to affirm that I find idents useful in both use cases mentioned by others above:

  • I use idents to easily use singleton objects, such as an “app” entity that contains things like whether a certain UI mode is on or off. If one needed to store the text currently entered into a box in an application’s UI, there is then no other place to store it other than some single entity, and that single entity will need to be modified in transactions in the future.
  • I also use idents to refer easily to enumerated objects, such as states or categories of other entities, in code that needs to refer to specific categories. Those categories might also have attributes of their own, such as end-user display names, but each one changes the application's logic in a different way, so the code needs to be able to directly refer to them, especially in transactions.


sparkofreason commented 9 years ago

I've been able to work around some of the Datomic/Datascript differences with a few small utility functions. See gist: https://gist.github.com/sparkofreason/6b3ffd63d148cd7dc37a

tonsky commented 9 years ago

Yep, looks painful :) I’ll try to add idents to DataScript soon

(extend-type UUID IComparable (-compare [x y] (compare (datascript/squuid-time-millis x) (datascript/squuid-time-millis y))))

this should probably go as a patch to CLJS itself. Only compare UUIDs as strings, without datascript semantics


sparkofreason commented 9 years ago

Not that painful, really. If it were a choice between using the utility functions or having idents but degraded Datascript perf, I'd take the first option. Personally I think the problem is really with Datomic. The enum abstraction is leaky when used through the pull API, would rather see it fixed there.

cigitia commented 9 years ago

Idents are still important for managing many singleton and enumerated entities, particularly in transactions and query/entity/pull inputs. That Gist above would help out with resolving idents in pull-API entities, but it does seem painful, and it only works for the pull API anyway, not transactions, where I find the lack of idents the most painful.

If output performance still is that large of a concern, it might be mitigated in a couple of ways:

I think I prefer the first choice the most. But in the end, as always, it's difficult to predict just how much ident resolution would affect performance in general anyway without actually trying it. It might even end up not being a significant problem at all.

But either way, idents remain important in general, though, and it would be really nice to give at least the option to use them.

wilkerlucio commented 9 years ago

I agree with @cigitia, I often find myself wanting these singletons in my app, I would really like to see those in Datascript.

metasoarous commented 8 years ago

Any updates on this? I've been doing something similar to @sparkofreason with entity type enums with references to associated attributes for semi-automated form and view rendering. I'm actually going to be talking about this at Clojure/West, FWIW.

The biggest pain in this work (for me personally) has been navigating these differences with respect to idents in Datomic vs DataScript in queries and pulls. I could probably live without modifying query output, for the most part, if that made the problem easier.

kristianmandrup commented 8 years ago

I'm only just getting started with Datascript, trying to select all attributes of an entity using this example http://www.learndatalogtoday.org/chapter/4, which relies on :db/ident. How else would I achieve that without keeping a record of said metadata elsewhere?

To get the actual keywords we need to look them up using the :db/ident attribute:

[:find ?attr
 :where
 [?p :person/name]
 [?p ?a]
 [?a :db/ident ?attr]]

But I guess you could use Lookup refs to achieve the same effect? If I understand correctly from entity-identifiers lookup refs are like custom primary keys, whereas ident is generated?
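On the first question, a hedged sketch: as noted earlier in the thread, DataScript names attributes with keywords directly, so the attribute position of a pattern should already bind to the keyword and the extra `:db/ident` clause can simply be dropped. This assumes the `datascript.core` namespace; the sample person data is invented for illustration.

```clojure
(require '[datascript.core :as d])

;; Hypothetical sample data: one person with two attributes.
(def db
  (d/db-with (d/empty-db)
             [{:db/id -1, :person/name "Ann", :person/age 30}]))

;; Same query as above, minus the [?a :db/ident ?attr] clause:
;; ?attr binds to the attribute keyword itself.
(def attrs
  (d/q '[:find ?attr
         :where
         [?p :person/name]
         [?p ?attr]]
       db))
;; attrs => #{[:person/name] [:person/age]}
```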

augustl commented 6 years ago

I ended up storing my singleton "bookkeeping entity" outside of datascript, in a plain old atom. Works well enough!

jleonard-r7 commented 2 years ago

idents as "relation alias" are also quite useful: https://docs.datomic.com/cloud/best.html#use-aliases

For "singletons", I've solved this by just having a "root" obj at the top level and attaching them all to it, via ":root/singleton" for example.

jleonard-r7 commented 2 years ago

Any chance to support idents soon?

tonsky commented 2 years ago

@jleonard-r7 I didn’t plan on it, no, given that there’s a workaround

jleonard-r7 commented 2 years ago

@tonsky can you show the workaround that provides "relation aliases" as I linked to here: https://docs.datomic.com/cloud/best.html#use-aliases

tonsky commented 2 years ago

This is multiple names for single attribute? That’s not possible in DataScript. I’m not even sure it’s a good idea

jleonard-r7 commented 2 years ago

It’s a good idea when you have externally provided data with less than ideal naming. It’s simply a tool that one can use to solve a problem.

Assembly language exists as a “workaround” for C. Or Java for Clojure. Who needs Clojure when Java exists? The question is one of expressiveness. But I see that you don’t want to do any more than the minimum here.

metasoarous commented 2 years ago

@jleonard-r7

"But I see that you don’t want to do any more than the minimum here."

To be clear, @tonsky does not owe anyone anything. He is the author of, and thus has authority over the direction of DataScript. Full stop. If you'd like to see your ideas considered, you'll have a much better shot if you're making reasoned arguments for them, rather than accusations of laziness. This is (I can only imagine) something @tonsky has considered carefully, and has his reasons for.

Now, regarding the actual point of discussion here: As an in-memory database, datascript data is inherently ephemeral. If you are writing it out somewhere, you must also be "rehydrating" it, and at that point in your code, could implement a rename. This is a not unreasonable solution for the problem you describe.

For databases built on durable indexes, this is a lot more complicated, which is why I think Datomic's approach makes a lot of sense. If you're looking for something open source that may (no guarantee) have behavior closer to Datomic in this regard (as it does in several other ways), you can look at DataHike.

Please keep in mind, @tonsky has given the world something for free, and deserves our thanks. His gift does not come with entitlement to his labor.

It's open source software; If you don't like something, fork it (as DataHike did). It's your legal right, which @tonsky was kind enough to grant you in releasing DataScript as OSS.

Thanks

jleonard-r7 commented 2 years ago

Both of you are exhibiting extreme arrogance and quite frankly making assumptions about the situation which are incorrect. This attitude is commonplace in languages where the language designer operates like a dictator but it’s antithetical to the spirit of Lisp.

Sure he doesn’t owe me anything. And that anything includes his arrogant, entitled, self-appointed arbitration of what is considered “good taste” for my work products.

Once again, “you can do that elsewhere” boils down to “assembly language is a workaround to the lack of C” which is an absurd position to take.

And btw I’m clearly not the only one asking for this very reasonable feature given the length of this thread and the plethora of other requesters. It’s a clear pattern of dismissal on contrived grounds (excuses) to an objective reader, in my opinion.

And yes it sounds like DataHike may be run by more reasonable, open-minded and cooperative folks. I will definitely consider it an alternative to this regressive, authoritarian regime.

metasoarous commented 2 years ago

@jleonard-r7 You do realize that @tonsky expressed support for this issue a few years back, right?

https://github.com/tonsky/datascript/issues/52#issuecomment-77746736

More recently, he said he "doesn't plan on" working on it.

https://github.com/tonsky/datascript/issues/52#issuecomment-1119971053

But he also hasn't expressed opposition to it, since supporting it (correct me if I'm wrong).

And as I look up through the history, I don't actually even see any PRs, so it would seem that no one has actually offered to step up and do the work here.

I'm curious if all of this was your understanding of the situation coming in. Perhaps it wasn't, and without a complete frame of reference, you wandered into the middle of the conversation wanting to know... why Nikita was being such a humbug.

But if this was your understanding of the situation, then that would seem to imply that your specific contention with @tonsky is that he has not dedicated his labor to an issue which is not a priority to him, but is to you and your "work products".

And if this is the situation, then may I say: What on Earth makes you think he owes you his labor? Like seriously. Have you donated to support his open source work? Have you submitted PRs? What on Earth would make you think that if I release OSS for free into the world, that I owe anyone in particular my time? If I give you my car for free, am I on the hook for oil changes? Like; Seriously!? How does this make any sense to anyone!?

Hopefully then, you just misunderstood. But either way, your resorting to personal attacks on people respectfully setting personal boundaries is entirely inappropriate, and I think stands for itself:

"Both of you are exhibiting extreme arrogance..." "...his arrogant, entitled, self-appointed arbitration..."

Somehow I don't think this is "the spirit of Lisp", so please leave whatever it is at the door.

Thanks

jleonard-r7 commented 2 years ago

@metasoarous my understanding was merely what's contained on the current page on the topic. And I see no support for the issue here and in fact the pattern of excuse making as I mentioned previously.

It seems that you read a lot into this statement of mine:

"But I see that you don’t want to do any more than the minimum here."

So, I guess you're going to require me to lay out one by one your incorrect meta-assumptions (even leaving aside the incorrect technical assumptions about my particular project):

Where did I speculate about any of the reasons for the author showing a pattern of preferring minimal changes to the project? Where did I demand that he personally "do the work" to implement this feature? How do you know that I wouldn't have offered a PR? Isn't the first step to spending the time making a PR to see if it would be accepted or not?

Look, this lecture about labor and how open source works is quite frankly condescending and rather presumptuous on your part. I'm well aware and once again I did not demand anyone's labor.

And you got all of that from these 15 words: "But I see that you don’t want to do any more than the minimum here."

I think someone should try practicing a bit more charitable reading. I can offer 7 valid reasons someone might prefer the minimum (and none of them involve "laziness" -- your word, not mine).

If technical, clinical communication (free of emotion and personal attack) is not easy for you, this may be a tough industry for you in the long haul (just a word to the wise).

augustl commented 2 years ago

Can you two exchange e-mails and take the discussion elsewhere, please? :)

jleonard-r7 commented 2 years ago

I'm gonna just go ahead and block him. I think that would solve the problem.

tonsky commented 2 years ago

Sorry folks I was away and didn’t notice this at first.

@jleonard-r7 I will not allow you to talk this way to myself, my friends and my contributors. Technical discussion is not a place to attack people’s personalities or even discuss people. We discuss ideas here. Full stop.