tonsky / datascript

Immutable database and Datalog query engine for Clojure, ClojureScript and JS
Eclipse Public License 1.0
5.5k stars 308 forks source link

Optimize entities #227

Open rauhs opened 7 years ago

rauhs commented 7 years ago

Also partially dicussed on Slack:

Currently an attribute lookup on an entity does a search into the btset and then caches the result. This is inefficient since the all the attribute values are right next to each other in the eavt index. Getting them all at once and storing them in a cache makes sense, but has huge problem:

  1. What if the entity has a many ref with many many references?
  2. What if the entity has MANY attributes

I think 2) is unlikely a use-case and can be ignored. However 1) is an issue.

Ideas:

thedavidmeister commented 7 years ago

👍

rauhs commented 6 years ago

Just want to continue dumping my thoughts: I ended up implementing this for my datascript fork (which adds a few other features) in a slightly different way:

Instead of bounded counting an Iter (which still walks over 1-2 btset leaves) and then again iterating over it when the bounded count was <=20, I now just iterate over the Iter right away but stop after 20 attributes (so as to avoid realizing an entity which has many attributes or --more commonly-- an entity that has a ref-many with many references).

So entity/lookup-entity is now roughly:

  1. If in cache -> return
  2. If not. (cond touched ...) a) if touched is nil: This is the very first attribute access, do the (db/-search ...) and start reducing over the Iter. Add each attribute to the cache but STOP once we reach 20 attributes or the Iter is reduced over. If we hit the limit of 20: We now have to see if we were in the middle of adding a mutli-ref attribute, since we have to remove these again from the cache if we were. Set the touched to either true (didnt hit the limit) or false (we did hit the limit and there was more in the btset Iter). b) If touched is true => not found c) If touched is false => get the attribute with an Iter and add to cache.

For the majority of Entities, this means we only once search the btset and get all the attributes in the cache quickly. For entities with many attributes this is still fast since we're still at near-constant time here.