quoll / asami

A flexible graph store, written in Clojure
Eclipse Public License 1.0

Metadata on Bindings #15

Status: Open. quoll opened this issue 1 year ago.


Context

Right now bindings are in the format: ^{:cols [String]} [[Object]]

That is, a binding is a seq of seqs of objects (one inner seq per row), with metadata that contains the names of the columns.

An example might be a binding of names and ages:

^{:cols ["?name" "?age"]} [["Alice" 22] ["Bob" 21]]
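Here is a minimal sketch of handling this shape with nothing beyond clojure.core: the binding is built with `with-meta` and its column names are read back with `meta`. The helpers `column-index` and `column-values` are hypothetical illustrations, not Asami functions.

```clojure
;; A binding: a vector of rows, with the column names held in metadata.
;; Equivalent in shape to ^{:cols ["?name" "?age"]} [["Alice" 22] ["Bob" 21]]
(def people
  (with-meta [["Alice" 22] ["Bob" 21]]
             {:cols ["?name" "?age"]}))

(defn column-index
  "Position of column `col` in the binding's :cols metadata, or nil if the
  column is absent. Hypothetical helper."
  [binding col]
  (first (keep-indexed (fn [i c] (when (= c col) i))
                       (:cols (meta binding)))))

(defn column-values
  "All values bound to column `col`, one per row, or nil if the column is
  absent. Hypothetical helper."
  [binding col]
  (when-let [i (column-index binding col)]
    (mapv #(nth % i) binding)))
```

For example, `(column-values people "?age")` returns `[22 21]`.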

This pattern is then used in query resolution (conjunctive joins, disjunctive joins, filters, minus, etc.).

Proposal

Extend the metadata to include column types: local or global. The example above would have a global type, meaning that the data in the binding makes sense in a global context. The new representation would therefore be:

^{:cols ["?name" "?age"] :types [:global :global]} [["Alice" 22] ["Bob" 21]]
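Under the proposed layout, each entry in `:types` sits at the same position as its column name in `:cols`. The sketch below reads a column's type back, assuming exactly that positional layout; `column-type` and `global-column?` are hypothetical helpers, not part of Asami.

```clojure
;; A binding carrying the proposed :types metadata alongside :cols.
(def people
  (with-meta [["Alice" 22] ["Bob" 21]]
             {:cols ["?name" "?age"] :types [:global :global]}))

(defn column-type
  "Type of column `col` (:global, or a local store keyword such as :local1),
  taken from the :types entry at the same position as `col` in :cols.
  Returns nil if the column is absent. Hypothetical helper."
  [binding col]
  (let [{:keys [cols types]} (meta binding)]
    (some (fn [[c t]] (when (= c col) t))
          (map vector cols types))))

(defn global-column?
  "True when column `col` is already in global form. Hypothetical helper."
  [binding col]
  (= :global (column-type binding col)))
```

Because `:types` runs parallel to `:cols`, a column's name and type are always found at the same index, and the rows themselves keep the same shape.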

A local version of the same binding is:

^{:cols ["?name" "?age"] :types [:local1 :local1]} [[-1927139965642932224 -9223372036854775786] [-2070970411939528704 -9223372036854775787]]

These ID values are what is stored in the triples.

NB: These examples are definitive (the IDs alone determine the values) because each of the bound values can be encapsulated into its ID (an ID holds an encapsulated value when it is negative). Larger values, such as a long string, will instead be mapped to their data-pool IDs, which are positive numbers.
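The sign convention in the note above is enough to tell, from an ID alone, whether a value can be interpreted without a data pool. Below is a minimal sketch, assuming IDs are plain longs as in the example; `encapsulated?`, `needs-data-pool?` and `fully-encapsulated-row?` are hypothetical names, and decoding an encapsulated value is not shown because its bit layout is not described in this issue.

```clojure
(defn encapsulated?
  "True when the ID holds its value directly (negative IDs, per the note above).
  Hypothetical helper."
  [id]
  (neg? id))

(defn needs-data-pool?
  "True when the ID refers to an entry in a store's data pool (positive IDs).
  Hypothetical helper."
  [id]
  (pos? id))

(defn fully-encapsulated-row?
  "True when every ID in a row is encapsulated, so no data-pool lookup is
  needed to interpret the row. Hypothetical helper."
  [row]
  (every? encapsulated? row))
```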

Desired Outcome

This will allow data to pass through join operations without first being globalized, thereby significantly speeding up queries from storage.

Describing a type for each column individually will still allow queries to join data from different sources. However, the benefit will only apply when joins occur on matching columns that came from the same storage.

Concerns

Joins

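Each type of "local" storage will need its own type value, since data from different data pools will have different IDs, although encapsulation is universal across stores. The resulting join operations for each pair of column types are:

left     right    join operation
-------  -------  ------------------------------------------------
global   global   left / right
local1   global   (globalize left) / right
local1   local1   left / right
local1   local2   (globalize left) / (globalize right)
local1   local2   left / right, when (local1 < 0) and (local2 < 0)

The last row is the exception to the row before it: when both IDs are negative (encapsulated), their values do not depend on either store's data pool, so they can be compared without globalizing.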
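The table can be read as a small decision function. Here is a minimal sketch; `join-strategy` and `comparable-without-globalizing?` are hypothetical names, and the mirrored `global` / `local1` case, which the table does not list, is assumed here to behave like `local1` / `global` with the sides swapped.

```clojure
(defn join-strategy
  "Which sides of a column pair must be globalized before a join compares
  them, following the table above. `left` and `right` are column types such
  as :global, :local1 or :local2. Hypothetical helper, not Asami's API."
  [left right]
  (cond
    ;; Same type (global/global, or the same local store): compare as-is.
    (= left right)    {:globalize-left false :globalize-right false}
    ;; ASSUMPTION: mirror of local1/global, which the table does not list.
    (= left :global)  {:globalize-left false :globalize-right true}
    ;; local/global: bring the local side into global form.
    (= right :global) {:globalize-left true :globalize-right false}
    ;; Two different local stores: data-pool IDs are not comparable.
    :else             {:globalize-left true :globalize-right true}))

(defn comparable-without-globalizing?
  "Exception for two different local stores: a pair of IDs can be compared
  directly when both are negative (encapsulated), because encapsulation is
  the same in every store. Hypothetical helper."
  [left-id right-id]
  (and (neg? left-id) (neg? right-id)))
```

Keeping the column-level decision separate from the per-value check means the common cases are settled once per column pair, and only joins across two local stores need to look at individual IDs.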

Note that the in-memory store is always in globalized form.

Types for a Binding or for all Bindings

Incoming data always starts in local form. There may be edge cases (for instance, in filtering) where it would be more efficient to convert a column for only some bindings. However, this would introduce complexity that may slow querying down. Globalizing already happens automatically, so unnecessary globalization is no slower than the current system.

Filters and Bindings

Both of these operations rely on global values from their source data, so they require globalization.

Mixed Operations

Because columns may be joined after filtering, in many cases it will be more efficient to still have the local value. Globalizing a column should therefore produce a new column rather than replace the existing one.
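Here is a sketch of this approach under stated assumptions: the hypothetical `add-global-column` appends the globalized values as an extra `:global` column and leaves the local column untouched. The lookup from a local ID to its value is passed in as a plain function, `globalize-id`, because the store's actual resolution API is not described here; the name of the new column is likewise chosen by the caller.

```clojure
(defn add-global-column
  "Returns a new binding with an extra :global column holding the globalized
  values of column `col`. The original local column is kept, so later joins
  can still use the local IDs. `globalize-id` maps a local ID to its global
  value (a stand-in for the store's lookup) and `new-col` names the added
  column. Hypothetical helper, not Asami's API."
  [binding col new-col globalize-id]
  (let [{:keys [cols types] :as m} (meta binding)
        idx (first (keep-indexed (fn [i c] (when (= c col) i)) cols))]
    (when (nil? idx)
      (throw (ex-info "Unknown column" {:col col :cols cols})))
    (with-meta
      ;; Each row gains one value: the globalized form of its `col` entry.
      (mapv (fn [row] (conj (vec row) (globalize-id (nth row idx)))) binding)
      ;; Metadata gains the new column name and marks it :global.
      (assoc m
             :cols  (conj (vec cols) new-col)
             :types (conj (vec types) :global)))))
```

For instance, with the toy map `{-9223372036854775786 22}` as the lookup, the one-column `:local1` binding `[[-9223372036854775786]]` becomes `[[-9223372036854775786 22]]` with `:cols ["?age" "?age-global"]` and `:types [:local1 :global]`. A filter can then test the second column while a later join still uses the first.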

Projection

The final projection operation currently looks for column names. Under this new approach projection must now: