This binding format (described under Context, below) is used throughout query resolution: conjunctive joins, disjunctive joins, filters, minus, etc.
Proposal
Extend the binding metadata to give each column a type: local or global.
The names-and-ages example under Context would have the global type. This means that the data in the binding makes sense in a global context. The same binding can also be represented in local form, where each value is replaced by the ID that storage uses for it. These ID values are what is stored in the triples.
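Below is a minimal Clojure sketch of the two forms. It is hypothetical. The `:types` metadata key, the `:global`/`:local1` type values and the ID numbers are all assumptions made for illustration; only `:cols` comes from the existing format. `column-type` is likewise a hypothetical helper that reads a column's type from the metadata.

```clojure
;; Hypothetical sketch: the :types key, its values and the IDs are assumptions.

;; Global form: the values make sense in any context.
(def global-binding
  (with-meta [["Alice" 30]
              ["Bob" 25]]
    {:cols  ["name" "age"]
     :types [:global :global]}))

;; Local form: the same rows, with each value replaced by a made-up
;; storage ID from a single store, :local1. Negative IDs stand for
;; encapsulated values.
(def local-binding
  (with-meta [[-101 -201]
              [-102 -202]]
    {:cols  ["name" "age"]
     :types [:local1 :local1]}))

(defn column-type
  "Returns the type recorded for the named column, or nil if it is absent."
  [binding col]
  (let [{:keys [cols types]} (meta binding)
        idx (.indexOf ^java.util.List cols col)]
    (when (>= idx 0)
      (nth types idx))))

(column-type global-binding "age") ;; => :global
(column-type local-binding "age")  ;; => :local1
```

Both forms keep the same `:cols`. Only the types and the cell contents differ.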
NB: The IDs in these examples are definitive (the same in every store) because each bound value can be encapsulated directly into its ID. Encapsulated values are negative. Larger values, such as long strings, are instead mapped to their data-pool IDs, which are positive.
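The sign convention can be sketched as a pair of hypothetical helpers. `encapsulated?` and `id-kind` are illustrative names, not existing functions. The sketch does not show how a value is actually packed into a negative ID.

```clojure
;; Sketch of the sign convention only. Value packing is not modelled,
;; and the text does not say how an ID of zero is treated.

(defn encapsulated?
  "True when an ID carries its value directly (encapsulated IDs are negative)."
  [id]
  (neg? id))

(defn id-kind
  "Classifies a non-zero storage ID by its sign."
  [id]
  (if (encapsulated? id) :encapsulated :data-pool))

(id-kind -101) ;; => :encapsulated
(id-kind 4096) ;; => :data-pool
```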
Desired Outcome
This will allow data to go through join operations without first being globalized, which should speed up queries against storage significantly.
Because a type is described for each column individually, queries can still join data from different sources. The benefit therefore applies only when the matching columns being joined have come from the same storage.
Concerns
Joins
Each type of "local" storage will need its own type value, since data from different data pools will have different IDs, although encapsulation is universal across stores. The resulting comparison operations are:

| left | right | join operation |
|---|---|---|
| global | global | left / right |
| local1 | global | (globalize left) / right |
| local1 | local1 | left / right |
| local1 | local2 | (globalize left) / (globalize right) |
| local1 | local2 | left / right, if (local1 < 0) and (local2 < 0), i.e. both values are encapsulated |

In the join operation column, `/` denotes joining the left column with the right column.
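One way to read the table as code is sketched below. It assumes that column types are `:global` or a keyword naming a local store (`:local1`, `:local2`, ...). It also assumes a function `globalize` that maps a store type and an ID to the ID's global value. `join-strategy` and `values-match?` are hypothetical names, and the toy `pools` map stands in for real data pools.

```clojure
;; Hypothetical sketch of the join table. `globalize` is an assumed
;; function (storage-type, id) -> global value, not an existing API.

(defn join-strategy
  "Which side(s) must be globalized before one value pair can be compared."
  [left-type right-type left-val right-val]
  (cond
    (= left-type right-type)               :direct          ; global/global or same store
    (= right-type :global)                 :globalize-left  ; local1 / global
    (= left-type :global)                  :globalize-right ; global / local1 (mirror case)
    (and (neg? left-val) (neg? right-val)) :direct          ; both values encapsulated
    :else                                  :globalize-both)) ; local1 / local2

(defn values-match?
  "True when the left and right values denote the same global value."
  [globalize left-type right-type left-val right-val]
  (let [g-left  #(globalize left-type left-val)
        g-right #(globalize right-type right-val)]
    (case (join-strategy left-type right-type left-val right-val)
      :direct          (= left-val right-val)
      :globalize-left  (= (g-left) right-val)
      :globalize-right (= left-val (g-right))
      :globalize-both  (= (g-left) (g-right)))))

;; Toy data pools: -1 is an encapsulated value (the same in every store);
;; 7 and 9 are data-pool IDs in different stores that hold the same string.
(def pools {:local1 {-1 "Alice" 7 "a long string"}
            :local2 {-1 "Alice" 9 "a long string"}})
(defn toy-globalize [store id] (get-in pools [store id]))

(values-match? toy-globalize :local1 :local2 -1 -1) ;; => true, compared directly
(values-match? toy-globalize :local1 :local2 7 9)   ;; => true, after globalizing both
```

The last row of the table is decided per value pair. That is why `join-strategy` takes the values as well as the column types.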
Note that the in-memory store is always in globalized form.
Types for a Binding or for all Bindings
Incoming data always starts in local form. There may be edge cases (for instance, in filtering) where it would be more efficient to convert a column for only some of the bindings. However, this would introduce complexity that may slow querying down. Since globalizing already happens automatically, globalizing unnecessarily is no slower than the current system.
Filters and Bindings
Both of these operations rely on the global values of their source data, so the columns they use must be globalized first.
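Here is a hypothetical sketch of a filter over a local column. It assumes `:types` metadata holding `:global` or a local-store keyword, plus a `globalize` function from store type and ID to value. Both are assumptions, as is the name `filter-binding`. The values are globalized only to evaluate the predicate, and the surviving rows keep their local IDs.

```clojure
;; Hypothetical sketch: :types and `globalize` (storage-type, id) -> value
;; are assumptions, not an existing API.

(defn filter-binding
  "Keeps the rows whose global value in column `col` satisfies `pred`.
  Local IDs are globalized for the test only; kept rows are unchanged."
  [globalize binding col pred]
  (let [{:keys [cols types] :as m} (meta binding)
        idx      (.indexOf ^java.util.List cols col) ; assumes `col` is present
        ctype    (nth types idx)
        value-of (fn [row]
                   (let [v (nth row idx)]
                     (if (= ctype :global) v (globalize ctype v))))]
    ;; filterv builds a new vector, so the metadata is re-attached.
    (with-meta (filterv #(pred (value-of %)) binding) m)))

(def pools {:local1 {-101 "Alice" -102 "Bob" -201 30 -202 25}})
(defn toy-globalize [store id] (get-in pools [store id]))

(def people
  (with-meta [[-101 -201] [-102 -202]]
    {:cols ["name" "age"] :types [:local1 :local1]}))

(filter-binding toy-globalize people "age" #(> % 26))
;; => [[-101 -201]]  (rows keep their local IDs)
```

This version globalizes transiently, so a later projection would have to globalize the column again. Keeping the globalized values as an extra column avoids that cost.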
Mixed Operations
Because a column may be joined after it has been filtered, it will often be more efficient to keep its local values. Globalizing a column should therefore add a new column rather than replace the existing one.
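Below is a hypothetical sketch of globalizing into a new column. `globalize-column`, the `:types` key and the `globalize` function are assumptions. The sketch also assumes that the copy keeps the original column name. That fits the name-based projection lookup, in which a local column is passed over in favour of a global one with the same name.

```clojure
;; Hypothetical sketch: :types and `globalize` (storage-type, id) -> value
;; are assumptions, not an existing API.

(defn globalize-column
  "Returns `binding` with a :global copy of column `col` appended. The local
  column is kept so that later joins can still compare raw IDs."
  [globalize binding col]
  (let [{:keys [cols types] :as m} (meta binding)
        idx   (.indexOf ^java.util.List cols col) ; assumes `col` is present
        ctype (nth types idx)]
    (if (= ctype :global)
      binding ; already global: nothing to add
      (with-meta
        (mapv (fn [row] (conj (vec row) (globalize ctype (nth row idx)))) binding)
        (assoc m
               :cols  (conj (vec cols) col)
               :types (conj (vec types) :global))))))

(def pools {:local1 {-201 30 -202 25}})
(defn toy-globalize [store id] (get-in pools [store id]))

(def people
  (with-meta [[-101 -201] [-102 -202]]
    {:cols ["name" "age"] :types [:local1 :local1]}))

(def with-ages (globalize-column toy-globalize people "age"))
;; with-ages               => [[-101 -201 30] [-102 -202 25]]
;; (:types (meta with-ages)) => [:local1 :local1 :global]
```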
Projection
The final projection operation currently looks for column names. Under this new approach, projection must now:
1. Look for the column by name.
2. If the column is local, record this and continue looking.
3. If a global column is found, select the global column.
4. Otherwise, map the recorded local column to global and select the new column.
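The projection steps can be sketched as follows. `project-column`, `project`, the `:types` key and the `globalize` function are hypothetical. The sketch collects every matching position and prefers a global one. That gives the same result as scanning, recording local matches and stopping at the first global column.

```clojure
;; Hypothetical sketch: :types and `globalize` (storage-type, id) -> value
;; are assumptions, not an existing API.

(defn project-column
  "Returns the global values of the column named `col`, one per row."
  [globalize binding col]
  (let [{:keys [cols types]} (meta binding)
        ;; Step 1: every position whose name matches.
        matches    (keep-indexed (fn [i c] (when (= c col) i)) cols)
        ;; Steps 2-3: prefer a global match; otherwise use the first (local) one.
        global-idx (first (filter #(= :global (nth types %)) matches))
        local-idx  (first matches)]
    (cond
      global-idx (mapv #(nth % global-idx) binding)
      ;; Step 4: no global copy, so map the local IDs to global values.
      local-idx  (let [store (nth types local-idx)]
                   (mapv #(globalize store (nth % local-idx)) binding))
      :else      (throw (ex-info "Column not found" {:col col})))))

(defn project
  "Projects `out-cols` (in order) into a fully global binding."
  [globalize binding out-cols]
  (let [columns (mapv #(project-column globalize binding %) out-cols)
        rows    (if (seq columns) (apply mapv vector columns) [])]
    (with-meta rows {:cols  (vec out-cols)
                     :types (vec (repeat (count out-cols) :global))})))

;; The toy pool holds only names, so the ages must come from the
;; existing global "age" column rather than the local one.
(def pools {:local1 {-101 "Alice" -102 "Bob"}})
(defn toy-globalize [store id] (get-in pools [store id]))

(def people
  (with-meta [[-101 -201 30] [-102 -202 25]]
    {:cols ["name" "age" "age"] :types [:local1 :local1 :global]}))

(project toy-globalize people ["name" "age"])
;; => [["Alice" 30] ["Bob" 25]]
```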
Context
Right now bindings are in the format:
^{:cols [String]} [[Object]]
That is, a seq of seqs of objects, with metadata containing the names of the columns.
An example might be a binding of names and ages:
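For illustration only, such a binding could be built in Clojure with `with-meta`. The names and ages are made up.

```clojure
;; Illustrative data only: a seq of rows plus :cols metadata.
(def people
  (with-meta [["Alice" 30]
              ["Bob" 25]]
    {:cols ["name" "age"]})) ; column names are Strings, per the format above

(:cols (meta people)) ;; => ["name" "age"]
(count people)        ;; => 2
```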