unisonweb / unison

A friendly programming language from the future
https://unison-lang.org

Proposal: Make codebase format more relational #2545

Open ChrisPenner opened 3 years ago

ChrisPenner commented 3 years ago

This proposal makes a case for an alteration to the sqlite codebase storage format.

Goal

The goal is to allow us to more easily load exactly the data we need, and no more, from the database exactly when we need it. Additional goals include:

TL;DR

We should store the localID mappings and branch dependency mappings as tables in sqlite rather than hiding them in blobs. We should upgrade our "dependents" indexes into first-class join-tables which track all dependencies.

Rationale

The goal is to have every relationship between entities in our codebase tracked in a form that the database itself understands. This allows us to take full advantage of the relational nature of SQL when interacting with, syncing, and migrating our codebases.

Pros:

Cons:

Further Steps

We may also consider normalizing term & type references into local ID mappings similar to branches, and storing these mappings relationally as well, which extends the syncing benefits of branches to terms and types.

Some Proposed Schema Changes:

These are a WIP:

This relates branches to the terms and types they contain. By creating different indexes on this table you can find all the branches a given term belongs to, and which terms belong in a namespace.

CREATE TABLE branch_terms (
  branch_id          INTEGER NOT NULL CONSTRAINT branch_terms_fk1 REFERENCES object(id),
  local_id           INTEGER NOT NULL,
  name_id            INTEGER NOT NULL CONSTRAINT branch_terms_fk2 REFERENCES text(id),
  term_component_id  INTEGER     NULL CONSTRAINT branch_terms_fk3 REFERENCES object(id),
  term_component_pos INTEGER     NULL
);
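
As a rough illustration of the two lookups described above, here is a minimal sketch using Python's built-in sqlite3 module. The shapes of the object and text tables, the index names, and the sample rows are assumptions made for the example and are not part of the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal stand-ins for the existing object and text tables.
CREATE TABLE object (id INTEGER PRIMARY KEY);
CREATE TABLE text (id INTEGER PRIMARY KEY, text TEXT NOT NULL UNIQUE);

CREATE TABLE branch_terms (
  branch_id          INTEGER NOT NULL REFERENCES object(id),
  local_id           INTEGER NOT NULL,
  name_id            INTEGER NOT NULL REFERENCES text(id),
  term_component_id  INTEGER     NULL REFERENCES object(id),
  term_component_pos INTEGER     NULL
);

-- Hypothetical index names: one index per lookup direction.
CREATE INDEX branch_terms_by_branch ON branch_terms (branch_id, name_id);
CREATE INDEX branch_terms_by_term
  ON branch_terms (term_component_id, term_component_pos);

-- Sample data: branches 1 and 2 both contain term (component 10, pos 0).
INSERT INTO object (id) VALUES (1), (2), (10);
INSERT INTO text (id, text) VALUES (1, 'map'), (2, 'List.map');
INSERT INTO branch_terms VALUES (1, 0, 1, 10, 0), (2, 0, 2, 10, 0);
""")


def branches_containing(term_component_id, pos):
    """Return the ids of all branches that contain the given term."""
    rows = conn.execute(
        "SELECT branch_id FROM branch_terms "
        "WHERE term_component_id = ? AND term_component_pos = ? "
        "ORDER BY branch_id",
        (term_component_id, pos),
    )
    return [r[0] for r in rows]


def term_names_in(branch_id):
    """Return the names of all terms in the given branch (namespace)."""
    rows = conn.execute(
        "SELECT t.text FROM branch_terms bt "
        "JOIN text t ON t.id = bt.name_id "
        "WHERE bt.branch_id = ? ORDER BY t.text",
        (branch_id,),
    )
    return [r[0] for r in rows]
```

Both directions are served by the same table; only the choice of index differs.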

This relates branches to their children:

CREATE TABLE branch_children (
  branch_id     INTEGER NOT NULL CONSTRAINT branch_children_fk1 REFERENCES object(id),
  child_id      INTEGER NOT NULL CONSTRAINT branch_children_fk2 REFERENCES object(id),
  child_name_id INTEGER NOT NULL CONSTRAINT branch_children_fk3 REFERENCES text(id)
);
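
One thing a table like this might enable is walking a namespace tree entirely inside the database. The sketch below uses a recursive common table expression to list every descendant of a branch along with its dotted path. The object and text table shapes, the sample data, and the descendant_paths helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal stand-ins for the existing object and text tables.
CREATE TABLE object (id INTEGER PRIMARY KEY);
CREATE TABLE text (id INTEGER PRIMARY KEY, text TEXT NOT NULL UNIQUE);

CREATE TABLE branch_children (
  branch_id     INTEGER NOT NULL REFERENCES object(id),
  child_id      INTEGER NOT NULL REFERENCES object(id),
  child_name_id INTEGER NOT NULL REFERENCES text(id)
);

-- Sample data: branch 1 has child 2 named 'base'; 2 has child 3 named 'List'.
INSERT INTO object (id) VALUES (1), (2), (3);
INSERT INTO text (id, text) VALUES (1, 'base'), (2, 'List');
INSERT INTO branch_children VALUES (1, 2, 1), (2, 3, 2);
""")


def descendant_paths(root_id):
    """Return (branch id, dotted path) for every descendant of root_id."""
    rows = conn.execute(
        """
        WITH RECURSIVE descendants(id, path) AS (
          -- Base case: direct children of the root.
          SELECT c.child_id, t.text
            FROM branch_children c
            JOIN text t ON t.id = c.child_name_id
           WHERE c.branch_id = ?
          UNION ALL
          -- Recursive case: children of anything found so far.
          SELECT c.child_id, d.path || '.' || t.text
            FROM descendants d
            JOIN branch_children c ON c.branch_id = d.id
            JOIN text t ON t.id = c.child_name_id
        )
        SELECT id, path FROM descendants ORDER BY path
        """,
        (root_id,),
    )
    return rows.fetchall()
```

The recursion terminates as long as the parent/child relation has no cycles, which this sketch assumes.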

This relates terms and types to each other. Indexes over this table allow you to find all terms which depend on a type, all terms which depend on a term, etc.

CREATE TABLE term_and_type_dependencies (
  dependent_object_id       INTEGER NOT NULL CONSTRAINT dependents_index_fk1 REFERENCES object(id),
  dependent_local_id        INTEGER NOT NULL,
  dependent_component_index INTEGER NOT NULL,
  dependent_constructor_id  INTEGER     NULL,

  -- dependency is identified by either a builtin text ID,
  dependency_builtin INTEGER NULL CONSTRAINT dependents_index_fk2 REFERENCES text(id),

  -- OR by an object ID, component index, and optional constructor ID.
  dependency_object_id       INTEGER NULL CONSTRAINT dependents_index_fk3 REFERENCES object(id),
  dependency_component_index INTEGER NULL,
  dependency_constructor_id  INTEGER NULL
);
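
To make the "find all dependents" use case concrete, here is a hedged sketch of how such lookups could look with indexes on the dependency columns. The index names, the object and text table shapes, the builtin name 'Nat', and the sample rows are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal stand-ins for the existing object and text tables.
CREATE TABLE object (id INTEGER PRIMARY KEY);
CREATE TABLE text (id INTEGER PRIMARY KEY, text TEXT NOT NULL UNIQUE);

CREATE TABLE term_and_type_dependencies (
  dependent_object_id        INTEGER NOT NULL REFERENCES object(id),
  dependent_local_id         INTEGER NOT NULL,
  dependent_component_index  INTEGER NOT NULL,
  dependent_constructor_id   INTEGER     NULL,
  dependency_builtin         INTEGER     NULL REFERENCES text(id),
  dependency_object_id       INTEGER     NULL REFERENCES object(id),
  dependency_component_index INTEGER     NULL,
  dependency_constructor_id  INTEGER     NULL
);

-- Hypothetical index names for the "who depends on X?" direction.
CREATE INDEX dependencies_by_object
  ON term_and_type_dependencies (dependency_object_id, dependency_component_index);
CREATE INDEX dependencies_by_builtin
  ON term_and_type_dependencies (dependency_builtin);

INSERT INTO object (id) VALUES (10), (20), (30);
INSERT INTO text (id, text) VALUES (1, 'Nat');  -- hypothetical builtin name
INSERT INTO term_and_type_dependencies VALUES
  (20, 0, 0, NULL, NULL, 10, 0, NULL),    -- object 20 depends on object 10
  (30, 0, 0, NULL, NULL, 10, 0, NULL),    -- object 30 depends on object 10
  (30, 1, 0, NULL, 1, NULL, NULL, NULL);  -- object 30 depends on builtin Nat
""")


def dependents_of(object_id, component_index):
    """Return (object id, component index) of everything depending on a definition."""
    rows = conn.execute(
        "SELECT DISTINCT dependent_object_id, dependent_component_index "
        "FROM term_and_type_dependencies "
        "WHERE dependency_object_id = ? AND dependency_component_index = ? "
        "ORDER BY dependent_object_id, dependent_component_index",
        (object_id, component_index),
    )
    return rows.fetchall()


def dependents_of_builtin(name):
    """Return (object id, component index) of everything depending on a builtin."""
    rows = conn.execute(
        "SELECT DISTINCT d.dependent_object_id, d.dependent_component_index "
        "FROM term_and_type_dependencies d "
        "JOIN text t ON t.id = d.dependency_builtin "
        "WHERE t.text = ? "
        "ORDER BY d.dependent_object_id, d.dependent_component_index",
        (name,),
    )
    return rows.fetchall()
```

The same table read in the other direction (filtering on the dependent columns) would give the dependencies of a definition.
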

aryairani commented 3 years ago

I didn't quite get the WIP schema, but this sounds great generally. We can try to sort out the details.

aryairani commented 2 years ago

Also see #2219.