qpdb / mentat

A persistent, relational store inspired by Datomic and DataScript.
https://mentat.rs/
Apache License 2.0
52 stars 2 forks source link

Project Mentat

Build Status

Project Mentat is a persistent, embedded knowledge base. It draws heavily on DataScript and Datomic.

This project was started by Mozilla, but is no longer being developed or actively maintained by them. Their repository was marked read-only, this fork is an attempt to revive and continue that interesting work. We owe the team at Mozilla more than words can express for inspiring us all and for this project in particular.

Thank you.

Documentation


Motivation

Mentat is a flexible relational (not key-value, not document-oriented) store that makes it easy to describe, grow, and reuse your domain schema.

By abstracting away the storage schema, and by exposing change listeners outside the database (not via triggers), we hope to make domain schemas stable, and allow both the data store itself and embedding applications to use better architectures, meeting performance goals in a way that allows future evolution.

Data storage is hard

We've observed that data storage is a particular area of difficulty for software development teams:

Comparison to DataScript

DataScript asks the question: "What if creating a database were as cheap as creating a Hashmap?"

Mentat is not interested in that. Instead, it's focused on persistence and performance, with very little interest in immutable databases/databases as values or throwaway use.

One might say that Mentat's question is: "What if a database could store arbitrary relations, for arbitrary consumers, without them having to coordinate an up-front storage-level schema?"

Consider this a practical approach to facts, to knowledge its storage and access, much like SQLite is a practical RDBMS.

(Note that domain-level schemas are very valuable.)

Another possible question would be: "What if we could bake some of the concepts of CQRS and event sourcing into a persistent relational store, such that the transaction log itself were of value to queries?"

Some thought has been given to how databases as values — long-term references to a snapshot of the store at an instant in time — could work in this model. It's not impossible; it simply has different performance characteristics.

Just like DataScript, Mentat speaks Datalog for querying and takes additions and retractions as input to a transaction.

Unlike DataScript, Mentat exposes free-text indexing, thanks to SQLite/FTS.

Comparison to Datomic

Datomic is a server-side, enterprise-grade data storage system. Datomic has a beautiful conceptual model. It's intended to be backed by a storage cluster, in which it keeps index chunks forever. Index chunks are replicated to peers, allowing it to run queries at the edges. Writes are serialized through a transactor.

Many of these design decisions are inapplicable to deployed desktop software; indeed, the use of multiple JVM processes makes Datomic's use in a small desktop app, or a mobile device, prohibitive.

Comparison to SQLite

SQLite is a traditional SQL database in most respects: schemas conflate semantic, structural, and datatype concerns, as described above; the main interface with the database is human-first textual queries; sparse and graph-structured data are 'unnatural', if not always inefficient; experimenting with and evolving data models are error-prone and complicated activities; and so on.

Mentat aims to offer many of the advantages of SQLite — single-file use, embeddability, and good performance — while building a more relaxed, reusable, and expressive data model on top.


Contributing

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

See CONTRIBUTING.md for further notes.

This project is very new, so we'll probably revise these guidelines. Please comment on an issue before putting significant effort in if you'd like to contribute.


Building

You first need to clone the project. To build and test the project, we are using Cargo.

To build all of the crates in the project use:

cargo build

To run tests use:

# Run tests for everything.
cargo test --all

# Run tests for just the query-algebrizer folder (specify the crate, not the folder),
# printing debug output.
cargo test -p mentat_query_algebrizer -- --nocapture

For most cargo commands you can pass the -p argument to run the command just on that package. So, cargo build -p mentat_query_algebrizer will build just the "query-algebrizer" folder.

What are all of these crates?

We use multiple sub-crates for Mentat for four reasons:

  1. To improve incremental build times.
  2. To encourage encapsulation; writing extern crate feels worse than just use mod.
  3. To simplify the creation of targets that don't use certain features: e.g., a build with no syncing, or with no query system.
  4. To allow for reuse (e.g., the EDN parser is essentially a separate library).

So what are they?

Building blocks

edn

Our EDN parser. It uses rust-peg to parse EDN, which is Clojure/Datomic's richer alternative to JSON. edn's dependencies are all either for representing rich values (chrono, uuid, ordered-float) or for parsing (serde, peg).

In addition, this crate turns a stream of EDN values into a representation suitable to be transacted.

mentat_core

This is the lowest-level Mentat crate. It collects together the following things:

Types

mentat_query

This crate defines the structs and enums that are the output of the query parser and used by the translator and algebrizer. SrcVar, NonIntegerConstant, FnArg… these all live here.

mentat_query_sql

Similarly, this crate defines an abstract representation of a SQL query as understood by Mentat. This bridges between Mentat's types (e.g., TypedValue) and SQL concepts (ColumnOrExpression, GroupBy). It's produced by the algebrizer and consumed by the translator.

Query processing

mentat_query_algebrizer

This is the biggest piece of the query engine. It takes a parsed query, which at this point is independent of a database, and combines it with the current state of the schema and data. This involves translating keywords into attributes, abstract values into concrete values with a known type, and producing an AlgebraicQuery, which is a representation of how a query's Datalog semantics can be satisfied as SQL table joins and constraints over Mentat's SQL schema. An algebrized query is tightly coupled with both the disk schema and the vocabulary present in the store when the work is done.

mentat_query_projector

A Datalog query projects some of the variables in the query into data structures in the output. This crate takes an algebrized query and a projection list and figures out how to get values out of the running SQL query and into the right format for the consumer.

mentat_query_translator

This crate works with all of the above to turn the output of the algebrizer and projector into the data structures defined in mentat_query_sql.

mentat_sql

This simple crate turns those data structures into SQL text and bindings that can later be executed by rusqlite.

The data layer: mentat_db

This is a big one: it implements the core storage logic on top of SQLite. This crate is responsible for bootstrapping new databases, transacting new data, maintaining the attribute cache, and building and updating in-memory representations of the storage schema.

The main crate

The top-level main crate of Mentat assembles these component crates into something useful. It wraps up a connection to a database file and the associated metadata into a Store, and encapsulates an in-progress transaction (InProgress). It provides modules for programmatically writing (entity_builder.rs) and managing vocabulary (vocabulary.rs).

Syncing

Sync code lives, for referential reasons, in a crate named tolstoy. This code is a work in progress; current state is a proof-of-concept implementation which largely relies on the internal transactor to make progress in most cases and comes with a basic support for timelines. See Tolstoy's documentation for details.

The command-line interface

This is under tools/cli. It's essentially an external consumer of the main mentat crate. This code is ugly, but it mostly works.


SQLite dependencies

Mentat uses partial indices, which are available in SQLite 3.8.0 and higher. It relies on correlation between aggregate and non-aggregate columns in the output, which was added in SQLite 3.7.11.

It also uses FTS4, which is a compile time option.

By default, Mentat specifies the "bundled" feature for rusqlite, which uses a relatively recent version of SQLite. If you want to link against the system version of SQLite, omit "bundled_sqlite3" from Mentat's features.

[dependencies.mentat]
version = "0.6"
# System sqlite is known to be new.
default-features = false

License

Project Mentat is currently licensed under the Apache License v2.0. See the LICENSE file for details.