tfredrich / NoSchema

Store PoJos as schema-less documents in Cassandra. Manage multiple, denormalized "views" on those PoJos with UnitOfWork, automatically.
1 stars 0 forks source link

NoSchema

NoSchema is a Java library offering a document-oriented repository pattern for Cassandra storage. It eliminates the need for a predefined schema, thereby providing significant flexibility in domain modeling without necessitating table data migrations.

In NoSchema, only the keys that form an object's identifier are stored as individual columns. These keys are extracted from the entity at storage time. The object itself is serialized into a storage format through a pluggable serialization process and stored as a binary blob. Currently, NoSchema supports BSON (like MongoDB) and GSON (JSON) serialization formats.

Given that usage of materialized views and indexes in Cassandra are discouraged (at least in high-throughput scenarios), NoSchema introduces the concepts of a PrimaryTable, View, and Index. These are fully managed and encapsulated within a Repository pattern that implements UnitOfWork. This ensures that multiple, denormalized tables are written; one for each key format, eliminating complex coding, reducing time-to-market, increasing developer efficiency and software accuracy.

NoSchema simplifies the management of resource-oriented Plain Old Java Objects (PoJos) with multiple, denormalized views, and indexes. This is achieved through a straightforward Repository pattern, making it an ideal choice for developers building RESTful APIs to simplify their data storage in Cassandra while increasing functionality.

Features

Modules

The project is divided into four main modules:

  1. core: This module provides the base classes and interfaces such as Identifiable and Identifier.

  2. cassandra: This module provides the base Respository class for Cassandra database operations and also includes the UnitOfWork implementation, which is used by the Repository.

  3. bson-provider: This module provides the BsonObjectCodec class for BSON serialization and deserialization. The class implements the ObjectCodec interface and provides methods for serializing and deserializing objects to and from BSON format.

  4. gson-provider: This module provides the GsonObjectCodec class for Gson serialization and deserialization. The class implements the ObjectCodec interface and provides methods for serializing and deserializing objects to and from Gson format. The class is located in the com.strategicgains.noschema.gson package.

Getting Started

  1. One requirement for using NoSchema is that entities MUST implement the Identifiable interface which needs a getId() method returning an Identifier instance containing the primary identifier components for the entity.

  2. Choose a serialization provider: BSON (recommended) and GSON are built-in. But if you want to use something that's already in your project, implement the ObjectCodec interface.

  3. Override the CassandraNoSchemaRepository for each PoJo, defining the PrimaryTables and Views for the resource (See: Defining Keys, below).

  4. Override any methods needed in AbstractDocumentObserver to process entities before or after encoding or before or after storage. For example, encryption/decryption of the entity, event streaming for CUD operations, etc. Add your observer to the repository constructor using the .withObserver() DSL setter.

  5. Determine the UnitOfWork consistency level. The default is to use the capabilities of the Cassandra client driver to distribute and manage the multiple statements across the views as completely asynchronous operations. It is also possible to use LOGGED and UNLOGGED batch operations, but know that these are anti-patterns in Cassandra due to them likely being cross-partition writes and will impact performance.

Maven Configuration

For the recommended configuration using BSON as the serialization provider, there are two dependencies needed in the Maven pom.xml.

<dependency>
    <groupId>com.strategicgains.noschema</groupId>
    <artifactId>noschema-cassandra</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>com.strategicgains.noschema</groupId>
    <artifactId>noschema-bson-provider</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>

If GSON is desired instead:

<dependency>
    <groupId>com.strategicgains.noschema</groupId>
    <artifactId>noschema-cassandra</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>com.strategicgains.noschema</groupId>
    <artifactId>noschema-gson-provider</artifactId>
    <version>1.0.0-SNAPSHOT</version>
</dependency>

Usage

Here is a basic example of how to use NoSchema.

Note that this example doesn't override the default CassandraNoSchemaRepository but uses it raw for illustration purposes. You will likely want to override it to scope it to your own resources.:

public static void main(String[] args)
throws KeyDefinitionException, InvalidIdentifierException, DuplicateItemException, ItemNotFoundException
{
    CqlSession session = ... // create the CqlSession as usual.

    // Define a table and its views.
    PrimaryTable albumsTable = new PrimaryTable("sample_keyspace", "albums", "id:UUID unique")
        .withView("by_name", "(account.id as account_id:UUID), name:text unique");

    try
    {
        // Create a Repository instance.
        // This one uses the asyncronous UnitOfWork and BSON serialization.
        CassandraNoSchemaRepository<Album> albums = new CassandraNoSchemaRepository<>(session, albumsTable, UnitOfWorkType.ASYNC, new BsonObjectCodec());

        // this creates any missing underlying Cassandra tables (for both PrimaryTable and any Views)
        albums.ensureTables();

        // Create a new in-memory Album instance.
        UUID id = UUID.randomUUID();
        UUID accountId = UUID.randomUUID();
        Album album = new Album(accountId, id, "AC/DC", "Back In Black", ...);

        // Write the Album, writing both PrimaryTable and View at once, ensuring uniqueness of both the primary table and `by_name` view.
        Album written = albums.create(album);
        System.out.println(written.toString());

        // Read the album by its ID.
        Album read = albums.read(new Identifier(id));
        System.out.println(read.toString());

        // Read the album by account number and name.
        read = albums.read(ALBUMS_BY_NAME, new Identifier(accountId, "Back In Black"));
        System.out.println(read.toString());
    }
    finally
    {
        session.close();
    }
}

Defining Keys

NoSchema has its own key definition language. While it is very similar to Cassandra's Partition Key and Clustering Key concepts, it needs a type for each of those components. It also allows the mapping of PoJo property names to a column name that doesn't match. Additionally, NoSchema supports uniqueness enforcement while Cassandra does not.

Key definition strings are expected to follow a specific format, which is defined as follows:

key-definition ::= "(" partition-key ")" clustering-key modifier
partition-key ::= column-definition | column-definition "," column-definition
clustering-key ::= clustering-key-component | clustering-key "," clustering-key-component
modifier ::= "unique" | ε
clustering-key-component ::= "+" column-definition | "-" column-definition | column-definition
column-definition ::= name-type-pair | property-name " as " name-type-pair
name-type-pair ::= name ":" type
property-name ::= [a-zA-Z0-9_]+

In the above BNF-style diagram:

Examples:

(id:uuid)   // partition key only
((id:uuid) name:text unique)    // partition key of id, clustering key of name, with unique modifier.
((id:uuid, name:text), -created:timestamp, +age:int)    // partition key of id and name + clustering key of created and age, with sort order on each.

The class throws a KeyDefinitionException if the string is invalid. This can occur if the string is null or empty, if it contains too many parentheses, if a parenthesis is misplaced, or if the parentheses are unmatched. The class throws an InvalidIdentifierException if the entity is missing any property values required by the key definition while the identifier is being extracted from a PoJo at storage time.

Contributing

Contributions are welcome! Submit a pull request from your own clone of this GitHub repo.

License

This project is licensed under the terms of the Apache 2.0 license.