MongoDB database adapter for ShareDB
MIT License
sharedb-mongo

MongoDB database adapter for sharedb. This driver can be used both as a snapshot store and as an oplog.

Snapshots are stored where you'd expect (the named collection with _id=id). In addition, operations are stored in o_COLLECTION. For example, if you have a users collection, the operations are stored in o_users.

JSON document snapshots in sharedb-mongo are unwrapped, so you can run mongo queries directly against JSON documents (they just have some extra fields in the form of _v and _type). It is safe to query documents directly with the MongoDB driver or command line. Any read-only mongo features, including find, aggregate, and map reduce, are safe to perform concurrently with ShareDB.
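As a rough illustration of what "unwrapped" means, the sketch below uses a hypothetical snapshot from a users collection. The sample values and the helper name stripShareDbFields are assumptions made for illustration; only the _id, _v and _type fields come from the description above, and real snapshots may carry other metadata.

```javascript
// Hypothetical snapshot as it might appear in the "users" collection.
// Only _id, _v and _type are taken from the text above; the rest is sample data.
const snapshot = {
  _id: 'alice',
  _v: 3,
  _type: 'http://sharejs.org/types/JSONv0',
  name: 'Alice',
  age: 30
};

// Fields assumed to belong to ShareDB rather than to the document itself.
const SHAREDB_FIELDS = new Set(['_id', '_v', '_type']);

// Illustrative helper: recover the plain JSON data from an unwrapped snapshot.
function stripShareDbFields(doc) {
  const data = {};
  for (const [key, value] of Object.entries(doc)) {
    if (!SHAREDB_FIELDS.has(key)) data[key] = value;
  }
  return data;
}

console.log(stripShareDbFields(snapshot));
```

Because the document fields sit at the top level, a read-only filter such as {age: 30} would match this snapshot directly.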

However, you must always use ShareDB to edit documents. Never use the MongoDB driver or command line to directly modify any documents that ShareDB might create or edit. ShareDB must be used to properly persist operations together with snapshots.

Usage

sharedb-mongo uses the MongoDB NodeJS Driver, and it supports the same configuration options.

There are two ways to instantiate a sharedb-mongo wrapper:

  1. The simplest way is to invoke the module function, passing your MongoDB connection URL and options as its arguments. For example:

    ```javascript
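    const ShareDB = require('sharedb');
    // mongoOptions is handed to the MongoDB driver; fill in the options you need
    const db = require('sharedb-mongo')('mongodb://localhost:27017/test', {mongoOptions: {/* ... */}});
    const backend = new ShareDB({db});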
    ```
  2. If you'd like to reuse a mongo db connection or handle mongo driver instantiation yourself, you can pass in a function that calls back with a mongo instance.

    ```javascript
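    const ShareDB = require('sharedb');
    const mongodb = require('mongodb');
    const db = require('sharedb-mongo')({mongo: function(callback) {
      // Assumes a MongoDB driver version that exposes a callback-style connect()
      mongodb.connect('mongodb://localhost:27017/test', callback);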
    }});
    const backend = new ShareDB({db});
    ```

Queries

In ShareDB, queries are represented as single JavaScript objects. But Mongo exposes methods on collections and cursors such as mapReduce, sort or count. These are encoded into ShareDBMongo's query object format through special $-prefixed keys that are interpreted and stripped out of the query before being passed into Mongo's find method.

Here are some examples:

| MongoDB query code | ShareDBMongo query object |
| --- | --- |
| coll.find({x: 1, y: {$ne: 2}}) | {x: 1, y: {$ne: 2}} |
| coll.find({$or: [{x: 1}, {y: 1}]}) | {$or: [{x: 1}, {y: 1}]} |
| coll.mapReduce({map: ..., reduce: ...}) | {$mapReduce: {map: ..., reduce: ...}} |
| coll.find({x: 1}).sort({y: -1}) | {x: 1, $sort: {y: -1}} |
| coll.find({x: 1}).limit(5).count({applySkipLimit: true}) | {x: 1, $limit: 5, $count: {applySkipLimit: true}} |

Most of Mongo 3.2's collection and cursor methods are supported. Method calls map to query properties whose key is the method name prefixed by $ and whose value is the argument passed to the method. $readPref is an exception: it takes an object with mode and tagSet fields, which map to the two arguments passed into the readPref method.

For a full list of supported collection and cursor methods, see collectionOperationsMap, cursorTransformsMap and cursorOperationsMap in index.js.
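To make the idea of "interpreted and stripped out" more concrete, here is a minimal sketch of how a query object could be split into a find filter and a set of method calls. It is not the adapter's actual code: splitQuery and METHOD_KEYS are hypothetical names, and the key list is a small assumed subset of what the maps in index.js define.

```javascript
// Hypothetical subset of method keys; the real lists live in the maps in index.js.
const METHOD_KEYS = new Set(['$sort', '$limit', '$skip', '$count', '$mapReduce', '$readPref']);

// Split a ShareDBMongo-style query object into the filter passed to find()
// and the $-prefixed method calls that were encoded alongside it.
function splitQuery(query) {
  const filter = {};
  const methods = {};
  for (const [key, value] of Object.entries(query)) {
    if (METHOD_KEYS.has(key)) {
      methods[key] = value;
    } else {
      // Regular fields and ordinary Mongo operators such as $or stay in the filter.
      filter[key] = value;
    }
  }
  return {filter, methods};
}

const parsed = splitQuery({x: 1, $or: [{y: 1}, {z: 1}], $sort: {y: -1}, $limit: 5});
console.log(parsed.filter);  // x and $or remain part of the filter
console.log(parsed.methods); // $sort and $limit are pulled out as method calls
```

Note that a plain "starts with $" check would not be enough, since ordinary query operators such as $or also begin with $; that is why the sketch uses an explicit key list.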

getOps without strict linking

There is a getOpsWithoutStrictLinking flag, which can be set to true to speed up getOps under certain circumstances, but with potential risks to the integrity of the results. Read below for more detail.

Introduction

ShareDB has to deal with concurrency issues. In particular, here we discuss the issue of submitting multiple competing ops against a version of a document.

For example, if a document is at v1 and two ops are submitted simultaneously against this snapshot (from different servers, say), then only one of these ops can be accepted as canonical and applied to the snapshot.

This issue is dealt with through optimistic locking. One consequence, explained in the following sections, is that under the default behaviour getOps fetches all the ops up to the current version, even if you only ask for a subset of them.

Optimistic locking and linked ops

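sharedb-mongo handles concurrent op submissions with optimistic locking. When two ops are submitted against the same version of a document, only one of them is applied to the snapshot; the other is rejected, even though it may already have been written to the database.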

In reality, sharedb-mongo attempts to clean up this failed op, but there's still the small chance that the server crashes before it can do so, meaning that we may have multiple ops lingering in the database with the same version.
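The following is a simplified in-memory model of how a rejected op can end up lingering, not the adapter's real commit logic. It assumes the op is written before the snapshot update is attempted; the commit function, the src field and the data shapes are hypothetical.

```javascript
// In-memory stand-ins for the op collection and the snapshot collection.
const ops = [];
const snapshots = {doc1: {v: 1, data: {count: 0}}};

// Hypothetical commit: write the op first, then update the snapshot only if
// its version still matches the version the op was created against.
function commit(docId, op, newData) {
  ops.push({d: docId, v: op.v, src: op.src}); // the op is persisted unconditionally
  const snapshot = snapshots[docId];
  if (snapshot.v !== op.v) return false; // snapshot moved on: this op lost the race
  snapshots[docId] = {v: op.v + 1, data: newData};
  return true;
}

// Two servers submit competing ops against v1.
const first = commit('doc1', {v: 1, src: 'serverA'}, {count: 1});
const second = commit('doc1', {v: 1, src: 'serverB'}, {count: 2});

console.log(first, second); // true false
console.log(ops.filter((op) => op.v === 1).length); // 2
```

Both ops remain in the op store with v equal to 1, but only the first one is reflected in the snapshot.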

Because some non-canonical ops may exist in the database, we cannot just perform a naive fetch of all the ops associated with a document, because it may return multiple ops with the same version (where one was successfully applied, and one was not).

In order to return a valid set of canonical ops, the optimistic locking has a notion of linked ops. That is, each op will point back to the op that it built on top of, and ultimately the current snapshot points to the op that committed it to the database.

Because of this, we can work backwards from the current snapshot, following the trail of op links all the way back to get a chain of canonical, valid, linked ops. This way, even if a spurious op exists in the database, no other op will point to it, and it will be correctly ignored.
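A minimal sketch of walking the chain of links might look like this. The field names id, prev and lastOp are hypothetical stand-ins for whatever the adapter stores; the point is only that the spurious op is never reached because nothing links to it.

```javascript
// Hypothetical op records: `id` identifies the op, `prev` links to the op it built on.
const ops = [
  {id: 'a', v: 0, prev: null},
  {id: 'b', v: 1, prev: 'a'},
  {id: 'c', v: 2, prev: 'b'},
  {id: 'x', v: 2, prev: 'b'}, // spurious op: same version as 'c', never applied
  {id: 'd', v: 3, prev: 'c'}
];
// The snapshot points at the op that committed it.
const snapshot = {v: 4, lastOp: 'd'};

// Follow the links back from the snapshot and return the canonical ops, oldest first.
function getLinkedOps(allOps, snap) {
  const byId = new Map(allOps.map((op) => [op.id, op]));
  const chain = [];
  let current = byId.get(snap.lastOp);
  while (current) {
    chain.unshift(current);
    current = byId.get(current.prev); // undefined once prev is null, which ends the walk
  }
  return chain;
}

console.log(getLinkedOps(ops, snapshot).map((op) => op.id)); // [ 'a', 'b', 'c', 'd' ]
```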

This approach has a big downside: it forces us to fetch all the ops up to the current version. This might be fine if you want all ops, or are fetching very recent ops, but can have a large impact on performance if you only want ops 1-10 of a 10,000 op document, because you actually have to fetch all the ops.

Dropping strict linking

In order to speed up the performance of getOps, you can set getOpsWithoutStrictLinking: true. This will attempt to fetch the bare minimum ops, whilst still trying to maintain op integrity.
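As a small sketch, the flag would sit in an options object like the one below. Where the object is passed is an assumption here: the comment supposes it goes in the second argument of the module function, next to options such as mongoOptions.

```javascript
// Options object with the flag enabled. Assumption: it is passed as the second
// argument to the module function, alongside options such as mongoOptions, e.g.
// require('sharedb-mongo')('mongodb://localhost:27017/test', options)
const options = {
  getOpsWithoutStrictLinking: true
};
```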

The assumption that underpins this approach is that any op that exists with a unique combination of d (document ID) and v (version) is a valid op. In other words, it had no conflicts and can be considered canonical.

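Consider a document whose ops include some spurious, failed ops. To fetch ops v1-v3, we look for an op whose combination of d and v is unique, treat it as canonical, and work backwards from it instead of from the current snapshot.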

In the case where a valid op cannot be determined, we still fall back to fetching all ops and working backwards from the current version.
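One way to picture this is the in-memory sketch below. It is an illustration of the uniqueness assumption, not the adapter's implementation: getOpsLoosely and the id and prev fields are hypothetical, and because everything is held in an array it does not show the savings in database reads.

```javascript
// Hypothetical op records for one document: `prev` links to the op each one built on.
const ops = [
  {id: 'a', v: 1, prev: null},
  {id: 'b', v: 2, prev: 'a'},
  {id: 'c', v: 3, prev: 'b'},
  {id: 'x', v: 3, prev: 'b'}, // spurious: collides with 'c' on v3
  {id: 'd', v: 4, prev: 'c'},
  {id: 'y', v: 4, prev: 'c'}, // spurious: collides with 'd' on v4
  {id: 'e', v: 5, prev: 'd'},
  {id: 'f', v: 6, prev: 'e'}
];

// Return canonical ops with from <= v <= to, without walking from the newest op.
function getOpsLoosely(allOps, from, to) {
  // Count ops per version to find versions that have exactly one op.
  const counts = new Map();
  for (const op of allOps) counts.set(op.v, (counts.get(op.v) || 0) + 1);

  // Anchor: the earliest op after `to` whose version is unique (assumed canonical).
  const anchor = allOps
    .filter((op) => op.v > to && counts.get(op.v) === 1)
    .sort((p, q) => p.v - q.v)[0];
  if (!anchor) return null; // a caller would fall back to the strict, full walk

  // Walk the links back from the anchor, keeping only the requested versions.
  const byId = new Map(allOps.map((op) => [op.id, op]));
  const result = [];
  for (let op = byId.get(anchor.prev); op && op.v >= from; op = byId.get(op.prev)) {
    if (op.v <= to) result.unshift(op);
  }
  return result;
}

console.log(getOpsLoosely(ops, 1, 3).map((op) => op.id)); // [ 'a', 'b', 'c' ]
```

Here v5 is the first unique version after v3, so the walk starts there and never touches v6. Asking for a range with no unique op after it, such as getOpsLoosely(ops, 1, 6), returns null to signal the fallback.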

Middlewares

Middlewares let you hook into the sharedb-mongo pipeline for certain actions. They are distinct from middleware in ShareDB as they are closer to the concrete calls that are made to MongoDB itself.

The original intent for middleware on sharedb-mongo is to support running in a sharded MongoDB cluster to satisfy the requirements on shard keys for versions 4.2 and greater of MongoDB. For more information see the MongoDB docs.

Usage

share.use(action, fn)

Register a new middleware.

Limitations of getOpsWithoutStrictLinking

Integrity

Attempting to infer a canonical op can be dangerous compared to simply following the valid op chain from the snapshot, which is - by definition - canonical.

This alternative behaviour should be safe, but should be used with caution, because we are attempting to infer a canonical op, which may have unforeseen corner cases that return an invalid set of ops.

This may be especially true if the ops are modified outside of sharedb-mongo (e.g. by setting a TTL, or manually updating them).

Recent ops

In some cases this flag can make getOps slower. When fetching very recent ops, setting this flag may cause extra database round-trips where fetching the snapshot would have been faster.

getOpsBulk and getOpsToSnapshot

This flag only applies to getOps, and not to the similar getOpsBulk and getOpsToSnapshot methods, whose performance will remain unchanged.

Error codes

Mongo errors are passed back directly. Additional error codes:

4100 -- Bad request - DB

5100 -- Internal error - DB