ssbc / ssb-db2

A new database for secure-scuttlebutt
47 stars 8 forks source link

SSB-DB2

SSB-DB2 is a new database for secure-scuttlebutt, it is meant as a replacement for ssb-db. The main reason for creating a new database is to be able to rework some of the existing decisions without having to be 100% backwards compatible. The main reasons are:

Over time, this database received more features than ssb-db, and now supports:

SSB-DB2 is a secret-stack plugin that registers itself in the db namespace.

By default SSB-DB2 only loads a base index (indexes/base), this index includes the basic functionality for getting messages from the log and for doing EBT.

By default the database is stored in ssb/db2/log.bipf, leveldb indexes are stored in ssb/db2/indexes/, and jitdb indexes in ssb/db2/jit.

🎥 Watch a presentation about this new database.

Read the developer guide

Installation

 SecretStack({appKey: require('ssb-caps').shs})
   .use(require('ssb-master'))
+  .use(require('ssb-db2'))
   .use(require('ssb-conn'))
   .use(require('ssb-blobs'))
   .call(null, config)

Usage

To create and publish a new message, you can do:

const SecretStack = require('secret-stack')
const caps = require('ssb-caps')

const sbot = SecretStack({ caps })
  .use(require('ssb-db2'))
  .call(null, { path: './' })

sbot.db.create({ content: { type: 'post', text: 'hello!' } }, (err, msg) => {
  // A new message is now published on the log, with the contents above.
  console.log(msg)
  /*
  {
    key,
    value: {
      previous: null,
      sequence: 1,
      author,
      timestamp: 1633715006539,
      hash: 'sha256',
      content: { type: 'post', text: 'hello!' },
      signature,
    },
    timestamp: 1633715006540
  }
  */
})

To get the post messages of a specific author, you can do:

const SecretStack = require('secret-stack')
const caps = require('ssb-caps')
const { where, and, type, author, toCallback } = require('ssb-db2/operators')

const sbot = SecretStack({ caps })
  .use(require('ssb-db2'))
  .call(null, { path: './' })

sbot.db.query(
  where(
    and(
      type('post'),
      author('@6CAxOI3f+LUOVrbAl0IemqiS7ATpQvr9Mdw9LC4+Uv0=.ed25519')
    )
  ),
  toCallback((err, msgs) => {
    console.log(
      'There are ' + msgs.length + ' messages of type "post" from arj'
    )
    sbot.close()
  })
)

To get post messages that mention Alice, you can do:

const SecretStack = require('secret-stack')
const caps = require('ssb-caps')
const {where, and, type, mentions, toCallback} = require('ssb-db2/operators')

const sbot = SecretStack({ caps })
  .use(require('ssb-db2'))
  .call(null, { path: './' })

sbot.db.query(
  where(and(type('post'), mentions(alice.id)))),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
    sbot.close()
  })
)

Leveldb plugins

The queries you've seen above use JITDB, but there are some queries that cannot rely on JITDB alone, and we need to depend on Leveldb. This section shows some example leveldb indexes, explains when you need leveldb, and how to make your own leveldb plugin in ssb-db2.

Full-mentions

An extra index plugin that is commonly needed in SSB communities is the full-mentions index. It has one method: getMessagesByMention.

Although this accomplishes the same as the previous mentions() example, this plugin is meant as an example for application developers to write their own plugins if the functionality of JITDB is not enough. JITDB is good for indexing specific values, like mentions(alice.id) which gets its own dedidated JITDB index for alice.id. But when querying mentions of several feeds or several messages, this creates many indexes, so a specialized index makes more sense.

What full-mentions does is index all possible mentionable items at once, using Leveldb instead of JITDB. You can include it and use it like this:

const SecretStack = require('secret-stack')
const caps = require('ssb-caps')
const {where, and, type, toCallback} = require('ssb-db2/operators')

const sbot = SecretStack({ caps })
  .use(require('ssb-db2'))
  .use(require('ssb-db2/full-mentions')) // include index
  .call(null, { path: './' })

const {fullMentions} = sbot.db.operators

sbot.db.query(
  where(and(type('post'), fullMentions(alice.id)))),
  toCallback((err, msgs) => {
    console.log('There are ' + msgs.length + ' messages')
    sbot.close()
  })
)

Your own leveldb index plugin

It's wise to use JITDB when:

  1. You want the query output to be the msg itself, not state derived from msgs
  2. You want the query output ordered by timestamp (either descending or ascending)

There are some cases where the assumptions above are not met. For instance, with abouts, we often want to aggregate all type: "about" msgs and return all recent values for each field (name, image, description, etc). So assumption number 1 does not apply.

In that case, you can make a leveldb index for ssb-db2, by creating a class that extends the class at require('ssb-db2/indexes/plugin'), like this:

const Plugin = require('ssb-db2/indexes/plugin')

// This is a secret-stack plugin
exports.init = function (sbot, config) {
  class MyIndex extends Plugin {
    constructor(log, dir, configDb2) {
      //    log, dir, name, version, keyEncoding, valueEncoding, configDb2
      super(log, dir, 'myindex', 1, 'utf8', 'json', configDb2)
    }

    processRecord(record, seq) {
      const buf = record.value // this is a BIPF buffer, directly from the log
      // ...
      // Use BIPF seeking functions to decode some fields
      // ...
      this.batch.push({
        type: 'put',
        key: key, // some utf8 string here (see keyEncoding in the constructor)
        value: value, // some object here (see valueEncoding in the constructor)
      })
    }

    myOwnMethodToGetStuff(key, cb) {
      this.level.get(key, cb)
    }
  }

  sbot.db.registerIndex(MyIndex)
}

There are three parts you'll always need:

To call your custom methods, you'll have to pick them like this:

sbot.db.getIndex('myindex').myOwnMethodToGetStuff()

Or you can wrap that in a secret-stack plugin (in the example above, exports.init should return an object with the API functions).

There are other optional methods you can implement in the Plugin subclass:

Compatibility plugins

SSB DB2 includes a couple of plugins for backwards compatibility, including legacy replication, ebt and publish. They can be loaded as:

const SecretStack = require('secret-stack')

const sbot = SecretStack({ caps })
  .use(require('ssb-db2'))
  .use(require('ssb-db2/compat')) // include all compatibility plugins
  .call(null, {})

or specifically:

const SecretStack = require('secret-stack')

const sbot = SecretStack({ caps })
  .use(require('ssb-db2'))
  .use(require('ssb-db2/compat/db')) // basic db compatibility
  .use(require('ssb-db2/compat/log-stream')) // legacy replication
  .use(require('ssb-db2/compat/history-stream')) // legacy replication
  .use(require('ssb-db2/compat/ebt')) // ebt db helpers
  .use(require('ssb-db2/compat/publish')) // publish() function like in ssb-db
  .use(require('ssb-db2/compat/post')) // post() obv like in ssb-db
  .call(null, {})

Secret-stack modules using ssb-db2

The following is a list of modules that works well with ssb-db2:

Migrating from ssb-db

The log used underneath ssb-db2 is different than that one in ssb-db, this means we need to scan over the old log and copy all messages onto the new log, if you wish to use ssb-db2 to make queries.

⚠️ Warning: please read the following instructions about using two logs and carefully apply them to avoid forking feeds into an irrecoverable state.

Preventing forking feeds

The log is the source of truth in SSB, and now with ssb-db2, we introduce a new log alongside the previous one. One of them, not both has to be considered the source of truth.

While the old log exists, it will be continously migrated to the new log, and ssb-db2 forbids you to use its database-writing APIs such as add(), publish(), del() and so forth, to prevent the two logs from diverging into inconsistent states. The old log will remain the source of truth and the new log will just mirror it.

If you want to switch the source of truth to be the new log, we must delete the old log, after it has been fully migrated. Only then can you use database-writing APIs such as publish(). To delete the old log, one method is to use the config dangerouslyKillFlumeWhenMigrated. Set it to true only when you are absolutely sure that no other app will attempt to read/write to ~/.ssb/flume/log.offset or wherever the old log lives. It will delete the entire flume folder once migration has completed writing the messages to the new log. From that point onwards, using APIs such as publish() will succeed to append messages to the new log.

Triggering migration

ssb-db2 comes with migration methods built-in, you can enable them (they are off by default!) in your config file (or object):

const path = require('path')
const SecretStack = require('secret-stack')
const ssbKeys = require('ssb-keys')
const keys = ssbKeys.loadOrCreateSync(path.join(__dirname, 'secret'))

const config = {
  keys: keys,
  db2: {
    automigrate: true,
  },
}

const sbot = SecretStack({ caps })
  .use(require('ssb-db2'))
  .use(require('ssb-db2/compat'))
  .call(null, config)

The above script will initiate migration as soon as the plugins are loaded. If you wish the manually dictate when the migration starts, don't use the automigrate config above, instead, call the migrate.start() method yourself:

sbot.db.migrate.start()

Note, it is acceptable to load both ssb-db and ssb-db2 plugins, the system will still function correctly and migrate correctly:

const sbot = SecretStack({ caps })
  .use(require('ssb-db'))
  .use(require('ssb-db2'))
  .use(require('ssb-db2/compat'))
  .call(null, config)

Migrating without including ssb-db2

Because ssb-db2 also begins indexing basic metadata once it's included as a plugin, this may cost more (precious) CPU time. If you are not yet using db2 APIs but would like to migrate the log anyway, in preparation for later activating db2, then you can include only the migration plugin, like this:

const sbot = SecretStack({ appKey: caps.shs })
  .use(require('ssb-db2/migrate'))
  .call(null, config)

Note that the start behavior is the same: you can either start it automatically using config.db2.automigrate or manually like this:

sbot.db2migrate.start()

Methods

create(opts, cb)

Method for creating and publishing a new message on the log. The opts allows you to fully customize how this message will be written, including which feed format is used, which keys are signing/authoring the message, what encryption format is used and so forth.

The opts must be an object and should contain the following keys:

The callback cb is called when the message has been published. cb(err) if published failed with an error err, and cb(null, kvt) if it was successfully published, where kvt is a JavaScript object with the shape {key, value, timestamp} exactly representing the message written to the database.

get(msgId, cb)

Get a particular message value by message id.

getMsg(msgId, cb)

Get a particular message including key by message id.

query(...operators)

Flexible API to get messages from the database based on various criteria determined by the operators. There are usually two parts in this list of operators:

See jitdb operators and operators/index.js for a complete list of supported operators.

reindexed()

A pull-stream source of newly decrypted reindexed values returned as full messages. Calling reindexEncrypted after adding a new box2 key will trigger new values in this stream.

This api can be combined with where to receive messages of a particular type no matter if they are existing, newly added or decrypted values.

Example of getting existing post messages together with newly decrypted ones:

const pull = require('pull-stream')
const cat = require('pull-cat')

pull(
  cat([
    sbot2.db.query(where(type('post')), toPullStream()),
    pull(
      sbot2.db.reindexed(),
      pull.filter((msg) => {
        return msg.value.content.type === 'post'
      })
    ),
  ]),
  pull.drain((result) => {
    console.log('got a new post', result.value)
  })
)

add(nativeMsg, cb)

Validate and add a message to the database. The callback will the (possible) error in the 1st argument, and in the 2nd argument the stored message in "KVT" shape, i.e. {key, value, timestamp}. The nativeMsg is assumed to belong to a feed format that is currently installed, such as ssb-classic.

Alternatively: add(nativeMsg, opts, cb) where opts.encoding and opts.feedFormat can be specified. (Read above, in the create method, for details about these opts)

addOOO(nativeMsg, cb)

Validate and a message to the database, but validation will not check the previous. Useful for partial replication.

Callback arguments are (err, kvt), similar to the callback in the add method.

Alternatively: addOOO(nativeMsg, opts, cb) where opts.encoding and opts.feedFormat can be specified. (Read above, in the create method, for details about these opts)

addOOOBatch(nativeMsgs, cb)

Similar to addOOO, but you can pass an array of many messages. If the author is not yet known, the message is validated without checking if the previous link is correct, otherwise normal validation is performed. This makes it possible to use for partial replication to add all contact messages from a feed.

Callback arguments are (err, kvts), where kvts is an array of "KVT" matching each of the given nativeMsgs.

Alternatively: addOOOBatch(nativeMsgs, opts, cb) where opts.encoding and opts.feedFormat can be specified. (Read above, in the create method, for details about these opts)

addTransaction(nativeMsgs, oooNativeMsgs, cb)

Similar to addOOOBatch, except you pass in an array of nativeMsgs that will be validated in order and an array of oooNativeMsgs that will be validated similar to addOOOBatch. Finally all the messages are added to the database in such a way that either all of them are written to disc or none of them are.

Callback arguments are (err, kvts), similar to the callback in the addOOOBatch method.

Alternatively: addTransaction(nativeMsgs, oooNativeMsgs, opts, cb) where opts.encoding and opts.feedFormat can be specified. (Read above, in the create method, for details about these opts)

del(msgId, cb)

Delete a specific message given the message ID msgId from the database. :warning: Please note that this will break replication for anything trying to get that message, like createHistoryStream for the author or EBT. Because of this, it is not recommended to delete message with this method unless you know exactly what you are doing.

deleteFeed(feedId, cb)

Delete all messages of a specific feedId. Compared to del this method is safe to use.

onMsgAdded(cb)

Subscribe to know when a message is added to the database. The cb will be called as soon as a message is successfully persisted to the log, with one argument, event, which contains:

onMsgAdded itself is an obz, so the latest message added to the database can also be read using ssb.db.onMsgAdded.value.

getStatus

Gets the current db status, either as a synchronous function or observeable. This is similar to db.status in ssb-db. The value contains the following information:

Example:

// synchronous
const status = ssb.db2.getStatus()

// observeable
const listener = (status) => console.log(status.progress)
const unsubscribe = ssb.db2.getStatus(listener)

// later
unsubscribe() // unsubscribes the listener from status updates

reindexEncrypted(cb)

This function is useful in ssb-box2 where box2 keys can be added at runtime and that changes what messages can be decrypted. Calling this function is needed after adding a new key. The function can be called multiple times safely.

The decrypted values from this can be consumed using reindexed.

logStats(cb)

Use async-append-only-log's stats method to get information on how many bytes are used by messages in the log, and how many bytes are zero-filled.

prepare(operation, cb)

Use JITDB's prepare method to warm up a JIT index.

onDrain(indexName?, cb)

Waits for the index with name indexName to be in sync with the main log and then call cb with no arguments. If indexName is not provided, the base index will be used.

The reason we do it this way is that indexes are updated asynchronously in order to not block message writing.

compact(cb)

Compacts the log (filling in the blanks left by deleted messages and optimizing space) and then rebuilds indexes.

installFeedFormat(feedFormat)

If feedFormat conforms to the ssb-feed-format spec, then this method will install the feedFormat in this database instance, meaning that you can create messages for that feed format using the create method.

installEncryptionFormat(encryptionFormat)

If encryptionFormat conforms to the ssb-encryption-format spec, then this method will install the encryptionFormat in this database instance, meaning that you can now encrypt and decrypt messages using that encryption format.

reset(cb)

Force all indexese to be rebuilt. Use this as a last resort if you suspect that the indexes are corrupted.

Configuration

You can use ssb-config parameters to configure some aspects of ssb-db2:

const config = {
  keys: keys,
  db2: {
    /**
     * Start the migration plugin automatically as soon as possible.
     * Default: false
     */
    automigrate: true,

    /**
     * If the migration plugin is used, then when migration has completed, we
     * will remove the entire `~/.ssb/flume` directory, including the log.
     *
     * As the name indicates, this is dangerous, because if there are other apps
     * that still use `~/.ssb/flume`, they will see an empty log and progress to
     * write on that empty log using the `~/.ssb/secret` and this will very
     * likely fork the feed in comparison to new posts on the new log. Only use
     * this when you know the risks and you know that only the new log will be
     * written.
     * Default: false
     */
    dangerouslyKillFlumeWhenMigrated: false,

    /**
     * If the environment that db2 runs in has access to a regular disk to write to. Normally detected automatically but in environments like electron you might have to set this manually.
     * Default in node: true, default in a browser: false
     */
    hasDiskAccess: true

    /**
     * Only try to decrypt box1 messages created after this date
     * Default: null
     */
    startDecryptBox1: '2022-03-25',

    /**
     * A throttle interval (measured in milliseconds) to control how often
     * should messages given to `sbot.add` be flushed in batches.
     * Default: 250
     */
    addBatchThrottle: 250,

    /**
     * An upper limit on the CPU load that ssb-db2 can use while indexing
     * and scanning. `85` means "ssb-db2 will only index when CPU load is at
     * 85% or lower".
     * Default: Infinity
     */
    maxCpu: 85,

    /** This applies only if `maxCpu` is defined.
     * See `maxPause` in the module `too-hot`, for its definition.
     * Default: 300
     */
    maxCpuMaxPause: 180,

    /** This applies only if `maxCpu` is defined.
     * See `wait` in the module `too-hot`, for its definition.
     * Default: 90
     */
    maxCpuWait: 90,

    flushDebounce: 250,

    writeTimeout: 250,
  },
}

Operators

The following operators are included by default, see operators/index.js for how they are implemented. Also exposed are all jitdb operators