readingdb is a time-series database designed for efficiency and speed.

Time-series data is any data that consists of a series of (time, sequence, value) points. ReadingDB buckets your data, compresses it using delta encoding and zlib, and writes the result into a Berkeley DB (bdb) database with a bdb index. It uses the bdb transaction manager for write-ahead logging so that the volume will not become corrupted.
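To give a feel for why delta encoding followed by zlib works well on time series, here is a minimal sketch. It is not readingdb's actual on-disk format: the record layout and the pack_bucket/unpack_bucket names are assumptions made up for this illustration.

```python
import struct
import zlib

# Hypothetical record layout (an assumption, not readingdb's format):
# signed 64-bit timestamp delta, unsigned 32-bit seqno, 64-bit float value.
RECORD = struct.Struct("<qId")

def pack_bucket(points):
    """Delta-encode (timestamp, seqno, value) points, then zlib-compress."""
    chunks = []
    prev_ts = 0
    for ts, seqno, value in points:
        # Store the gap to the previous timestamp instead of the timestamp
        # itself; regularly sampled data then yields highly repetitive bytes.
        chunks.append(RECORD.pack(ts - prev_ts, seqno, value))
        prev_ts = ts
    return zlib.compress(b"".join(chunks))

def unpack_bucket(blob):
    """Invert pack_bucket: decompress, then re-accumulate the deltas."""
    raw = zlib.decompress(blob)
    points = []
    ts = 0
    for offset in range(0, len(raw), RECORD.size):
        delta, seqno, value = RECORD.unpack_from(raw, offset)
        ts += delta
        points.append((ts, seqno, value))
    return points
```

Because consecutive timestamps in a stream usually differ by a constant sampling interval, the deltas repeat, and a general-purpose compressor such as zlib can shrink them far better than the raw timestamps.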

To use it, first follow the instructions in INSTALL to build the database and the Python interface module. The key objects in readingdb are "streams", each of which is identified by an integer id. The readingdb Python module lets you talk to the server process; traffic between the client and server is encoded using the Google protocol buffer definitions found in c6/pbuf.

A simple python script for inserting data would be:

    import readingdb as rdb

    # specify default host/port
    rdb.db_setup('localhost', 4242)

    # create a connection
    db = rdb.db_open('localhost')

    # add data.  the tuples are (timestamp, seqno, value)
    rdb.db_add(db, 1, [(x, 0, x) for x in range(0, 100)])

    # read back the data we just wrote using the existing connection
    # the args are streamid, start_timestamp, end_timestamp
    print(rdb.db_query(1, 0, 100, conn=db))

    # close
    rdb.db_close(db)

    # read back the data again using a connection pool this time.  You can
    # specify a list of streamids to range-query multiple streams at once.
    rdb.db_query([1], 0, 100)

ReadingDB supports efficient range querying and iteration using the db_query, db_prev, and db_next operations; you can delete data with db_del.

As you can see, db_query can re-use an existing connection if desired. As of April 20, 2012, the result of querying the TSDB is returned as a list of numpy matrices. Numpy matrices are a very memory-efficient data structure, and creating them through the C API is much more efficient than building them later in Python.

If no connection is specified for db_query/db_prev/db_next, new connections will be opened to the host/port specified with db_setup. The most efficient way of downloading data from a large number of streams is to specify a list of streamids, which allows the client library to conduct multiple parallel downloads of the data. Using this approach we have observed readingdb easily saturating a 100Mb NIC.
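As a rough illustration of the list form, the helper below wraps a single db_query call for several streams. The query_streams name is hypothetical, and the helper assumes that db_query returns one result per stream id in the same order as the ids it was given; check that against your installed version.

```python
def query_streams(rdb, streamids, start, end):
    """Range-query several streams with one db_query call.

    `rdb` is the imported readingdb module; it is passed in as an argument
    so the helper can be exercised with a stand-in object.
    Returns a dict mapping each stream id to that stream's result.
    """
    ids = list(streamids)
    # No conn= argument: the library opens its own connections to the
    # host/port configured earlier with db_setup.
    results = rdb.db_query(ids, start, end)
    # Assumption: results come back in the same order as `ids`.
    return dict(zip(ids, results))
```

With a running server this would be called as query_streams(rdb, [1, 2, 3], 0, 100) after rdb.db_setup('localhost', 4242).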

Sketches

As of 0.7.0 (October, 2014), readingdb supports computing sketches over the data in order to allow for more interactive performance on very large data sets. As of this release, it supports precomputing min, max, mean, and count at 5-minute, 15-minute, and 1-hour resolutions.
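To make the idea concrete, the following sketch computes the same four aggregates over fixed windows in plain Python. It only illustrates what a sketch contains and is not how reading-sketch is implemented; the compute_sketch name and the choice to align windows to multiples of the resolution are assumptions.

```python
def compute_sketch(points, resolution):
    """Aggregate (timestamp, value) pairs into windows of `resolution` seconds.

    Returns a dict mapping each window's start time to its
    min, max, mean, and count.
    """
    windows = {}
    for ts, value in points:
        # Assumption: windows are aligned to multiples of the resolution.
        window_start = ts - (ts % resolution)
        windows.setdefault(window_start, []).append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / float(len(values)),
            "count": len(values),
        }
        for start, values in windows.items()
    }
```

For example, compute_sketch(points, 300) yields one summary per 5-minute window, so a client plotting a year of data reads a few summary values per window instead of every raw point.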

By default, the behavior is the same as before. If sketches are enabled on reading-server (by starting it with the -r flag), it will stream a log of regions of streams which have new data to disk; these are the places where the sketches should be updated.

The sketches may be updated using the new reading-sketch program; when run, it reads in the log, computes new sketches for the regions of time which have changed, and exits. This should be called periodically (e.g., by cron); the Debian package installs a disabled crontab for this purpose into /etc/cron.d/readingdb.

Clients may request sketches through the new sketch kwarg to db_query. For instance:

    import time

    # load hourly minima from stream number 2, over all time.
    min_data = rdb.db_query([2], 0, int(time.time()), sketch=("min", 3600))[0]

The value should be a tuple of (sketchname, resolution (in seconds)). The server will raise an exception on the client if an invalid sketch is passed.
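If you would rather catch a bad sketch argument before a request goes out, a client-side check along these lines is one option. It is a hypothetical helper, not part of the readingdb module, and the allowed values are only the ones listed for the 0.7.0 release, so other releases may accept a different set.

```python
# Sketch names and resolutions listed for the 0.7.0 release.
SKETCH_NAMES = ("min", "max", "mean", "count")
SKETCH_RESOLUTIONS = (300, 900, 3600)  # 5 minutes, 15 minutes, 1 hour

def check_sketch(sketch):
    """Hypothetical pre-check; returns the tuple unchanged if it looks valid."""
    name, resolution = sketch
    if name not in SKETCH_NAMES:
        raise ValueError("unknown sketch name: %r" % (name,))
    if resolution not in SKETCH_RESOLUTIONS:
        raise ValueError("unsupported resolution: %r seconds" % (resolution,))
    return sketch
```

The checked tuple can then be passed straight through, e.g. sketch=check_sketch(("min", 3600)).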

Note: older servers may instead return the underlying data. Older client libraries will raise an exception, since sketch is not a valid kwarg there.

Dependencies

For reading-server:

    libdb4.8, libdb4.8-dev (berkeley database)
    libprotobuf, libprotobuf-dev (google protocol buffers)
    libprotoc6, libprotoc6-dev (c bindings for protobufs)
    zlib, zlib-dev (for compression)
    gcc, make, automake

For python bindings:

    python, python-dev, python-numpy (python deps)
    libdb4.8-dev
    swig (interface generator)
