River View

Public Temporal Streaming Data Service Framework

River View is a Public Temporal Streaming Data Service Framework (yes, that's a mouthful!). It provides a pluggable interface for users to expose temporal data streams in a time-boxed format that is easy to query. It was built to provide a longer-lasting historical window for public data sources that offer only real-time data snapshots, especially sensor data from public government services such as weather, traffic, and geological data.

River View fetches data from user-defined Rivers at regular intervals and stores it in a local Redis database. The data is kept in a time window: anything older than a configured age is dropped. The window should still be large enough to hold the historical data needed to train machine intelligence models on the patterns within it.
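To make the windowing idea concrete, here is a minimal in-memory sketch. It is only an illustration: River View keeps its window in Redis, and the `pruneWindow` helper, its parameters, and the row shape are assumptions made for this example.

```javascript
// Illustrative only: River View keeps its window in Redis, not in memory.
// `maxAgeSeconds` stands in for a river's configured expiry (an assumption).
function pruneWindow(rows, nowSeconds, maxAgeSeconds) {
  const cutoff = nowSeconds - maxAgeSeconds;
  // Keep only rows whose timestamp falls inside the window.
  return rows.filter((row) => row.timestamp >= cutoff);
}

const rows = [
  { timestamp: 1000, value: 3.1 },
  { timestamp: 1600, value: 3.4 },
  { timestamp: 2000, value: 3.9 },
];

// With a 500-second window ending at t=2000, only the last two rows survive.
const kept = pruneWindow(rows, 2000, 500);
console.log(kept.map((row) => row.timestamp)); // [ 1600, 2000 ]
```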

Video Introduction

Watch this short video for a quick introduction to River View.

Code Docs

See online documentation at http://nupic-community.github.io/river-view/.

Dependencies

Redis

You must have a Redis instance available. The URL to the instance should be set in an environment variable called REDIS_URL, something like:

export REDIS_URL=redis://127.0.0.1:6379

You may use authentication in the Redis URL string:

export REDIS_URL=redis://username:password@hostname:port
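If you want to sanity-check a Redis URL before starting up, something like the following may help. It is a sketch, not River View code: `parseRedisUrl` is a hypothetical helper built on Node's standard `URL` class, and the fallback to port 6379 (the Redis default) is a choice made for this example.

```javascript
// Sketch: split a Redis URL into connection settings with Node's built-in URL class.
// `parseRedisUrl` is a hypothetical helper, not part of River View.
function parseRedisUrl(redisUrl) {
  const url = new URL(redisUrl);
  if (url.protocol !== 'redis:') {
    throw new Error('Expected a redis:// URL, got ' + url.protocol);
  }
  return {
    host: url.hostname,
    // Fall back to the Redis default port when the URL omits one.
    port: url.port ? Number(url.port) : 6379,
    // Credentials are percent-encoded inside URLs, so decode them.
    username: url.username ? decodeURIComponent(url.username) : null,
    password: url.password ? decodeURIComponent(url.password) : null,
  };
}

const settings = parseRedisUrl('redis://127.0.0.1:6379');
console.log(settings.host, settings.port);
```

In practice you would pass `process.env.REDIS_URL` instead of the literal string.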

Rivers and Streams

A River is a pluggable collection of public data Streams gathered from one or more origins and collected in a queryable, temporary temporal pool. Rivers are declared within the rivers directory and consist of:

- a namespace, taken from the name of the data source's directory within the rivers directory
- a YAML configuration file, containing:
  - one or more external URLs where the data is collected, which are public and accessible without authentication
  - the interval at which the data source will be queried
  - when the data should expire
- a JavaScript parser module that is passed the body of an HTTP call to the URL(s) above and is expected to parse it and return a temporal object representation of the data
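As a rough picture of what a parser does, here is a sketch that turns a response body into timestamped rows, one per stream. Everything specific in it is an assumption made for illustration: the `parseBody` name and signature, the JSON payload, and the shape of the returned objects. The actual parser contract is described in the Creating a River wiki page.

```javascript
// Hypothetical parser sketch. The function signature, the payload shape, and
// the returned object layout are all assumptions made for illustration; the
// real contract is described in the "Creating a River" wiki page.
function parseBody(body) {
  const payload = JSON.parse(body);
  // One entry per sensor reading: the sensor ID becomes the stream ID, and
  // every stream shares the same fields (here just `level`).
  return payload.readings.map((reading) => ({
    streamId: String(reading.sensor),
    timestamp: Math.floor(Date.parse(reading.time) / 1000), // Unix seconds
    fields: { level: Number(reading.level) },
  }));
}

// A made-up HTTP response body from an imaginary water-level source.
const sampleBody = JSON.stringify({
  readings: [
    { sensor: 'A1', time: '2015-06-01T00:00:00Z', level: '4.2' },
    { sensor: 'B7', time: '2015-06-01T00:00:00Z', level: '3.9' },
  ],
});

console.log(parseBody(sampleBody));
```

Each returned row carries a stream ID and a timestamp, and both streams use the same field names.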

Each River may produce one or many Streams of data, each collecting like data items over time. Each stream must have a unique ID, and all streams in a River share the same data schema (fields and metadata are defined at the River level).

For example, a city traffic data source may produce data streams for many traffic paths within the city, each identified with a unique stream ID. A US state water level data source might produce one stream per water level sensor in the state, each with a unique stream ID.

River Types

All river streams must have a timestamp for each row of data. Other than that, they might have different primary types of data, as described below:

spatial: integer or float values
geospatial: latitude / longitude (floats)
categorical: string values

The data streams will be presented differently, both in JSON and HTML, depending on the type specified in the config.yml file.

Creating a River

Please see Creating a River in our wiki.

Web Services

In addition to collecting and storing data from Rivers, River View starts a simple HTTP API for reading the data. It returns HTML, JSON, and (in some cases) CSV data for each River configured at startup.

URLs

| URL | Description |
| --- | --- |
| `/index.[html\|json]` | Current Rivers active in River View |
| `/<river-name>/props.[html\|json]` | Detailed information about a river, including the URL to the river's keys |
| `/<river-name>/keys.[html\|json]` | All unique ids for data within river |
| `/<river-name>/<id>/data.[html\|json\|csv]` | All data for specified key |
| `/<river-name>/<id>/meta.[html\|json]` | All metadata for specified key |
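A small client for the URL scheme in the table above might look like the sketch below. The `riverUrl` and `fetchStreamData` helpers, the base URL, and the river and stream names are all hypothetical. `fetchStreamData` relies on the global `fetch` available in Node 18 and later, and the shape of the JSON it returns is not specified here.

```javascript
// Sketch of a small client for the URL scheme in the table above.
// The helper names and the base URL used below are hypothetical.
function riverUrl(baseUrl, parts, resource, extension) {
  // `parts` is [] for the index, [river] for river-level resources,
  // or [river, id] for stream-level resources.
  const path = parts
    .map((part) => encodeURIComponent(part))
    .concat(resource + '.' + extension)
    .join('/');
  // Strip trailing slashes from the base so the result has exactly one.
  return baseUrl.replace(/\/+$/, '') + '/' + path;
}

// Fetch the JSON data for one stream (requires the global fetch of Node 18+).
async function fetchStreamData(baseUrl, river, id) {
  const response = await fetch(riverUrl(baseUrl, [river, id], 'data', 'json'));
  if (!response.ok) {
    throw new Error('Request failed with status ' + response.status);
  }
  return response.json();
}

console.log(riverUrl('http://localhost:8080', ['example-river', 'stream-1'], 'data', 'csv'));
// http://localhost:8080/example-river/stream-1/data.csv
```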

Running Locally (on OS X)

OS X has some weird built-in behaviors regarding the maximum number of open file descriptors. River View needs the system to handle around 1024 open descriptors to start up, so if you run into any sort of file-can't-be-opened errors, check your maximum number of open file descriptors by running ulimit -n. If this number is less than 1024, you'll need to raise it.

Updating the maximum number of open file descriptors

sudo launchctl limit maxfiles 1024 unlimited

This raises the maximum number of open file descriptors your Mac will allow. The setting is not persistent across reboots. To make it persistent, add limit maxfiles 1024 unlimited to /etc/launchd.conf.

ulimit -n 1024

This raises the limit for your current shell so that it can make use of all those file descriptors.
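If you would rather check the limit from Node before starting, a preflight along these lines could work. This is a sketch under assumptions: the helper names are hypothetical, the 1024 figure comes from the text above, and `currentLimitIsSufficient` shells out to `ulimit -n`, so it only makes sense on Unix-like systems.

```javascript
const { execSync } = require('node:child_process');

// River View needs roughly 1024 descriptors to start (per the text above).
const REQUIRED_DESCRIPTORS = 1024;

// Interpret the output of `ulimit -n`, which is a number or "unlimited".
function limitIsSufficient(ulimitOutput, required = REQUIRED_DESCRIPTORS) {
  const text = ulimitOutput.trim();
  if (text === 'unlimited') return true;
  const limit = Number.parseInt(text, 10);
  return Number.isInteger(limit) && limit >= required;
}

// Ask the shell for the current soft limit. Not called here, so loading
// this sketch has no side effects.
function currentLimitIsSufficient() {
  const output = execSync('ulimit -n', { encoding: 'utf8' });
  return limitIsSufficient(output);
}

console.log(limitIsSufficient('256\n'));  // false
console.log(limitIsSufficient('1024\n')); // true
```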
