Public Temporal Streaming Data Service Framework
River View is a Public Temporal Streaming Data Service Framework (yes, that's a mouthful!). It provides a pluggable interface for users to expose temporal data streams in a time-boxed format that is easy to query. It was built to provide a longer-lasting historical window for public data sources that provide only real-time data snapshots, especially sensor data from public government services such as weather, traffic, and geological data.
River View fetches data from user-defined Rivers at regular intervals, populating a local Redis database. The data is kept in a sliding window: anything older than a configured age is discarded. The window should still be large enough to hold the history needed to train machine intelligence models on the patterns in the data.
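The windowing idea can be illustrated with a small in-memory sketch. This is not River View's implementation (which stores data in Redis); the `TimeWindow` class and its parameters are hypothetical names used only to show how rows older than a configured age fall out of the window.

```python
from collections import deque


class TimeWindow:
    """Hypothetical helper: keep only rows newer than `max_age_seconds`."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds
        self.rows = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Append a row, then drop rows that are now outside the window.

        Assumes rows arrive in increasing timestamp order.
        """
        self.rows.append((timestamp, value))
        cutoff = timestamp - self.max_age_seconds
        while self.rows and self.rows[0][0] < cutoff:
            self.rows.popleft()

    def snapshot(self):
        """Return the rows currently inside the window, oldest first."""
        return list(self.rows)


window = TimeWindow(max_age_seconds=3600)
window.add(0, 1.0)
window.add(1800, 2.0)
window.add(4000, 3.0)  # the row at t=0 is now more than an hour old
print(window.snapshot())  # [(1800, 2.0), (4000, 3.0)]
```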
Watch this short video for a quick introduction to River View.
See online documentation at http://nupic-community.github.io/river-view/.
You must have a Redis instance available. The URL to the instance should be set in an environment variable called REDIS_URL, something like:
export REDIS_URL=redis://127.0.0.1:6379
You may use authentication in the Redis URL string:
export REDIS_URL=redis://username:password@hostname:port
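To check what a given REDIS_URL actually contains, the string can be split into its parts. The sketch below uses only Python's standard library; `parse_redis_url` is a hypothetical helper, not part of River View, and it assumes the standard Redis port 6379 when the URL omits one.

```python
import os
from urllib.parse import urlparse


def parse_redis_url(url):
    """Split a redis:// URL into host, port, and optional credentials."""
    parts = urlparse(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 6379,  # assumption: fall back to Redis's default port
        "username": parts.username,  # None when the URL carries no credentials
        "password": parts.password,
    }


# Read the variable the same way the export commands above set it.
config = parse_redis_url(os.environ.get("REDIS_URL", "redis://127.0.0.1:6379"))
print(config)
```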
A River is a pluggable collection of public data Streams gathered from one or more origins and collected in a queryable, temporary temporal pool. Rivers are declared within the rivers directory.
Each River may produce one or many Streams of data, each collecting like data items over time. Each stream must have a unique ID, and all streams in a River must use the same data schema (fields and metadata are defined at the River level).
For example, a city traffic data source may produce data streams for many traffic paths within the city, each identified with a unique stream ID. A US state water level data source might have a separate stream for each water level sensor in the state, each with a unique stream ID.
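The relationship between a River, its shared schema, and its Streams can be modeled roughly as follows. This is an illustrative data structure only; the `River` class, the field names, and the stream IDs are all hypothetical and do not reflect how River View stores data internally.

```python
from dataclasses import dataclass, field


@dataclass
class River:
    """Hypothetical model: one schema, many streams keyed by unique ID."""

    name: str
    fields: list  # schema shared by every stream in this river
    streams: dict = field(default_factory=dict)  # stream ID -> list of rows

    def add_row(self, stream_id, timestamp, values):
        """Append a timestamped row to a stream, enforcing the shared schema."""
        if len(values) != len(self.fields):
            raise ValueError("row does not match the river's fields")
        self.streams.setdefault(stream_id, []).append((timestamp, *values))


# Two traffic paths in the same city share one schema but have distinct IDs.
traffic = River(name="city-traffic", fields=["speed", "travel_time"])
traffic.add_row("path-1", 1420070400, [42.0, 310])
traffic.add_row("path-2", 1420070400, [17.5, 805])
print(sorted(traffic.streams))  # ['path-1', 'path-2']
```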
All river streams must have a timestamp for each row of data. Beyond that, rivers may differ in their primary type of data.
The data streams will be presented differently, both in JSON and HTML, depending on the type specified in the config.yml file.
Please see Creating a River in our wiki.
In addition to collecting and storing data from Rivers, a simple HTTP API for reading the data is also active on startup. It returns HTML, JSON, and (in some cases) CSV data for each River configured at startup.
URL | Description |
---|---|
/index.[html|json] | Current Rivers active in River View |
/<river-name>/props.[html|json] | Detailed information about a river, including the URL to the river's keys |
/<river-name>/keys.[html|json] | All unique ids for data within river |
/<river-name>/<id>/data.[html|json|csv] | All data for specified key |
/<river-name>/<id>/meta.[html|json] | All metadata for specified key |
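The URL patterns in the table can be assembled programmatically. The sketch below only builds the URLs and makes no network requests; `BASE_URL`, the helper names, and the example river and stream IDs are placeholders, and percent-encoding the IDs is an assumption about how a client might handle unusual characters.

```python
from urllib.parse import quote

# Assumption: replace with the address where your River View instance is served.
BASE_URL = "http://localhost:8080"


def index_url(fmt="json"):
    """URL listing the rivers currently active (fmt: 'html' or 'json')."""
    return f"{BASE_URL}/index.{fmt}"


def river_url(river, resource, fmt="json"):
    """URL for a river-level resource: 'props' or 'keys'."""
    return f"{BASE_URL}/{quote(river)}/{resource}.{fmt}"


def stream_url(river, stream_id, resource, fmt="json"):
    """URL for a stream-level resource: 'data' (html, json, csv) or 'meta'."""
    return f"{BASE_URL}/{quote(river)}/{quote(stream_id, safe='')}/{resource}.{fmt}"


print(index_url())
print(river_url("my-river", "keys"))
print(stream_url("my-river", "sensor 7", "data", "csv"))
```

The resulting strings can be passed to any HTTP client, for example `urllib.request.urlopen`, to read the data.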
OS X has some unusual built-in behavior regarding the maximum number of open file descriptors. River View needs the system to allow around 1024 open descriptors to start up, so if you run into any sort of file-can't-be-opened errors, check your maximum number of open file descriptors by running ulimit -n. If this number is less than 1024, you'll need to raise it:
sudo launchctl limit maxfiles 1024 unlimited
This updates the maximum number of open file descriptors your Mac will allow. This number is not persistent across reboots. To make it persistent, add limit maxfiles 1024 unlimited to /etc/launchd.conf.
ulimit -n 1024
This updates the current shell you're in to be able to make use of all those file descriptors.
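The same check that ulimit -n performs can be scripted, which may be handy before launching the service. This is a minimal sketch using Python's standard `resource` module (available on Unix-like systems only); `descriptor_limit_ok` and `REQUIRED_DESCRIPTORS` are hypothetical names, and the 1024 threshold is simply the figure quoted above.

```python
import resource

REQUIRED_DESCRIPTORS = 1024  # the approximate figure quoted above


def descriptor_limit_ok(required=REQUIRED_DESCRIPTORS):
    """Return True if the soft open-file limit is at least `required`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unlimited soft limit is reported as RLIM_INFINITY.
    return soft == resource.RLIM_INFINITY or soft >= required


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
if descriptor_limit_ok():
    print("open file descriptor limit looks sufficient")
else:
    print("limit is too low; raise it with ulimit -n before starting")
```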