
usgin-cache


In a USGIN-style data system, data sets are conveyed as OGC Web Feature Services (WFS) and are served by a distributed network of data providers. Each of these services is cataloged in one (or several) metadata-aggregating services that conform to the OGC's Catalog Service for the Web (CSW).

This means that a user can find data sets by

  1. performing a thematic or spatial search against a CSW service,
  2. analyzing the results to determine which might be fit for the purpose at hand, and
  3. following a pointer to the WFS dataset of interest.

This workflow requires the user to do a lot of work before they even get to see the data. After step #3, the user may realize that the data they wanted isn't in the particular WFS that they chose, and they'll have to repeat the process.

This software brings the data itself closer to the search experience

The end goal is to provide the user with a dynamic map that displays actual data, or at least enough of the data to give the user a better, more immediate idea of exactly what is available. This is an important target because it has been shown that the first search priority for users looking for geoscientific data is location. We want a map that lets you zoom to a location and see what's available there before you begin any thematic or keyword filtering to narrow down the results.

In order to achieve this, this software works as follows:

  1. A request is made to an aggregating CSW service in order to find WFS services that meet some particular criteria. The module for making these CSW requests allows for some configuration in order to specify what kinds of WFS services are of interest.
  2. The resulting WFS services are queried in order to return ALL of the data available from each service. The entire WFS response document is cached in CouchDB, then transformed into GeoJSON using ogr2ogr (see features/toGeojson.js). Each feature from the WFS response is stored in CouchDB as a GeoJSON object. These cached objects can be refreshed whenever required.
  3. Mapping functions are written which indicate how a single GeoJSON feature should be indexed. Each function is passed one GeoJSON feature and returns a simple JSON object representing the key-value pairs that will be included in the ElasticSearch index.
  4. A cached document is read and the features it contains are each passed through the mapping function before being added to the ElasticSearch index.

This ElasticSearch index provides an endpoint that can be searched by a thin front-end client, such as the one envisioned above.

Installation

Pre-requisite Installations:

Then:

git clone https://github.com/usgin/usgin-cache.git
cd usgin-cache
npm config set msvs_version 2012 --global 
npm install

Connect to MongoDB

There is a bash script included that will download, configure, and run MongoDB for you. To use it, just type

chmod 755 run-mongo.sh
./run-mongo.sh

ElasticSearch Configuration

  1. Unzip the binaries.
  2. Install the JDK and set environment variables: JAVA_HOME pointing to the JDK installation directory (in Program Files), and add it to PATH.
  3. Run the ElasticSearch command from the ElasticSearch bin directory.
  4. Install the plugin for ElasticSearch from here.
  5. Install the Marvel management and monitoring tool if possible.

Writing Mapping Functions

Mapping functions define how certain kinds of features are indexed by ElasticSearch. These functions read GeoJSON data from CouchDB and convert it to an object that can be easily ingested by ElasticSearch. The properties of the converted object become the fields on which you can search.

For the NGDS, the idea is to write mapping functions only for USGIN Content Models of Interest. As an example, see the documented code describing the mapping of the aasg:ThermalSpring model.
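A minimal mapping function might look like the sketch below. The property names (`springName`, `temperatureC`) are invented for illustration and are not the actual aasg:ThermalSpring schema; see the documented mapping code for the real field names:

```javascript
// Hypothetical mapping function for a ThermalSpring-like GeoJSON feature.
// Takes one feature, returns the flat object that ElasticSearch will index.
function mapThermalSpring(feature) {
  var props = feature.properties || {};
  return {
    name: props.springName,          // assumed property name
    temperature: props.temperatureC, // assumed property name
    location: feature.geometry       // GeoJSON geometry for spatial search
  };
}
```

The returned object's keys (`name`, `temperature`, `location`) become the searchable fields in the index.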

Code Docs

Build documentation from code comments with groc. These docs live on the gh-pages branch and are accessible here.

To rebuild them, follow these instructions:

git checkout gh-pages
git merge master
npm install
npm install -g groc
groc

Running Tests:

  1. Make sure that ElasticSearch and MongoDB are running.
  2. Make a copy of tests/test-config-example.json with the name tests/test-config.json and edit it to match the connection details for your test database.
  3. Run the tests.
npm test
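A purely illustrative sketch of what tests/test-config.json might contain is shown below. These keys are assumptions; consult tests/test-config-example.json in the repository for the actual structure:

```json
{
  "couchdb": "http://localhost:5984/usgin-cache-test",
  "elasticsearch": { "host": "localhost", "port": 9200 }
}
```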