leeadkins / elasticsearch-redis-river

A Redis River for Elastic Search.
MIT License

Elastic Search Redis River

Tested up to Elasticsearch 1.0.1

This is a simple river that uses the same Bulk API format as Elasticsearch REST requests and the RabbitMQ River, but reads its input from Redis.

Once you've constructed your bulk indexing command, you can push it into the Redis list specified when the river was created.
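A bulk indexing command is the same newline-delimited text the REST `_bulk` endpoint accepts. It consists of an action line, followed by a body line for every action except `delete`, and every line ends in a newline. Below is a minimal Python sketch for building one. `build_bulk_body` is a hypothetical helper, not part of this plugin. Joining several pairs into one message assumes the river accepts multi-action bodies the way `_bulk` does.

```python
import json


def build_bulk_body(pairs):
    """Turn (action, body) pairs into newline-delimited Bulk API text.

    Pass None as the body for actions such as "delete" that take no body line.
    """
    lines = []
    for action, body in pairs:
        # Without indent, json.dumps never emits a raw newline,
        # so each object stays on exactly one line.
        lines.append(json.dumps(action, separators=(",", ":")))
        if body is not None:
            lines.append(json.dumps(body, separators=(",", ":")))
    # The Bulk API expects every line, including the last one, to end in "\n".
    return "\n".join(lines) + "\n"


message = build_bulk_body([
    ({"index": {"_index": "analytics", "_type": "analytic", "_id": 1}},
     {"id": 1, "age": 25, "name": "My Name"}),
])
print(repr(message))
```

The resulting string is what you would push onto the river's Redis key.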

I chose the Bulk API because I needed the flexibility to dump lots of different things into the same place while indexing. If you're looking for the easiest way to just index some JSON, you might be interested in the newer elasticsearch-river-redis. I'm not affiliated with that project, but it's definitely more straightforward if you just want to get some JSON into a single index.

INSTALLATION

This isn't in Maven. I rely on GitHub Releases, so install it like this:

    bin/plugin -install redis-river -url https://github.com/leeadkins/elasticsearch-redis-river/releases/download/v0.0.5/elasticsearch-redis-river-0.0.5.zip

Don't forget to restart the node before trying to use it.

USAGE

    curl -XPUT 'localhost:9200/_river/my_redis_river/_meta' -d '{
        "type" : "redis",
        "redis" : {
            "host"     : "localhost", 
            "port"     : 6379,
            "key"      : "redis_key",
            "mode"     : "list",
            "password" : "yourpassword",
            "database" : 0
        },
        "index" : {
            "bulk_size" : 100,
            "bulk_timeout" : 5
        }
    }'

Create your river using the standard river PUT request. Your options are:

Using a Redis List

The first time you send something to Redis to be indexed (in either list or pubsub mode), the river starts preparing a bulk request. That request is executed once it reaches bulk_size or once bulk_timeout passes, whichever comes first.

Setting bulk_timeout to 0 when using a list doesn't mean the river won't wait. It means a bulk request never times out: the river will always wait for enough messages to fill bulk_size. To avoid waiting at all while using a list, you can set both bulk_timeout and bulk_size to 0, which tells the river to send every single index request through as soon as it arrives. This probably isn't ideal for performance.
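The flush rules for list mode can be modelled as a small simulation of how messages get grouped into bulk requests. This is a sketch of the described behaviour, not the river's actual code. `simulate_batches` is a hypothetical name. Arrival times are assumed to be in the same unit as `bulk_timeout` (taken here as seconds). Only batch composition is modelled, with no real timers. `bulk_size == 0` is treated as "send each message on its own", following the description of the both-zero setting.

```python
def simulate_batches(arrivals, bulk_size, bulk_timeout):
    """Group message arrival times into bulk requests (list mode).

    Returns (sent, pending): lists of message indices sent as bulk
    requests, plus any messages still waiting after the last arrival.
    """
    sent, pending, opened_at = [], [], None
    for i, t in enumerate(arrivals):
        # The open request's timer expired before this message arrived,
        # so that request has already been sent without it.
        if pending and bulk_timeout > 0 and t - opened_at >= bulk_timeout:
            sent.append(pending)
            pending = []
        if not pending:
            opened_at = t  # the first message starts a new bulk request
        pending.append(i)
        # bulk_size == 0 sends every message straight through.
        if bulk_size == 0 or len(pending) >= bulk_size:
            sent.append(pending)
            pending = []
    # With a timeout, the timer eventually fires for the leftover messages;
    # with bulk_timeout == 0 they keep waiting for the batch to fill.
    if pending and bulk_timeout > 0:
        sent.append(pending)
        pending = []
    return sent, pending


print(simulate_batches([0, 1, 2, 10, 11], bulk_size=3, bulk_timeout=5))
print(simulate_batches([0, 1, 7], bulk_size=10, bulk_timeout=5))
print(simulate_batches([0, 1, 7, 100], bulk_size=3, bulk_timeout=0))
print(simulate_batches([0, 1, 2], bulk_size=0, bulk_timeout=0))
```

In the third call, bulk_timeout is 0, so the fourth message is never sent. It stays pending until enough further messages arrive to fill a batch of 3.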

Example

As mentioned above, this river uses the Bulk API format. If you're in a redis-cli console and have a river set up like the example above, this Redis command should index a document into your Elasticsearch node:

    LPUSH redis_key "{\"index\":{\"_index\":\"analytics\",\"_type\":\"analytic\",\"_id\":1}}\n{\"id\":1,\"age\":25,\"name\":\"My Name\"}\n"
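Outside redis-cli, any Redis client's LPUSH should work the same way. As a dependency-free sketch, the snippet below frames that same LPUSH in the Redis wire protocol (RESP) and sends it over a plain socket. `encode_command` and `lpush` are hypothetical helpers. The sketch assumes the river config from the usage example (key `redis_key` on localhost:6379). It sends no AUTH or SELECT, so it assumes a server without a password using database 0.

```python
import socket


def encode_command(*args):
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8") if isinstance(arg, str) else arg
        # Bulk string: $<byte length>\r\n<bytes>\r\n
        parts.append(b"$%d\r\n%b\r\n" % (len(data), data))
    return b"".join(parts)


def lpush(host, port, key, payload):
    """Send LPUSH over a plain socket and return the raw reply bytes.

    Assumes no password and database 0 (no AUTH or SELECT is sent).
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(encode_command("LPUSH", key, payload))
        # LPUSH replies with a short integer such as b":1\r\n";
        # a single small recv is a simplification, not a full reply parser.
        return sock.recv(64)


# Same payload as the redis-cli example: action line + source line.
payload = (
    '{"index":{"_index":"analytics","_type":"analytic","_id":1}}\n'
    '{"id":1,"age":25,"name":"My Name"}\n'
)
command = encode_command("LPUSH", "redis_key", payload)
# lpush("localhost", 6379, "redis_key", payload)  # needs a running Redis
```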