electrical closed this issue 9 years ago
Hello,
The way elasticsearch looks up plugins has definitely changed a bit since I last touched the README. I've updated it to reflect the best way to install this on the latest elasticsearch. Apologies for not checking it sooner. I've verified that, once installed, it still works fine under 0.20.5.
Back to the original question. I'm not actively developing this project at this time. PubSub was never something I ended up really needing; the current Redis list-based functionality has handled everything we've needed since we started using it. So, since there hasn't really been anything I've needed to do on it, I haven't really touched it.
At least one person needed to access authenticated Redis instances, so there is a fork (https://github.com/scalp42/elasticsearch-redis-river) with a branch that upgrades the bundled Jedis library and adds support for that. I haven't had a chance to check it out myself, but it's there if you need it too.
Thanks for checking it out! Lee
Hi Lee,
Thank you very much for your quick reply and thanks for testing it under 0.20.5 :+1:
My current setup is that I have a small LS agent on every ES node to pull data from Redis. I'm considering using this river to pull data out of Redis instead, but I have a few questions:

- Do you have any numbers/graphs of the performance? (especially compared to LS reading out of Redis and pushing to ES)
- I will be having different keys which contain data for different indexes; is that possible?
- Can I supply an index name with a date pattern like logstash does? (logstash-%d-%m-%y => logstash-23-02-2013)
Thank you for your time.
I don't have any experience with that particular setup scenario, but one thing to note about this project is that it is designed to handle a stream of elasticsearch index/delete commands that conform to the ES Bulk API. In our use, rather than use the REST API or other alternatives, we wanted to be able to throw index commands (and associated data) into some sort of background queue. So, this project was born.
Thus, all data dropped into the Redis key that the river is listening to needs to be formatted in that Bulk API style. This means you wouldn't be able to simply point the river at a Redis key with arbitrary data in it.
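To make the expected shape concrete, here's a minimal sketch of building one such entry in Python. The index, type, and document values are placeholders of my own, not anything from this project; the only point is the two-line action-plus-source format from the ES Bulk API.

```python
import json

def bulk_entry(index, doc_type, doc_id, document):
    """Return one river-ready payload: a Bulk API action line plus a
    source line, each terminated by a newline."""
    action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
    return json.dumps(action) + "\n" + json.dumps(document) + "\n"

payload = bulk_entry("logs", "event", "1", {"message": "hello"})
# This payload string is what would be LPUSHed to the key the river watches,
# e.g. (assuming redis-py): redis.Redis().lpush("my_river_key", payload)
```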
But, back to the questions.
Performance: I don't have any graphs, but here are some rudimentary performance numbers under the following conditions: a single ES node on a Core i5 desktop machine (so, one instance of the river), a Redis key preloaded with 2 million random pieces of data, and Redis and ES running on the same machine.
On average, this particular setup can churn through the 2,000,000 entries in between 210-220 seconds. So, throughput is averaging between 9100-9500 items per second. Looking at the raw numbers while indexing, it can dip as low as 7400/s, which I assume is happening when ES is busy doing a commit, and frequently exceeds 10,000/s. So, performance is pretty good, really just because this project is a thin wrapper on ES's built-in BulkRequest builders, so all it has to do is dump the data off and let ES do its thing.
Granted, these are only on my particular machine with my setup, so YMMV.
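For anyone wanting to reproduce a rough version of that benchmark, here's a hedged sketch of preloading a Redis key with random bulk-formatted entries. The key name, document shape, and entry count are all assumptions of mine, not from the original test.

```python
import json
import random
import string

def random_doc():
    # Arbitrary 32-character payload; the real test's data shape is unknown.
    return {"data": "".join(random.choices(string.ascii_letters, k=32))}

def make_entries(n, index="bench"):
    """Yield n Bulk-API-style two-line payloads for preloading Redis."""
    for i in range(n):
        action = json.dumps({"index": {"_index": index, "_type": "doc", "_id": i}})
        yield action + "\n" + json.dumps(random_doc()) + "\n"

# To actually preload (requires a running Redis and redis-py):
# r = redis.Redis()
# for e in make_entries(2_000_000):
#     r.lpush("bench_key", e)
sample = list(make_entries(3))
```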
Multiple Keys/Indexes: Each river only supports a single Redis key. In the Bulk API, you can specify which index each item should go to, so many people use that single key and route it that way. However, if you need to get stuff into ES from multiple keys, you can create multiple instances of the river with different key names. That works fine.
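The single-key routing approach can be sketched like this: each entry's action line names its own target index via "_index", so one key can feed several indexes. The index names and documents below are hypothetical.

```python
import json

def routed_entry(index, doc):
    """One Bulk-API payload whose action line routes it to a specific index."""
    action = {"index": {"_index": index, "_type": "doc"}}
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"

# Two entries bound for different indexes, but pushable to the same key:
entries = [
    routed_entry("app-logs", {"msg": "app started"}),
    routed_entry("access-logs", {"msg": "GET /"}),
]
# for e in entries: redis.Redis().lpush("shared_key", e)  # one river, one key
```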
Date Pattern Index Names: There currently isn't support for this functionality in this river.
Given how you plan on using it, it may be better to use logstash's own ES river, which supports many of the features you'll need for this particular use case. Though I don't have any experience with it, so there may be other reasons one wouldn't want to use it.
Hope this helps!
Hi.
The current ElasticSearch release is 0.9.0.
I am having a weird issue where no data is indexed. I see debug entries in the logs like: Popped from queue: [abonati_index.... But then the actionCount remains 0: Deferring bulk execution: actionCount=0 bulkSize=100
I am using the latest ES, 0.90.5. For each indexed document I am pushing 2 JSON entries into the Redis list: First: {"index": {"_index": NAME, "_type": TYPE, "_id": ID}} Second: { TYPE: { DOCUMENT } }
No documents get changed.
Are you pushing them as two separate items in the list?
For the river to get them, you'll need to join those two lines together with a newline character and push them in a single LPUSH to your Redis list.
So, for your instance, an appropriate redis command that should get something in there would be:
LPUSH abonati_index "{\"index\":{\"_index\":\"NAME\",\"_type\":\"TYPE\",\"_id\":1}}\n{\"TYPE\":\"DOCUMENT\"}\n"
This is due to the particular Bulk API that Elasticsearch supported when this project was created.
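If you're building that payload from code rather than the redis-cli, the joined string can be assembled like this (a sketch using Python's json module, reproducing the same NAME/TYPE/DOCUMENT placeholders from the LPUSH example above):

```python
import json

# Build the action line and the source line, then join them with a newline
# so the whole thing goes into Redis as one list item.
action = {"index": {"_index": "NAME", "_type": "TYPE", "_id": 1}}
source = {"TYPE": "DOCUMENT"}
payload = (json.dumps(action, separators=(",", ":")) + "\n"
           + json.dumps(source, separators=(",", ":")) + "\n")
# A single push delivers both lines together (assuming redis-py):
# redis.Redis().lpush("abonati_index", payload)
```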
Thank you. It works perfectly now :)
Hi, I am a complete newbie to logStash/redis/elasticSearch, and would very much appreciate your guidance.
I am looking into a setup:
The standard setup is: logs -> [[ logStash-shipper ]] ----> [ redis -> logStash-indexer -> elasticSearch ]. Question: Can the 100+ logStash-shippers do all the grokking and send out JSON-formatted logs, thus reducing the computing load of the logStash-indexer on the log server?
With the redis-river: logs -> [[ logStash-shipper ]] ----> [ redis -> elasticSearch ]. With the redis-river, redis can be connected directly to elasticSearch. Question: Can the off-the-shelf logStash-shippers be configured to feed redis+elasticSearch directly?
I tried this combination :
However, the logs are not passed on to elasticSearch?!
Thanks in advance.
Hi,
Perhaps I'm mistaken, but development around this has been pretty dead for the past 11 months? Is this still a live project? If so, are you planning to test/develop it against a more recent version?
Cheers.