River not working anymore after ES restart

bboy-space commented 10 years ago

Hi,

We are facing an issue using the rivers. When setting up the river, the data import works very well, but if for any reason we have to restart one of the 3 ES servers (3 servers in a cluster), the river doesn't work anymore and the Mongo-to-ES sync stops. The only solution we found to restart the river is to delete everything (river + index) and recreate it. But with indices holding millions or billions of records, reindexing everything takes quite long!

If we delete the river (keeping the index so as not to lose data) and then just recreate the river, the status is OK ("Created:true"), but the log shows INITIAL_IMPORT_FAILED: "[DEBUG][river.mongodb.util ] setRiverStatus called with interactionsindex - INITIAL_IMPORT_FAILED"

If we try to restart directly from the plugin interface (http://localhost:9200/_plugin/river-mongodb/), we get: "[WARN ][org.elasticsearch.river.mongodb.Slurper] Exception in slurper org.elasticsearch.river.mongodb.Slurper$SlurperException: River out of sync with oplog.rs collection "

That's why we delete everything and restart from the beginning.

As said above, we have 3 ES servers in a cluster behind a load balancer. It may be a configuration misunderstanding, but any clues to solve it would be highly appreciated.

ES : 1.1.2 Mongo : 2.4.10

Thank you very much. Regards


talha-asad commented 10 years ago

mongodb-river is not yet compatible with 1.1; support is only in the master branch, and even that is not stable. Are you using the master branch to compile this plugin?

bboy-space commented 10 years ago

Thanks for the reply. Yes, I'm using the master branch of the plugin. But by 1.1, do you mean the 1.1.* versions of ES? Because on the main page of the plugin, I can see this:

River    ES      Mongo
master   1.1.1   2.4.9 -> 2.6.1

So does it mean I have to downgrade ES to 1.1.1 for a more stable use ?

pvin commented 10 years ago

Hi, I use:

Elasticsearch: 0.90.5
MongoDB: 2.4.10
elasticsearch-river-mongodb-1.7.1-SNAPSHOT.zip

I couldn't connect to MongoDB. Should anything change in this combination, or is there another compatible set of versions?

talha-asad commented 10 years ago

I am currently testing a new version of this plugin; please hold off until it's released. You can then use much more recent versions of MongoDB and Elasticsearch.

bboy-space commented 10 years ago

For your information, I solved some of my issues. First of all, after looking at other GitHub issues, I found out that once the river is broken (for any reason), the only way to start it again is to remove it along with its index and recreate it, because a river cannot be plugged into a non-empty index (which explains the INITIAL_IMPORT_FAILED error).
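The reset sequence described above could be sketched roughly as follows. This is only an illustration of the order of operations, assuming the standard `_river` REST endpoints of ES 1.x; the river and index names are hypothetical, and the function only builds the requests without sending anything.

```python
from typing import List, Tuple


def reset_river_requests(base_url: str, river: str, index: str) -> List[Tuple[str, str]]:
    """Return the (HTTP method, URL) pairs for a full river reset, in order.

    Nothing is sent here; the caller decides how to issue the requests.
    """
    base = base_url.rstrip("/")
    return [
        ("DELETE", f"{base}/_river/{river}"),     # drop the broken river
        ("DELETE", f"{base}/{index}"),            # drop the index so it is empty again
        ("PUT", f"{base}/_river/{river}/_meta"),  # recreate the river (body: its settings)
    ]


if __name__ == "__main__":
    # "my_river" and "my_index" are placeholder names, not taken from this thread.
    for method, url in reset_river_requests("http://localhost:9200", "my_river", "my_index"):
        print(method, url)
```

Note that the second step discards the indexed data, which is exactly why the full reimport is needed afterwards.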

Then, we succeeded in tuning Elasticsearch performance so that it no longer kills the infrastructure with I/O overload. In the river settings, just play with the throttle_size parameter (cf. [1]). By default it's 500, but if you have a huge amount of data to import, ES will wait a lot, then use a lot of memory, and may kill the server. In my case, I set throttle_size to 1300; the queue is then much bigger, so ES waits less.

Note that before tuning, I/O waits (in yellow) were almost at the same level as the blue series. After tuning, the I/O waits (in yellow) are very good.

[Graph: I/O waits (in yellow) after tuning]
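As a rough sketch of where throttle_size goes, the snippet below builds a river settings body with the value 1300 mentioned above. The overall layout ("type": "mongodb", a "mongodb" block with db and collection, and an "index" block) is assumed from the plugin's README, and the database, collection, and index names are hypothetical.

```python
import json


def river_meta(db: str, collection: str, index_name: str, throttle_size: int = 1300) -> dict:
    """Build a river settings body with an explicit throttle_size.

    The 1300 default here is the value reported in this thread, not a
    recommendation; the right value depends on the data volume and hardware.
    """
    return {
        "type": "mongodb",
        "mongodb": {"db": db, "collection": collection},
        "index": {"name": index_name, "throttle_size": throttle_size},
    }


# Placeholder names; this JSON would be the body sent when creating the river.
body = json.dumps(river_meta("mydb", "mycollection", "myindex"), indent=2)
print(body)
```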

[1] https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/55

talha-asad commented 10 years ago

@thomas-massiere Thanks for the info.

pvin commented 10 years ago

@thomas-massiere hi, can you tell me which versions of MongoDB, ES, and elasticsearch-river-mongodb are compatible?

ebuildy commented 9 years ago

There are interesting options to play with when you initialize the mongo river:

"index": {
  "name": ${es.index.name},
  "throttle_size": ${es.throttle.size},
  "bulk_size": ${es.bulk.size},
  "type": ${es.type.name},
  "bulk": {
    "actions": ${es.bulk.actions},
    "size": ${es.bulk.size},
    "concurrent_requests": ${es.bulk.concurrent.requests},
    "flush_interval": ${es.bulk.flush.interval}
  }
}

Where, according to the documentation:

In index bulk processor settings can be changed: ${es.bulk.actions} default value is 1000, ${es.bulk.size} default value is 5mb, ${es.bulk.concurrent.requests} default value is 50, ${es.bulk.flush.interval} default value is 10ms.
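To make the placeholders concrete, here is a minimal sketch that fills the "bulk" block with the defaults quoted above and lets individual values be overridden. The helper name and the index and type names are hypothetical; only the four setting names and their default values come from the documentation excerpt.

```python
import json

# Defaults as quoted from the documentation above.
BULK_DEFAULTS = {
    "actions": 1000,
    "size": "5mb",
    "concurrent_requests": 50,
    "flush_interval": "10ms",
}


def index_settings(name: str, doc_type: str, **bulk_overrides) -> dict:
    """Return an "index" block whose bulk settings start from the defaults."""
    unknown = set(bulk_overrides) - set(BULK_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown bulk settings: {sorted(unknown)}")
    bulk = {**BULK_DEFAULTS, **bulk_overrides}  # overrides win over defaults
    return {"index": {"name": name, "type": doc_type, "bulk": bulk}}


# Example: smaller batches and fewer concurrent requests than the defaults.
print(json.dumps(index_settings("myindex", "mytype", actions=500, concurrent_requests=10), indent=2))
```

Rejecting unknown keys is a design choice of this sketch, meant to catch typos in setting names before they reach the river.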

Did you try it?

richardwilly98 / elasticsearch-river-mongodb

River not working anymore after ES restart #289