richardwilly98 / elasticsearch-river-mongodb

MongoDB River Plugin for ElasticSearch

mongo river not indexing data #445

Open Jay13 opened 9 years ago

Jay13 commented 9 years ago

The current river version fails to do the initial import. We traced the issue to the sort option that was added to the initial import query in CollectionSlurper. The sort option triggers a collection scan, which fails in our case because the collection has roughly 12 million docs (despite a proper index on the filter criteria). We fixed this by removing the sort option from the initial import query (the optimal solution would have been to add a limit option to the query).

Refer to https://jira.mongodb.org/browse/SERVER-12923

es-river-mongodb: 2.0.4
es: 1.4
mongo: 2.6

tmatei commented 9 years ago

It's well known that 2.0.4 doesn't work properly. Have you tried 2.0.5?

benmccann commented 9 years ago

@Jay13 thanks for looking into the problem

i'm curious in what way it is failing. is it taking longer than you'd expect because the query is still running? is the query timing out? is the query crashing your mongod instance?

removing the sort option is unsafe because getFilterForInitialImport expects you to be iterating over the ids in order. if you remove sorting, then you should also remove getFilterForInitialImport and the retry logic
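To illustrate the point above, here is a minimal sketch of a resumable import, written in Python rather than the plugin's Java. It assumes the retry filter resumes from the last `_id` seen, which is how the comment describes getFilterForInitialImport; `resume_filter` and `import_with_retry` are hypothetical names, not the plugin's code, and the in-memory list merely stands in for a MongoDB cursor.

```python
from typing import Callable, List, Optional


def resume_filter(last_id: Optional[int]) -> Callable[[dict], bool]:
    """Hypothetical stand-in for getFilterForInitialImport:
    on a retry, keep only documents whose _id is greater than the last one seen."""
    if last_id is None:
        return lambda doc: True
    return lambda doc: doc["_id"] > last_id


def import_with_retry(docs: List[dict], fail_after: int, sort: bool) -> List[int]:
    """Import docs, simulating a single cursor failure after `fail_after`
    documents, then resuming with a filter built from the last _id seen."""
    imported: List[int] = []
    last_id: Optional[int] = None
    attempts = 0
    while True:
        keep = resume_filter(last_id)
        cursor = [d for d in docs if keep(d)]
        if sort:
            # Equivalent in spirit to sorting the query by _id ascending.
            cursor.sort(key=lambda d: d["_id"])
        try:
            for n, doc in enumerate(cursor):
                if attempts == 0 and n == fail_after:
                    raise ConnectionError("simulated cursor failure")
                imported.append(doc["_id"])
                last_id = doc["_id"]
            return imported
        except ConnectionError:
            attempts += 1


# Documents stored in an order that differs from _id order.
docs = [{"_id": i} for i in (5, 1, 2, 3, 4)]
sorted_result = import_with_retry(docs, fail_after=1, sort=True)
unsorted_result = import_with_retry(docs, fail_after=1, sort=False)
```

With sorting, the retry picks up exactly where the first attempt stopped. Without sorting, the first attempt happens to see `_id` 5 first, so the resume filter excludes every remaining document and they are silently skipped.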

mygitrepo commented 9 years ago

Hello,

Wondering why the mongodb river plugin installation is failing? I'm able to get the mapper attachment plugin from elasticsearch.org. Appreciate your help.

Cheers.

root@server:/usr/share/elasticsearch# cd /usr/share/elasticsearch && bin/plugin --verbose --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.5
-> Installing com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.5...
Trying http://download.elasticsearch.org/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/elasticsearch-river-mongodb-2.0.5.zip...
Failed: IOException[Can't get http://download.elasticsearch.org/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/elasticsearch-river-mongodb-2.0.5.zip to /usr/share/elasticsearch/plugins/river-mongodb.zip]; nested: FileNotFoundException[http://download.elasticsearch.org/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/elasticsearch-river-mongodb-2.0.5.zip]; nested: FileNotFoundException[http://download.elasticsearch.org/com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/elasticsearch-river-mongodb-2.0.5.zip];
Trying http://search.maven.org/remotecontent?filepath=com/github/richardwilly98/elasticsearch/elasticsearch-river-mongodb/2.0.5/elasticsearch-river-mongodb-2.0.5.zip...
Failed: SocketTimeoutException[connect timed out]
Trying https://oss.sonatype.org/service/local/repositories/releases/content/com/github/richardwilly98/elasticsearch/elasticsearch-river-mongodb/2.0.5/elasticsearch-river-mongodb-2.0.5.zip...
Failed: SocketTimeoutException[connect timed out]

Jay13 commented 9 years ago

@tmatei Have not yet tried 2.0.5. Will look into it. Thank you.

Jay13 commented 9 years ago

@benmccann The problem is that the mongo query planner selects the "_id" index, which results in a full collection scan, rather than using the indexed filter field. The problem query does not seem to return, as you can see from this mongo log excerpt:

, $orderby: { _id: 1 } } planSummary: IXSCAN { _id: 1 } cursorid:1191792168547 ntoreturn:0 ntoskip:0 nscanned:8845011 nscannedObjects:8845010 keyUpdates:0 numYields:1370464 locks(micros) r:8493905138 nreturned:15 reslen:1060755 45350427ms
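For readers who want to quantify the log line above, the following sketch pulls the counters out of the excerpt and derives two figures: the elapsed time in hours and the number of index entries scanned per document returned. The helper name `parse_counters` and the key `millis` are made up for this illustration; the regular expression simply matches `name:number` pairs and the trailing `...ms` value as they appear in the excerpt, and is not a general MongoDB log parser.

```python
import re
from typing import Dict

# The countable part of the log excerpt quoted in the comment above.
LOG_EXCERPT = (
    "planSummary: IXSCAN { _id: 1 } cursorid:1191792168547 ntoreturn:0 ntoskip:0 "
    "nscanned:8845011 nscannedObjects:8845010 keyUpdates:0 numYields:1370464 "
    "locks(micros) r:8493905138 nreturned:15 reslen:1060755 45350427ms"
)


def parse_counters(line: str) -> Dict[str, int]:
    """Collect `name:number` pairs plus the trailing duration in milliseconds."""
    counters = {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}
    duration = re.search(r"(\d+)ms\s*$", line)
    if duration:
        counters["millis"] = int(duration.group(1))
    return counters


counters = parse_counters(LOG_EXCERPT)
hours = counters["millis"] / 3_600_000
scanned_per_returned = counters["nscanned"] / counters["nreturned"]
print(f"elapsed: {hours:.1f} h, scanned per returned doc: {scanned_per_returned:.0f}")
```

By this reading, the operation ran for roughly 12.6 hours and walked several hundred thousand `_id` index entries for each of the 15 documents it returned, which is consistent with the planner walking the `_id` index instead of the filter index.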

richardwilly98 commented 9 years ago

@mygitrepo it looks like you cannot access this link: https://oss.sonatype.org/service/local/repositories/releases/content/com/github/richardwilly98/elasticsearch/elasticsearch-river-mongodb/2.0.5/elasticsearch-river-mongodb-2.0.5.zip

This link is available - please check your environment.

mygitrepo commented 9 years ago

Thanks! Using proxyHost and proxyPort did the trick.

cd /usr/share/elasticsearch && bin/plugin -DproxyHost=64.102.255.40 -DproxyPort=80 --verbose --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.5
-> Installing com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.5
