matteodem / meteor-easy-search

Easy-to-use search for Meteor with Blaze Components
MIT License

ElasticSearch: Large data initial sync failing - Response Timeout #540

Open niranjans opened 7 years ago

niranjans commented 7 years ago

I have a fairly large dataset (about 300k records) in a collection that uses the ElasticSearch engine. On initial startup, the app syncs the entire dataset with the ES index, and this is failing for me. Some records get stored, but most of the time I get a response timeout (even after increasing the timeout to 120 seconds).

Is there any recommendation for this scenario? Is there a way to slow down the initial sync?

Thanks

matteodem commented 7 years ago

Hi niranjans, that shouldn't be a problem. How big are your documents generally?

niranjans commented 7 years ago

Thanks for your response @matteodem.

The documents are very basic (generated from Mockaroo - sample below). I am getting random responses / errors. Most of the time, the requests don't go through.

The Elasticsearch cluster is hosted on Compose.io with enough memory and disk space (it is not running on localhost for testing purposes).

Using logs: 'trace', here are some of the things that happen:

Regular POSTs being sent. Note that no response is showing for them; only rarely does one of these return status 200, so most of them time out later: screenshot-1

Responses with code 0. I am not sure what these mean: screenshot-2

Socket hang up: screenshot-3

And finally the timeouts: screenshot-4

I do get some successful writes and 200 responses sometimes, but it's very random and not too often. Any idea what might be happening?

niranjans commented 7 years ago

An update on this issue (might help someone else running into it):

My ElasticSearch cluster is on Compose.io, and it looks like the bottleneck is that the sync sends requests much faster than the cluster can handle (even after increasing the size of the cluster).

When I add a Meteor._sleepForMs() inside the client.defer function, this slows the entire thing down to a manageable level. So eventually after a long time, the data did get synced.
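For anyone wanting to try the same idea outside the package internals, here is a minimal sketch of throttled, batched indexing in plain JavaScript. It is not the package's code: `indexBatch` is a hypothetical callback standing in for whatever sends documents to Elasticsearch, and the batch size and pause are made-up defaults that would need tuning against the cluster.

```javascript
// Split an array of documents into fixed-size batches.
function chunk(docs, batchSize) {
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// Promise-based pause. This stands in for the Meteor._sleepForMs() call
// mentioned above so the sketch also runs outside Meteor.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Index documents one batch at a time, pausing between batches so the
// cluster is never flooded with requests.
// `indexBatch` is a hypothetical async callback (an assumption, not part of
// easy-search) that writes one batch to the search index.
async function throttledSync(docs, indexBatch, { batchSize = 500, pauseMs = 200 } = {}) {
  let indexed = 0;
  for (const batch of chunk(docs, batchSize)) {
    await indexBatch(batch); // wait for the write before sending more
    indexed += batch.length;
    await sleep(pauseMs);    // give the cluster room to catch up
  }
  return indexed;
}
```

Waiting for each batch to finish before sending the next one bounds the number of in-flight requests, which is what the sleep achieved indirectly.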

However, my follow-up question is: does the entire index sync start from the beginning every time the app restarts? Is there a way to manage this? I mean, is there some way we can tell it not to sync the entire thing (because the sync has already happened once, after hours) and only observe changes that happen afterwards?

matteodem commented 7 years ago

That makes sense indeed. Right now there isn't, but that logic could be added pretty easily. Since the engine itself defines this behavior, it's well encapsulated:

https://github.com/matteodem/meteor-easy-search/blob/master/packages/easysearch:elasticsearch/lib/engine.js
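As a rough illustration of what "only observe changes" could look like, here is a sketch that skips the documents replayed when an observer starts. It is not taken from the engine linked above. It assumes a cursor whose `observe()` calls `added` synchronously for every existing document before returning (as Meteor cursors do), and `handlers` is a hypothetical set of callbacks that write to the search index.

```javascript
// Observe a cursor but ignore the initial replay of existing documents.
// `cursor` is anything with an observe({ added, changed, removed }) method;
// `handlers` are hypothetical index-writing callbacks (an assumption).
function observeChangesOnly(cursor, handlers) {
  let initializing = true;
  const handle = cursor.observe({
    added(doc) {
      // During startup, observe() replays every existing document as "added".
      // Those are assumed to be in the index already, so skip them.
      if (!initializing) handlers.added(doc);
    },
    changed(newDoc, oldDoc) {
      handlers.changed(newDoc, oldDoc);
    },
    removed(doc) {
      handlers.removed(doc);
    },
  });
  // observe() has returned, so the initial replay is over.
  initializing = false;
  return handle;
}
```

One caveat with this approach: documents added, changed, or removed while the app was down would not reach the index, so a periodic or on-demand full sync would probably still be needed.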