Open tom-pryor opened 10 years ago
I've been experimenting, and the issue seems related to the persistent curl connection. I logged the responses: with a persistent connection, curl occasionally returns either a stale response (i.e. the result of a previous request) or a blank one.
Did you try turning off persistent connections? https://github.com/ruflin/Elastica/blob/master/lib/Elastica/Client.php#L38
It will make it slower, but perhaps it solves the problem.
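For reference, a minimal sketch of what turning the option off could look like. The host and port values are assumptions for illustration, and the `persistent` key is the client config option that the linked line in Client.php refers to; check it against the Elastica version you have installed.

```php
<?php
// Sketch only: assumes Elastica is installed via Composer and that
// Elasticsearch is reachable on localhost:9200 (both are assumptions).
require 'vendor/autoload.php';

$client = new \Elastica\Client([
    'host'       => 'localhost', // assumed host
    'port'       => 9200,        // assumed port
    'persistent' => false,       // open a fresh curl handle per request
]);
```

With `persistent` set to false, each request pays the connection setup cost again, which is the slowdown mentioned above.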
Another good option is to use bulk queries if you have a lot of requests.
Yeah, turning off persistent fixes the problem, although I'm not sure why the issue occurs with persistent enabled. It seems like strange behaviour.
I'd use bulk queries, but we are indexing data received over an API (i.e. we have no control over when data comes in), and it needs to be searchable almost instantly.
Which PHP and curl versions are you using?
@Tomdarkness Can you check if this change resolves your problem? https://github.com/ruflin/Elastica/pull/567/files
When indexing a moderate volume of individual documents (about 5 per second), we're noticing some data loss. A new document is indexed, and shortly afterwards (a few seconds later, which is longer than refresh_interval, i.e. the document has been indexed) we attempt to update a field in that document, again using Elastica.
However, an "Undefined index _version" error occasionally occurs when updating the document, at:
https://github.com/ruflin/Elastica/blob/master/lib/Elastica/Type.php#L248
When this happens, the whole document is replaced with only the updated field(s), causing data loss.
The error logging tool we are using records the context, and the value of $result is very strange. It's an array of 4 elements:
Which seems to indicate a query was performed rather than fetching the document by id.
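As a rough illustration of how calling code could guard against this, here is a minimal sketch in plain PHP. The helper `extractDocumentVersion` is hypothetical and not part of Elastica. The two sample arrays are assumptions that mimic the usual shape of an Elasticsearch get-by-id response and a search response; they are not the actual values recorded by the logging tool.

```php
<?php
// Hypothetical helper (not part of Elastica): refuse to continue with an
// update when the response does not look like a get-by-id result.
function extractDocumentVersion(array $result): int
{
    if (!isset($result['_version'])) {
        // A stale or foreign response (e.g. a search result) ends up here,
        // so the caller can retry instead of overwriting the document.
        throw new UnexpectedValueException(
            'Response has no _version; keys: ' . implode(', ', array_keys($result))
        );
    }

    return (int) $result['_version'];
}

// Assumed shape of a get-by-id response.
$getResponse = [
    '_index'   => 'example',
    '_type'    => 'doc',
    '_id'      => '1',
    '_version' => 3,
    '_source'  => ['field' => 'value'],
];

// Assumed shape of a search response: four top-level keys, no _version.
$searchResponse = [
    'took'      => 2,
    'timed_out' => false,
    '_shards'   => ['total' => 5, 'successful' => 5, 'failed' => 0],
    'hits'      => ['total' => 0, 'hits' => []],
];

echo extractDocumentVersion($getResponse), "\n"; // prints 3

try {
    extractDocumentVersion($searchResponse);
} catch (UnexpectedValueException $e) {
    echo 'Rejected: ', $e->getMessage(), "\n";
}
```

Failing loudly here trades an occasional retry for not replacing the whole document with a partial one.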
Running Elasticsearch 1.0.1 and Elastica 1.0.0.
I'll try and see if I can get some more information.