Closed amitpawar closed 8 years ago
Hello
I can't really say what's going on without any information from you. Please provide the stack trace from the running script.
I can only guess: if the script just stops writing to the file, there is a problem with the connection, and it should retry after a 2-minute timeout. Did you try to wait?
Could you also provide the query and arguments you used to run the script, the version of the script, and an example of the data you have in your Elasticsearch?
Hello
Yes, I waited for around 1-2 hours before manually stopping the script.
Here is the call:
`python es2csv.py -u server_address -i foxstream_offers_production_active -f id product_id store_id retailer_id price availability_status product_gtin indexed_at activation_status quantity -q 'retailer_id:50a7fe9fe7ac4a8c8df86fb0189caa66' -o feeds\\offers_test.csv --debug`
The data in ES is offer information for our website marketplace. The index has around 120 million records, which we break into 3 fetches by retailer and run as scheduled Jenkins jobs.
The progress output just hangs at around 95%, with no change in the count or ETA for 1-2 hours:

`Run query [] [37762125/39583209] [ 95%] [2:45:48] [ETA: 0:07:59] [ 3.80 kdocs/s]`
Please let me know your comments.
Thanks Amit
Hello, Amit.
I tried to reproduce your issue and found only one scenario. Did you try to open, in a browser or with curl, the link
http://localhost:9200/_search/scroll?scroll_id=%scroll_id_from_debug_output%
If so, you break the page order, and Elasticsearch keeps returning scroll_id == c2NhbjswOzE7dG90YWxfaGl0czoxOTIwMjQwOw==, which is four times shorter than a regular one and comes with zero elements in the hits array. There is no check to terminate such requests, so the while loop runs forever with no chance to stop. You can read more about scroll search in the Elasticsearch documentation.
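The failure mode described above can be sketched with stub response dicts (the field layout follows elasticsearch-py's result format; the ids and totals are made up): a broken scroll context still returns HTTP 200 and a scroll_id, but its hits array is empty, and that empty array is the condition a terminating loop has to check.

```python
# Hypothetical scroll responses, modeled on elasticsearch-py result dicts.
normal_page = {
    '_scroll_id': 'DXF1ZXJ5QW5kRmV0Y2hfLi4u',  # long, regular scroll_id (made up)
    'hits': {'total': 1920240, 'hits': [{'_id': '1'}, {'_id': '2'}]},
}

# A broken scroll context: short scroll_id, zero hits, but still a 200 response.
broken_page = {
    '_scroll_id': 'c2NhbjswOzE7dG90YWxfaGl0czoxOTIwMjQwOw==',
    'hits': {'total': 1920240, 'hits': []},
}

def scroll_exhausted(res):
    """A scroll is finished (or broken) when the hits array comes back empty."""
    return len(res['hits']['hits']) == 0

print(scroll_exhausted(normal_page))   # False
print(scroll_exhausted(broken_page))   # True
```

Without this check, code that only compares a running counter against `hits.total` never notices the broken context and keeps re-requesting the same dead scroll.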
Hello
You are right. I added an extra check to the while loop:

```python
hits_check = res['hits']['total']
while total_lines != self.num_results and hits_check > 0:
    res = self.es_conn.scroll(scroll=self.scroll_time, scroll_id=res['_scroll_id'])
    hits_check = len(res['hits']['hits'])
```
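For context, here is a minimal, self-contained sketch of such a terminating scroll loop, run against a stub connection instead of a real cluster (`FakeES`, the page data, and the counters are all illustrative stand-ins for the es2csv internals):

```python
class FakeES:
    """Stub standing in for the Elasticsearch connection: serves queued
    pages, then empty pages forever (like an exhausted or broken scroll)."""
    def __init__(self, pages):
        self.pages = list(pages)

    def scroll(self, scroll=None, scroll_id=None):
        hits = self.pages.pop(0) if self.pages else []
        return {'_scroll_id': 'stub-id', 'hits': {'total': 5, 'hits': hits}}

es_conn = FakeES([[{'_id': 1}, {'_id': 2}], [{'_id': 3}, {'_id': 4}], [{'_id': 5}]])
num_results = 5
total_lines = 0
# With search_type=scan, the first response carries only the total, no hits.
res = {'_scroll_id': 'stub-id', 'hits': {'total': 5, 'hits': []}}

hits_check = res['hits']['total']  # seed with the total so the loop can start
while total_lines != num_results and hits_check > 0:
    res = es_conn.scroll(scroll='30m', scroll_id=res['_scroll_id'])
    hits_check = len(res['hits']['hits'])  # 0 ends the loop even if counts never match
    total_lines += hits_check

print(total_lines)  # 5
```

If the scroll context breaks mid-export, as in this issue, the empty page drives `hits_check` to 0 and the loop exits instead of spinning forever.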
The script has been running for a week now without hanging (no more infinite while loop).
Thanks once again.
@taraslayshchuk Any chance you'll commit this check to master? I'm running into the same issue.
This is already done, but not yet in a pip release. In any case, it looks like we still haven't found the root cause of the problem (please look at issue #10).
Hello
I am using this library to fetch all the records from ES, and it works like a charm. Best tool available for ES-to-CSV export.
However, sometimes the script hangs during the search_query() phase: it stops writing to the temporary file but keeps running.
Would this be a script issue or an issue on the ES side?
Any help or pointers in the right direction are appreciated.
Thanks Amit