Closed: ghost closed this issue 6 years ago.
Hello, @dckovar
You could increase the scroll_size parameter to 1000 or 10000:
-s, --scroll_size INTEGER Scroll size for each batch of results. Default is 100.
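A larger scroll size helps mainly because each batch is one request/response round trip, so a fixed per-request cost (network latency, VPN overhead) is paid once per batch instead of once per 100 documents. The sketch below is a rough, hypothetical model of that effect, not something taken from es2csv: it assumes each batch costs a fixed overhead plus a per-document transfer cost, and the numbers (0.5 s overhead, 1 ms per document) are made up for illustration.

```python
def estimated_docs_per_second(scroll_size: int, overhead_s: float, per_doc_s: float) -> float:
    """Rough throughput model for a scroll-based export (hypothetical).

    Each batch is assumed to cost a fixed per-request overhead (round trip,
    VPN latency, scroll bookkeeping) plus a per-document transfer cost.
    """
    batch_time = overhead_s + scroll_size * per_doc_s
    return scroll_size / batch_time


if __name__ == "__main__":
    # Hypothetical numbers: 0.5 s overhead per request, 1 ms per document.
    for size in (100, 1000, 10000):
        rate = estimated_docs_per_second(size, overhead_s=0.5, per_doc_s=0.001)
        print(f"scroll_size={size:>5}: ~{rate:,.0f} docs/s")
```

Under these assumptions the rate rises from about 167 to about 952 docs/s but can never exceed 1 / per_doc_s (here 1000 docs/s), so a bigger batch only helps while the fixed per-request overhead dominates; if the link is bandwidth-bound instead, the gain will be small.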
As for the same query behaving differently, don't forget about the query cache: a repeated query should always run faster than the first run.
I generally run es2csv from my local machine against a remote EC2 Linux instance over a VPN connection. Sometimes I get 600-800 docs/s and sometimes I get 10-20 docs/s. Everything that I control is the same. CPU utilization on the remote machine is very low. The same query run an hour later may run at 10x the speed.
Here is the query run from my local machine against the remote instance:
es2csv -u http://x.x.x.x:9200 -o down.csv -i kaallc -q @'downsample.query' -r
Found 476521 results
Run query [ ] [5601/476521] [ 1%] [0:05:46] [ETA: 8:05:11] [ 16.2 docs/s]
Here is the query running directly on the remote Linux instance:
es2csv -u http://x.x.x.x:9200 -o down.csv -i kaallc -q @'downsample.query' -r
Found 476521 results
Run query [# ] [169001/476521] [ 35%] [0:02:48] [ETA: 0:05:07] [1000.9 docs/s]
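Neither command passes -s, so both runs presumably use the default scroll size of 100. Under that assumption, the reported rates can be turned into a rough time per batch and a remaining-time estimate. The helper names below are illustrative only, and the ETA is a simple remaining / rate calculation, not necessarily how es2csv's progress bar computes it.

```python
from datetime import timedelta

# Assumption: es2csv default batch size (per --help); neither command sets -s.
DEFAULT_SCROLL_SIZE = 100


def seconds_per_batch(docs_per_second: float, scroll_size: int = DEFAULT_SCROLL_SIZE) -> float:
    """Average wall-clock time spent per scroll batch at the observed rate."""
    return scroll_size / docs_per_second


def eta_seconds(total: int, done: int, docs_per_second: float) -> float:
    """Naive remaining time: documents left divided by the current rate."""
    return (total - done) / docs_per_second


if __name__ == "__main__":
    total = 476521
    runs = (("over VPN", 5601, 16.2), ("on the instance", 169001, 1000.9))
    for label, done, rate in runs:
        eta = timedelta(seconds=round(eta_seconds(total, done, rate)))
        print(f"{label}: {seconds_per_batch(rate):.2f} s per batch, ETA {eta}")
```

This gives roughly 6.17 s per batch over the VPN versus about 0.10 s on the instance itself. Since Elasticsearch is serving the same query in both cases, that gap points at the network path rather than at the cluster, which is the part that a larger scroll size (fewer round trips) or network tuning could address.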
Is there any network tuning or other action I can take to improve the performance?
Thank you.