taraslayshchuk / es2csv

Export from an Elasticsearch into a CSV file
Apache License 2.0

Occasionally painfully slow performance over a network #37

Closed by ghost 6 years ago

ghost commented 6 years ago

I generally run es2csv from my local machine against a remote EC2 Linux instance over a VPN connection. Sometimes I get 600-800 docs/s and sometimes I get 10-20 docs/s. Everything that I control is the same. CPU utilization on the remote machine is very low. The same query run an hour later may run at 10x the speed.

Here is the query running local to remote:

es2csv -u http://x.x.x.x:9200 -o down.csv -i kaallc -q @'downsample.query' -r
Found 476521 results
Run query [ ] [5601/476521] [ 1%] [0:05:46] [ETA: 8:05:11] [ 16.2 docs/s]

Here is the query running directly on the remote Linux instance:

es2csv -u http://x.x.x.x.:9200 -o down.csv -i kaallc -q @'downsample.query' -r
Found 476521 results
Run query [# ] [169001/476521] [ 35%] [0:02:48] [ETA: 0:05:07] [1000.9 docs/s]

Is there any possible network tuning or other actions I can take to affect the performance?

Thank you.

taraslayshchuk commented 6 years ago

Hello, @dckovar

You could change the scroll_size parameter to 1000 or 10000, so that each request fetches a larger batch:

 -s, --scroll_size INTEGER                Scroll size for each batch of results. Default is 100.
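
To see why a larger scroll size can help on a slow link, here is a rough sketch of the arithmetic. It assumes the export fetches one batch per network round trip and that the commands in the report used the default scroll size of 100 (no -s flag was passed). The function names and the 50 ms round-trip time are made up for illustration; only the document count and the docs/s figures come from the report. It models per-batch latency only, not bandwidth or server-side time.

```python
import math


def scroll_round_trips(total_docs: int, scroll_size: int) -> int:
    """Approximate number of batches needed to fetch total_docs."""
    if scroll_size <= 0:
        raise ValueError("scroll_size must be positive")
    return math.ceil(total_docs / scroll_size)


def latency_overhead_seconds(total_docs: int, scroll_size: int, round_trip_s: float) -> float:
    """Time spent purely on round trips, ignoring transfer and query time."""
    return scroll_round_trips(total_docs, scroll_size) * round_trip_s


def seconds_per_batch(scroll_size: int, docs_per_second: float) -> float:
    """How long one batch takes at an observed export rate."""
    return scroll_size / docs_per_second


TOTAL_DOCS = 476521        # "Found 476521 results" in the report
ASSUMED_ROUND_TRIP = 0.05  # hypothetical 50 ms VPN round trip, not measured

if __name__ == "__main__":
    for size in (100, 1000, 10000):
        trips = scroll_round_trips(TOTAL_DOCS, size)
        overhead = latency_overhead_seconds(TOTAL_DOCS, size, ASSUMED_ROUND_TRIP)
        print(f"scroll_size={size}: {trips} round trips, ~{overhead:.1f} s of latency")

    # Observed rates from the report, assuming the default batch of 100 docs.
    print(f"slow run: ~{seconds_per_batch(100, 16.2):.1f} s per batch")
    print(f"fast run: ~{seconds_per_batch(100, 1000.9):.2f} s per batch")
```

Under these assumptions, going from 100 to 1000 docs per batch cuts the number of round trips roughly tenfold, which matters most when each round trip is slow, as it appears to be in the local-to-remote run.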

As for the same query behaving differently between runs, don't forget about the query cache: a repeated query should generally run faster.