oliver006 / elasticsearch-test-data

Generate and upload test data to Elasticsearch for performance and load testing
MIT License

Request Enhancements for better performance and stress testing. #3

Open ajaybhatnagar opened 8 years ago

ajaybhatnagar commented 8 years ago

Is it possible to reduce CPU usage by using predefined strings held in memory as field values instead of generating random strings each time? I ask because I observed 100% CPU utilization when running this tool; each random string generation seems to consume CPU cycles. Further, as this is a single-threaded script, it does not make use of the available CPUs on multicore nodes, so I am not able to fully stress the Elasticsearch nodes. When the single thread's CPU utilization reaches 100%, indexing latency increases even though CPU, load, memory, and IOPS are not a bottleneck on the ES node. Could the script offer a multi-threading option?
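To make the predefined-strings idea concrete, here is a minimal sketch, not code from this project. It assumes the pool is built once at startup and that field values are then picked from it per document. The names `build_string_pool`, `STRING_POOL`, and `make_document` are hypothetical, as are the pool size and string length.

```python
import random
import string


def build_string_pool(size=1000, length=16, seed=None):
    """Generate `size` random strings once, up front (hypothetical helper)."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choices(alphabet, k=length)) for _ in range(size)]


# Built a single time at startup; the size and length are assumptions.
STRING_POOL = build_string_pool(seed=42)


def make_document(field_names, pool=STRING_POOL):
    """Build one document by picking a pre-generated string for each field."""
    return {name: random.choice(pool) for name in field_names}
```

Picking from the pool costs one `random.choice` call per field rather than one random draw per character. The trade-off is lower cardinality in the indexed values, which may or may not matter for a given test.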

In addition to inserts, an option for updates combined with search queries would make it even better for simulating realistic cases.

oliver006 commented 8 years ago

Great suggestions, I'll look into making some changes, thank you!

biggers commented 6 years ago

Is this project still active at all? Thumbs up for @ajaybhatnagar's request... However, Tornado, being based on Python (of course), has the GIL and therefore probably cannot take advantage of N cores on your host, as I understand it. It would have to "go multiprocess" to do so.

oliver006 commented 6 years ago

The project is still active, although I don't currently use it myself as I don't have a need for it right now. If I were writing this today, I'd write it in Go, as it takes better advantage of multiple cores on your machine (see your comment) and is easier to deploy (just a single binary).

At any rate, if you have features you'd like to see added (like the pre-calc'd strings, which IMO is a great idea), then I'm happy to review and merge PRs, but I don't have the time right now to implement anything new myself.

oliver006 commented 6 years ago

One more thing re: @biggers and the multi-core issue: you can just run multiple processes of the Python task to max out your CPU cores, the network, and your ES cluster's ingestion capacity. I know it's just a workaround, but it might solve your problems for now.
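A rough sketch of that workaround, assuming the load generator is started as an ordinary command line: launch one copy per core and wait for all of them. The helper name `run_in_parallel` is hypothetical, and the command in the example is a stand-in, not this tool's actual invocation.

```python
import os
import subprocess
import sys


def run_in_parallel(command, num_procs=None):
    """Start `num_procs` copies of `command` and return their exit codes."""
    num_procs = num_procs or os.cpu_count() or 1
    # Each copy is a separate OS process, so the GIL does not serialize them.
    procs = [subprocess.Popen(command) for _ in range(num_procs)]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Placeholder command; substitute the real script and its arguments.
    placeholder = [sys.executable, "-c", "print('worker done')"]
    print(run_in_parallel(placeholder, num_procs=2))
```

Each process generates and uploads its own data independently, so the total document count is the per-process count multiplied by the number of processes.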