kpn / py-timeexecution

Time Execution: record application metrics
https://pypi.org/project/timeexecution/
Apache License 2.0

Improve ElasticSearch backend network performance #19

Closed by sergray 8 years ago

sergray commented 8 years ago

Issue #17 revealed that the ElasticSearch backend's performance depends heavily on network latency and service availability.

The ElasticSearch client used in time_execution.backends.elasticsearch.ElasticsearchBackend makes 4 retries to send the data if the ElasticSearch service is unavailable. It must be configured to make only one attempt by default.

Additionally, a sane default network timeout should be applied to the ElasticSearch client.
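
For illustration, a sketch of what such defaults could look like with the official elasticsearch-py client; the host and the concrete values below are assumptions for the example, not the backend's current behaviour:

from elasticsearch import Elasticsearch

# Illustrative defaults: a single attempt and a short request timeout, so an
# unreachable or slow cluster cannot stall the instrumented application.
# The host and the values are assumptions for this example.
client = Elasticsearch(
    hosts=["127.0.0.1"],
    timeout=1.0,            # seconds; tune for your network
    max_retries=0,          # do not retry failed requests
    retry_on_timeout=False,
)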

snelis commented 8 years ago

Would it make sense to queue up all metrics and send them at an x-second interval? Would it benefit or harm performance in this case?
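
A rough sketch of that idea, assuming a plain threading-based flush loop; the class name, the interval, and the write(name, **data) signature are illustrative, not part of the library:

import threading
import time

class QueuedBackend(object):
    """Illustrative buffer that collects metrics and flushes them every `interval` seconds."""

    def __init__(self, delegate, interval=10.0):
        self.delegate = delegate      # e.g. an ElasticsearchBackend instance
        self.interval = interval
        self.buffer = []
        self.lock = threading.Lock()
        flusher = threading.Thread(target=self._flush_loop)
        flusher.daemon = True         # do not keep the process alive for the flusher
        flusher.start()

    def write(self, name, **data):
        # Called from the instrumented code: only an in-memory append, no network I/O.
        with self.lock:
            self.buffer.append((name, data))

    def _flush_loop(self):
        while True:
            time.sleep(self.interval)
            with self.lock:
                batch, self.buffer = self.buffer, []
            for name, data in batch:
                # The network call still blocks, but in this background thread
                # rather than in the code being measured.
                self.delegate.write(name, **data)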


puntonim commented 8 years ago

Do you guys think it is possible to wrap the TCP connection in UDP, using something like a relay proxy? The ElasticSearch client would connect to a relay proxy on localhost via TCP, and the relay proxy would wrap that connection in a UDP connection to the ElasticSearch server. Even though possible, this may be overcomplicated, but for instance this tool claims that it "allows arbitrary TCP services to be carried over Broadcast, Unicast and Multicast UDP".

sergray commented 8 years ago

@snelis queueing by itself will not solve the problems associated with sending data over HTTP/TCP unless the sending process is decoupled from request processing.

@nimiq it is possible to implement a UDP-to-TCP proxy, but that would require implementing a new backend.

In this ticket I would like to make the existing ElasticSearch backend production ready, with guaranteed performance in the worst case.

The proper solution seems to be running a metrics agent in a separate process on the same machine, which forwards the metrics to storage, and using IPC (e.g. http://zeromq.org/) to send metrics from the app, but let's tackle that in a separate issue.
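
For the record, a minimal pyzmq sketch of that decoupled setup; the IPC endpoint and the metric payload are made up for this example:

import zmq

# Application side: push metrics to a local agent over IPC; send_json returns
# without waiting for storage, so a slow backend cannot block request processing.
# The endpoint and the payload are assumptions for this example.
context = zmq.Context.instance()
push = context.socket(zmq.PUSH)
push.connect("ipc:///tmp/metrics.sock")
push.send_json({"name": "hello", "value": 312})

# The agent, running as a separate process, would do roughly:
#   pull = context.socket(zmq.PULL)
#   pull.bind("ipc:///tmp/metrics.sock")
#   metric = pull.recv_json()   # then forward to ElasticSearch with retries/batching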

ricardosantosalves commented 8 years ago

I believe the main requirements (business or architectural...) here are:

The usage of UDP gives us both A and B. By using TCP, we lose B, unless we use some fire-and-forget strategy (threading? queuing? both?...) to make write_metrics asynchronous.

sergray commented 8 years ago

No changes to the existing source code are needed. It is possible to pass a timeout parameter (in seconds) and max_retries through ElasticsearchBackend, and requests to ElasticSearch will time out after the given value:

from time_execution.backends.elasticsearch import ElasticsearchBackend
# 20 ms request timeout and no retries: an unavailable cluster fails fast
ElasticsearchBackend(timeout=0.02, max_retries=0)