jaegertracing / spark-dependencies

Spark job for dependency links
http://jaegertracing.io/
Apache License 2.0

Could Elasticsearch read timeout configuration be supported? #72

Open wuyupengwoaini opened 5 years ago

wuyupengwoaini commented 5 years ago

Recently, I got an error when running this spark-dependencies job in Docker:

19/08/02 05:50:26 INFO ElasticsearchDependenciesJob: Running Dependencies job for 2019-08-01T00:00Z, reading from jaeger-span-2019-08-01 index, result storing to jaeger-dependencies-2019-08-01
[Stage 0:> (1 + 8) / 3040]
19/08/02 05:51:31 ERROR NetworkClient: Node [ip1:port] failed (Read timed out); selected next node [ip2:port]
19/08/02 05:51:31 ERROR NetworkClient: Node [ip2:port] failed (Read timed out); selected next node [ip3:port]

So I suggest supporting a timeout configuration. The specific setting is:

`es.http.timeout` (default 1m): Timeout for HTTP/REST connections to Elasticsearch.

https://www.elastic.co/guide/en/elasticsearch/hadoop/6.6/configuration.html
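For context, the elasticsearch-hadoop connector reads these settings from the Spark configuration, so supporting the timeout would amount to passing one more key through to the SparkConf. A minimal illustrative sketch, not the project's actual code (the class name, hosts, and chosen values are placeholders):

```java
import org.apache.spark.SparkConf;

public class EsTimeoutSketch {
  public static void main(String[] args) {
    // elasticsearch-hadoop picks up its settings from the SparkConf, so a
    // read-timeout override would be passed the same way es.nodes is today.
    SparkConf conf = new SparkConf()
        .setAppName("jaeger-spark-dependencies")
        .setMaster("local[*]")                    // placeholder master for the sketch
        .set("es.nodes", "ip1:port,ip2:port")     // placeholder hosts from the log above
        .set("es.http.timeout", "2m")             // raise the 1m default to avoid read timeouts
        .set("es.http.retries", "3");             // related retry setting from the same docs
  }
}
```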

pavolloffay commented 5 years ago

+1, would you like to submit a PR for it?

wuyupengwoaini commented 5 years ago

Not only the es.http.timeout configuration but also other performance-related settings such as es.scroll.size (default 50). So I think we should provide a way for users to configure these performance parameters, but I can't find a good way to do it. @pavolloffay, do you have a good idea?
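One conceivable generic answer, offered purely as a sketch and not as something the project implements: forward any environment variable with an agreed prefix into the matching es.* Spark property, so each tunable does not need its own dedicated option. The ES_CFG_ prefix and the name mapping below are assumptions for illustration only.

```java
import java.util.Map;
import org.apache.spark.SparkConf;

public class EsConfPassThrough {
  // Hypothetical pass-through: ES_CFG_HTTP_TIMEOUT=2m becomes es.http.timeout,
  // ES_CFG_SCROLL_SIZE=500 becomes es.scroll.size, and so on.
  static void applyEsOverrides(SparkConf conf) {
    for (Map.Entry<String, String> e : System.getenv().entrySet()) {
      if (e.getKey().startsWith("ES_CFG_")) {
        String property = "es." + e.getKey()
            .substring("ES_CFG_".length())
            .toLowerCase()
            .replace('_', '.');
        conf.set(property, e.getValue());
      }
    }
  }
}
```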

pavolloffay commented 5 years ago

The parameters are configurable via ENV vars; just follow the approach we use for the other parameters.
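A hedged sketch of that suggestion, assuming a new environment variable is introduced (the ES_TIME_OUT name and the helper below are illustrative, not existing code in the repository):

```java
import org.apache.spark.SparkConf;

public class EsTimeoutEnvSketch {
  // Follow the existing ENV-var approach: read an optional variable and,
  // when it is set, forward it to the elasticsearch-hadoop property.
  static void configureTimeout(SparkConf conf) {
    String timeout = System.getenv("ES_TIME_OUT");   // e.g. ES_TIME_OUT=2m
    if (timeout != null && !timeout.isEmpty()) {
      conf.set("es.http.timeout", timeout);
    }
  }
}
```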