jaegertracing / spark-dependencies

Spark job for dependency links
http://jaegertracing.io/
Apache License 2.0

spark-dependencies error:The number of slices [1217] is too large. #100

Open jiangxinlingdu opened 3 years ago

jiangxinlingdu commented 3 years ago

spark-dependencies had been running fine, but yesterday the job failed with the following log output:

INFO ElasticsearchDependenciesJob: Running Dependencies job for 2020-10-29T00:00Z, reading from hour-jaeger-span-2020-10-29 index, result storing to hour-jaeger-dependencies-2020-10-29
[Stage 0:> ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: The number of slices [1217] is too large. It must be less than [1024]. This limit can be set by changing the [index.max_slices_per_scroll] index level setting.
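For context on where a number like 1217 comes from: elasticsearch-hadoop creates one Spark input partition per scroll slice, and (against ES 5+) splits each shard into roughly `ceil(docs_in_shard / es.input.max.docs.per.partition)` slices, with a default of 100000 docs per partition. A minimal sketch of that arithmetic, with hypothetical shard and document counts chosen only for illustration:

```python
import math

def estimated_slices(docs_per_shard, num_shards, max_docs_per_partition=100_000):
    """Rough model of how elasticsearch-hadoop sizes a sliced-scroll read:
    each shard is split into ceil(docs / max_docs_per_partition) slices
    (es.input.max.docs.per.partition defaults to 100000)."""
    slices_per_shard = math.ceil(docs_per_shard / max_docs_per_partition)
    return slices_per_shard * num_shards

# Hypothetical example: a 5-shard daily span index with ~24.4M docs per
# shard yields about 1220 slices, the same ballpark as the reported 1217.
print(estimated_slices(24_400_000, 5))  # → 1220
```

So a large daily span index can easily push the slice count past the 1024-per-scroll limit the error message mentions.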

i find source is :https://github.com/jaegertracing/spark-dependencies/blob/master/jaeger-spark-dependencies-elasticsearch/src/main/java/io/jaegertracing/spark/dependencies/elastic/ElasticsearchDependenciesJob.java

In the run method, at line 230, I find:

log.info("Running Dependencies job for {}, reading from {} index, result storing to {}", day, spanIndex, depIndex);

I read https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#scroll-search-results
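For reference, that setting can be raised per index through the index settings API; a sketch of the call (the index name is taken from the log above, the host is a placeholder):

```shell
# Raise the scroll-slice limit on one day's span index.
# Host and port are placeholders for your Elasticsearch endpoint.
curl -X PUT "http://localhost:9200/hour-jaeger-span-2020-10-29/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"max_slices_per_scroll": 2048}}'
```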

I updated the index.max_slices_per_scroll index setting to 2048, but then the log shows:

ERROR NetworkClient: Node [xxxx:9200] failed (Read timed out); no other nodes left - aborting...
ERROR Executor: Exception in task 5.0 in stage 0.0 (TID 5)
org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[xxxxx:9200,……,……]]

What should I do? How can I get the job to run?
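One approach worth trying, instead of raising the limit (the second error suggests ~2048 concurrent scroll slices simply overwhelmed the nodes and they timed out), is to reduce the slice count itself by letting each input partition cover more documents. `es.input.max.docs.per.partition` is an elasticsearch-hadoop setting; whether it can be passed through `spark-submit --conf` with the `spark.` prefix, and the jar path and main class below, are assumptions for illustration:

```shell
# Reduce how many scroll slices elasticsearch-hadoop creates by raising
# the docs-per-partition target from its 100000 default.
# Jar path and main class are placeholders, not verified against the repo.
spark-submit \
  --conf spark.es.input.max.docs.per.partition=1000000 \
  --class io.jaegertracing.spark.dependencies.DependenciesSparkJob \
  jaeger-spark-dependencies.jar
```

With the 1220-slice arithmetic above, a tenfold increase in docs per partition would bring the slice count well under the 1024 limit without touching index settings.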