openzipkin / zipkin-dependencies

Spark job that aggregates zipkin spans for use in the UI
Apache License 2.0

ERROR NetworkClient: Node [172.18.0.16:9200] failed (Operation timed out (Connection timed out)); #103

Closed by halfofcity 5 years ago

halfofcity commented 6 years ago
ZIPKIN_LOG_LEVEL=DEBUG STORAGE_TYPE=elasticsearch ES_HOSTS=http://172.30.220.11:16920 ES_INDEX=qg_zipkin java -jar zipkin-dependencies-1.11.4.jar
18/05/11 14:39:29 INFO ElasticsearchDependenciesJob: Processing spans from qg_zipkin-2018-05-11/span
18/05/11 14:39:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/11 14:39:30 WARN Utils: Your hostname, bogon resolves to a loopback address: 127.0.0.1; using 192.168.28.92 instead (on interface en0)
18/05/11 14:39:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/05/11 14:39:31 WARN Java7Support: Unable to load JDK7 types (annotations, java.nio.file.Path): no Java7 support added
18/05/11 14:44:34 ERROR NetworkClient: Node [172.18.0.16:9200] failed (Operation timed out (Connection timed out)); no other nodes left - aborting...
Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[172.18.0.16:9200]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:149)
    at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:380)
    at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:388)
    at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:484)
    at org.elasticsearch.hadoop.rest.RestClient.indexExists(RestClient.java:479)
    at org.elasticsearch.hadoop.rest.RestRepository.indexExists(RestRepository.java:324)
    at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:228)
    at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:73)
    at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:72)
    at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:44)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.Partitioner$$anonfun$defaultPartitioner$2.apply(Partitioner.scala:66)
    at org.apache.spark.Partitioner$$anonfun$defaultPartitioner$2.apply(Partitioner.scala:66)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.immutable.List.map(List.scala:285)
    at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:66)
    at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:688)
    at org.apache.spark.rdd.RDD$$anonfun$groupBy$1.apply(RDD.scala:688)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.groupBy(RDD.scala:687)
    at org.apache.spark.api.java.JavaRDDLike$class.groupBy(JavaRDDLike.scala:243)
    at org.apache.spark.api.java.AbstractJavaRDDLike.groupBy(JavaRDDLike.scala:45)
    at zipkin.dependencies.elasticsearch.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:207)
    at zipkin.dependencies.elasticsearch.ElasticsearchDependenciesJob.run(ElasticsearchDependenciesJob.java:166)
    at zipkin.dependencies.ZipkinDependenciesJob.main(ZipkinDependenciesJob.java:72)

Why does this use 172.18.0.16:9200? I don't have that address anywhere in my configuration.

MCeDo commented 5 years ago

Me too. How can this be resolved?

zackzhangzz commented 5 years ago

Me too.

19/04/04 17:14:35 ERROR NetworkClient: Node [10.42.32.93:9200] failed (Connection timed out (Connection timed out)); no other nodes left - aborting...
Exception in thread "main" org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[10.42.142.87:9200, 10.42.51.199:9200, 10.42.32.93:9200]]
    at org.elasticsearch.hadoop.rest.NetworkClient.execute(NetworkClient.java:149)

codefromthecrypt commented 5 years ago

Sometimes this is related to the ES_NODES_WAN_ONLY setting, described in our README and in the elasticsearch-hadoop docs. Can you try it?
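As a sketch, here is the invocation from the original report with that setting added (host, port, and index are the reporter's values from this thread, not a recommendation):

```shell
# Sketch only: same command as in the original report, with WAN-only mode on.
# ES_NODES_WAN_ONLY=true tells the elasticsearch-hadoop connector to talk only
# to the hosts declared in ES_HOSTS, instead of the node IPs the cluster
# advertises (which can be container-internal addresses like 172.18.0.16
# that are unreachable from where the job runs).
STORAGE_TYPE=elasticsearch \
ES_HOSTS=http://172.30.220.11:16920 \
ES_INDEX=qg_zipkin \
ES_NODES_WAN_ONLY=true \
java -jar zipkin-dependencies-1.11.4.jar
```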

zackzhangzz commented 5 years ago

ES_NODES_WAN_ONLY=true works for me, thank you.

codefromthecrypt commented 5 years ago

@devinsba @shakuzen @zeagord it seems everyone asks this and runs into the problem. Maybe we should switch the default from false to true?

huhu-sky commented 5 years ago

Hello @adriancole, ES_NODES_WAN_ONLY=true works. The original error message was:

ShuffleMapStage 0 (groupBy at ElasticsearchDependenciesJob.java:179) failed in 60.806 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[172.31.117.44:9200]]

172.31.117.44 is the ES pod IP, and the running job container can connect to that IP (Kubernetes environment), so this is still strange to me. What exactly does the env variable 'ES_NODES_WAN_ONLY' do?

codefromthecrypt commented 5 years ago

es.nodes.wan.only is an elasticsearch-hadoop feature; here are their docs on it:

https://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html

If you need more info, I would suggest contacting their support channels, as we are not in charge of elasticsearch implementation details.
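For reference, a sketch of the connector property that ES_NODES_WAN_ONLY=true maps to (the connector's own default is false):

```properties
# elasticsearch-hadoop connector setting corresponding to ES_NODES_WAN_ONLY=true.
# When enabled, the connector routes all requests through the nodes declared in
# es.nodes and skips discovery of the cluster's advertised node IPs, which in
# Docker/Kubernetes are often internal addresses unreachable from the client.
es.nodes.wan.only = true
```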


huhu-sky commented 5 years ago

OK, thanks @adriancole.