mapreducelab / bigdata-helm-charts

Deprecated in favor of datalake-kubernetes. A collection of Helm charts for Kubernetes Big Data ecosystem products.
MIT License

Bug: Spark on Kubernetes (Standalone mode) >= 2.3.0 tries to connect to the master using the master's pod name and throws "java.net.UnknownHostException" #14

Closed: antonputra closed this issue 6 years ago

antonputra commented 6 years ago

Spark 2.2.2 and lower work fine on k8s (Standalone mode), but 2.3.0 and up throw "java.net.UnknownHostException":

18/07/15 01:05:51 WARN TransportClientFactory: DNS resolution for spark-master-controller-8xlkx:42615 took 8403 ms
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
    ... 4 more
Caused by: java.io.IOException: Failed to connect to spark-master-controller-8xlkx:42615
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: spark-master-controller-8xlkx
    at java.net.InetAddress.getAllByName0(InetAddress.java:1280)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at java.net.InetAddress.getByName(InetAddress.java:1076)
    at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
    at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
    at java.security.AccessController.doPrivileged(Native Method)
    at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
    at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
    at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
    at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
    at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
    at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
    at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
    at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
    at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
    at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
    at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)

antonputra commented 6 years ago

Spark Executor Command: "/usr/lib/jvm/java-8-openjdk-amd64/bin/java" "-cp" "/opt/spark/lib/gcs-connector-latest-hadoop2.jar:/opt/spark/conf/:/opt/spark/jars/*" "-Xmx512M" "-Dspark.driver.port=46063" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@spark-master-controller-8xlkx:46063" "--executor-id" "12" "--hostname" "192.168.4.36" "--cores" "1" "--app-id" "app-20180715012255-0001" "--worker-url" "spark://Worker@192.168.4.36:37329"
antonputra commented 6 years ago

@akuksin To make Spark 2.3.0 and up work on Kubernetes in Standalone mode, you have to do two things:

  1. Create a headless Service. By default, creating a Service gives you a ClusterIP (a virtual IP that behaves like a load balancer), which means all traffic has to go through that load balancer and every port must be specified (hardcoded) in the Service. Since the master dynamically creates a dedicated RPC port for each worker, you won't be able to route workers to the master that way. Create a headless Service instead, for example:
    spec:
      ports:
      clusterIP: None

    With clusterIP: None, the Service acts as DNS round robin: a lookup of the Service name returns the pod IPs directly, so requests go to the pod rather than through the Service's load balancer.

  2. Set the hostname explicitly in the pod spec of the ReplicationController or Deployment object, for example:
    spec:
      hostname: spark-master
      containers:
        - name: spark-master

    Otherwise the Spark master uses the pod's own hostname (by default the pod name, e.g. spark-master-controller-8xlkx) in the driver-url, and workers won't be able to resolve it.
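Putting both steps together, a minimal sketch of the master's manifests might look like the following. It assumes the workers run in the same namespace as the master and that the cluster uses the standard DNS search domains. The Service name, the component: spark-master label and the image are hypothetical placeholders; 7077 and 8080 are Spark's default standalone master and web UI ports. Because the pod hostname equals the headless Service name, Spark should advertise spark-master instead of the pod name, and a worker looking up spark-master gets the master pod's IP straight from the Service's DNS record. A dynamically chosen RPC port on that pod is then reachable without being listed in the Service.

```yaml
# Sketch only: names, labels and the image are hypothetical placeholders.
apiVersion: v1
kind: Service
metadata:
  name: spark-master              # workers look up this name via cluster DNS
spec:
  clusterIP: None                 # headless: DNS returns the pod IP, no virtual IP/proxy
  selector:
    component: spark-master       # hypothetical label; must match the pod template below
  ports:
    - name: spark
      port: 7077                  # Spark standalone master default port
    - name: webui
      port: 8080                  # Spark master web UI default port
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-master-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      component: spark-master
  template:
    metadata:
      labels:
        component: spark-master
    spec:
      hostname: spark-master      # same as the Service name, so the advertised host resolves
      containers:
        - name: spark-master
          image: example.com/spark:2.3.0   # hypothetical image
          ports:
            - containerPort: 7077
            - containerPort: 8080
```

With a ReplicationController, as in the original thread, the hostname field goes in the same place: the pod template's spec (spec.template.spec).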