kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0
2.74k stars 1.36k forks source link

UnknownHostException: kubernetes.default.svc: Name or service not known #1509

Open ganliang13 opened 2 years ago

ganliang13 commented 2 years ago

God, what am I supposed to do, thank you

kubectl apply -f examples/spark-pi.yaml 
22/04/11 09:15:45 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2967)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:557)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2678)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:942)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:936)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.base/java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [spark-pi-driver]  in namespace: [default]  failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:186)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:84)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:75)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:74)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:123)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2961)
    ... 19 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Name or service not known
    at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source)
    at java.base/java.net.InetAddress.getAddressesFromNameService(Unknown Source)
    at java.base/java.net.InetAddress$NameServiceAddresses.get(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName(Unknown Source)
    at okhttp3.Dns$1.lookup(Dns.java:40)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:185)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:149)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:84)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:215)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:135)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.OIDCTokenRefreshInterceptor.intercept(OIDCTokenRefreshInterceptor.java:41)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:151)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
    at okhttp3.RealCall.execute(RealCall.java:93)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:490)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:416)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:397)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:933)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:220)
    ... 26 more
22/04/11 09:15:45 INFO SparkUI: Stopped Spark web UI at http://spark-pi-5039c38017e7fe71-driver-svc.default.svc:4040
22/04/11 09:15:45 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/04/11 09:15:45 INFO MemoryStore: MemoryStore cleared
22/04/11 09:15:45 INFO BlockManager: BlockManager stopped
22/04/11 09:15:45 INFO BlockManagerMaster: BlockManagerMaster stopped
22/04/11 09:15:45 WARN MetricsSystem: Stopping a MetricsSystem that is not running
22/04/11 09:15:45 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
cwyl02 commented 2 years ago

At the first glance I think there might be something going on in your k8s cluster. is your core-dns pod healthy?

ganliang13 commented 2 years ago

我是通过rbac的方式解决的,谢谢大神。

metalbreeze commented 1 year ago

我是通过rbac的方式解决的,谢谢大神。

能给出具体步骤嘛?

matif607 commented 5 months ago

我是通过rbac的方式解决的,谢谢大神。

Can you please share the resolution steps?

ramakrishna-hande commented 1 month ago

Can you please give resolution steps ? @metalbreeze ? @matif607 ? I am also facing similar issue