Thelin90 opened this issue 4 years ago
@Thelin90 thank you so much, your 2 nights saved my first night :D
@Thelin90 When I configure it as you did, I can submit a task from inside the master pod, but I still get the "Initial job has not accepted any resources" issue when submitting a task from other client pods.
@leehuwuj it might be that you are not adding the IP correctly, make sure to check these steps again:
kubectl get pods -o wide
Go inside the pod as I describe above:
kubectl exec -it <pod> -n <namespace> -- /bin/bash
Take the IP, and run within the pod as I describe above:
pyspark --conf spark.driver.bindAddress=<MASTER-POD-IP> --conf spark.driver.host=<MASTER-POD-IP>
And then run:
sc.parallelize([1,2,3,4,5,6]).collect()
That should work.
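If it helps, the same two steps can be scripted so the IP never has to be copied by hand (the pod and namespace names are placeholders):

```sh
# Grab the master pod IP, then start pyspark inside that pod with the driver bound to it.
MASTER_POD_IP=$(kubectl get pod <pod> -n <namespace> -o jsonpath='{.status.podIP}')
kubectl exec -it <pod> -n <namespace> -- pyspark \
  --conf spark.driver.bindAddress="$MASTER_POD_IP" \
  --conf spark.driver.host="$MASTER_POD_IP"
```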
Alternatively, I recommend you fork my repository:
https://github.com/Thelin90/deiteo
And try to run it with my instructions there; you might find out if you have done something different from me.
I have automated the whole process with a Makefile, so it should be more or less just you pressing enter, and if that does not work you can open an issue in the repo.
@leehuwuj also are you running on Ubuntu or Mac?
@Thelin90 I'm running on Mac. Running the submit command inside the master node is OK, but I cannot submit a task from another pod.
@leehuwuj I think you will need to be more specific with your issue, similar to what I have done here. I have described my problem in great detail, and how I solved it; if you have a new problem you must try to put down, step by step, what you are doing and what is going wrong. I can't guess based on what you have written, unfortunately.
But I would recommend you have a look at my own repo and try that; if that works for you, you might find something you are not doing right in your own code.
Dear Mr., I just learned K8s not long ago. In my case, I need to process data in HDFS, so I may need to build a Hadoop + Spark cluster. But I heard that Hadoop (HDFS) is not suitable for running on Kubernetes; do you have any insights?

> Hello, sir. Excuse me, have you tried Kubernetes+hdfs+spark?
>
> @Thelin90 No, since the only reason to use HDFS would be if you need to run operations on disk, hence ~10 TB+, which I never had to deal with, so no. In memory has worked just fine so far for my case.
Please stick to the context of this issue, your question has nothing to do with what is being discussed here.
spark-shell is fine from inside the pod, but spark-submit for the SparkPi example is not successful, failing with a websocket closed-connection issue:
```
spark-submit --name sparkpi-1 \
  --master k8s://http://spark-master:7077 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.pod.name=$DRIVER_NAME \
  --conf spark.kubernetes.container.image=$DOCKER_IMAGE \
  --conf spark.kubernetes.container.image.pullPolicy=Never \
  --conf spark.driver.host=172.17.0.6 \
  --conf spark.kubernetes.kerberos.enabled=false \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  local:///opt/spark-3.0.0-bin-hadoop2.7/examples/jars/spark-examples_2.12-3.0.0.jar 10000000

21/01/06 15:32:20 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
21/01/06 15:32:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/01/06 15:32:21 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
21/01/06 15:32:21 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
21/01/06 15:32:21 WARN WatchConnectionManager: Exec Failure
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:209)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at okio.Okio$2.read(Okio.java:140)
    at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
    at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
    at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
    at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
    at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
    at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:109)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "kubernetes-dispatcher-0" Exception in thread "main" java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@6cd5d189 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@25cdb1c6[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
    at java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:632)
    at java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:678)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.scheduleReconnect(WatchConnectionManager.java:305)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.access$800(WatchConnectionManager.java:50)
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:218)
    at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
    at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:221)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:211)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
io.fabric8.kubernetes.client.KubernetesClientException: Failed to start websocket
    at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onFailure(WatchConnectionManager.java:209)
    at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
    at okhttp3.internal.ws.RealWebSocket$2.onFailure(RealWebSocket.java:221)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:211)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    Suppressed: java.lang.Throwable: waiting here
        at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:144)
        at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:341)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:755)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:739)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:70)
        at org.apache.spark.deploy.k8s.submit.Client.$anonfun$run$1(KubernetesClientApplication.scala:129)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2538)
        at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:129)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215)
        at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:209)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at okio.Okio$2.read(Okio.java:140)
    at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
    at okio.RealBufferedSource.indexOf(RealBufferedSource.java:354)
    at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:226)
    at okhttp3.internal.http1.Http1Codec.readHeaderLine(Http1Codec.java:215)
    at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
    at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:88)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:109)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:201)
    ... 4 more
21/01/06 15:32:21 INFO ShutdownHookManager: Shutdown hook called
21/01/06 15:32:21 INFO ShutdownHookManager: Deleting directory /tmp/spark-caec2177-42af-4c26-830a-a2a720e0da43
```
The approach described above also worked for me as a one-liner, where MASTER-POD is the Spark master pod name and MASTER-IP is the Spark master IP found in `kubectl get pods -o wide`:

kubectl exec <MASTER-POD> -it -- pyspark --conf spark.driver.bindAddress=<MASTER-IP> --conf spark.driver.host=<MASTER-IP>
I was inspired by this repository, and continue to build on it.
However, I also got the issue faced here: https://github.com/testdrivenio/spark-kubernetes/issues/1
I was getting the "Initial job has not accepted any resources" warning:
I bashed my head against this for 2 nights; not being an expert in K8S, I first thought something was wrong with how I started it up. Either way, this is how I reproduced the problem:
1) I checked my resources, and I made the following config:

`spark-defaults.conf`
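For illustration only (these values are an example of the kind of resource settings meant here, not the literal config):

```
spark.driver.memory     1g
spark.executor.memory   2g
spark.executor.cores    1
```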
2) And I ran minikube with:
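For illustration (the exact flags are not reproduced here), a typical local run looks something like:

```sh
minikube start --cpus 4 --memory 8192
```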
3) These were my `spark-master` and `spark-worker` scripts:

`spark-worker.sh`

### Note I put `2g` here just to be 100% confident I was not using too much resources.

`spark-worker.sh`
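As a rough sketch of what such a worker start script can look like (the paths and master URL are assumptions, not the literal script):

```sh
#!/bin/bash
# Illustrative only: start a standalone worker against the master service,
# capping memory at 2g so the pod stays within its resource limits.
/usr/local/spark/bin/spark-class org.apache.spark.deploy.worker.Worker \
  --memory 2g \
  --cores 1 \
  spark://spark-master:7077
```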
4) I then ran:
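The command itself is not shown here, but it was along the lines of the smoke test used elsewhere in this issue:

```
pyspark
>>> sc.parallelize([1, 2, 3, 4, 5, 6]).collect()
```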
And the error occurred!
I made sure to get access to ports 8081 and 4040 to investigate the logs further (one way to reach them is sketched below). I then went in and:
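For the port access, a minimal sketch, assuming the pod names below are placeholders for whatever `kubectl get pods` shows (8081 being the worker UI and 4040 the running application UI):

```sh
# Forward the worker UI and the application UI to localhost:
kubectl port-forward <spark-worker-pod> 8081:8081 &
kubectl port-forward <spark-master-pod> 4040:4040 &
```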
5) I scratched my head, but I knew I had enough resources, so why did this not work?! And I could see:
I then thought, well, I have done this right:

The docs mention that it can be either `HOST` or `IP`, so I thought I was good. I saw the possible solution of:

Well, this was not a problem for me; I actually had no `iptables` entries to resolve at all.

So I then verified the master `IP` with:

I then took the `MASTER-IP` and added it directly:
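For reference, that lookup is the same one used elsewhere in this thread; the IP column for the spark-master pod is the `MASTER-IP` referred to below:

```sh
kubectl get pods -o wide
```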
6) SOLUTION:

`spark-defaults.conf`

And add the `IPs` correctly:

`spark-worker.sh`
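To spell out what "add the IPs correctly" boils down to (the placeholder is whatever IP `kubectl get pods -o wide` reports for the master pod), the driver settings are essentially:

```
spark.driver.bindAddress   <MASTER-POD-IP>
spark.driver.host          <MASTER-POD-IP>
```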
_In this case my `SPARK_HOME` is `/usr/local/spark`_

My `Dockerfile`:
Currently building a streaming platform in this repo:
https://github.com/Thelin90/deiteo