Open michalzxc opened 4 years ago
If it really is spark-operator misbehaving, I am considering taking away its RBAC right to touch the webhook.
It didn't like not being able to touch the webhook:
F0512 15:06:30.565758 9 main.go:199] mutatingwebhookconfigurations.admissionregistration.k8s.io "spark-operator-sparkoperator-webhook-config" is forbidden: User "system:serviceaccount:spark-operator:spark-operator-sparkoperator" cannot get resource "mutatingwebhookconfigurations" in API group "admissionregistration.k8s.io" at the cluster scope
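To make it easier to see exactly which permission the operator was missing, here is a minimal sketch that pulls the user, verb, resource and API group out of a "forbidden" message like the one above. The function name `parse_forbidden` and the regular expression are only illustrative and assume the message keeps the wording shown in this log line; they are not part of spark-operator or Kubernetes.

```python
import re

# Assumed message shape, taken from the log line above:
#   User "<user>" cannot <verb> resource "<resource>" in API group "<group>"
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)"'
)


def parse_forbidden(line):
    """Return user, verb, resource and group from a 'forbidden' error, or None."""
    match = FORBIDDEN_RE.search(line)
    return match.groupdict() if match else None


# The message part of the fatal log line quoted above.
line = (
    'mutatingwebhookconfigurations.admissionregistration.k8s.io '
    '"spark-operator-sparkoperator-webhook-config" is forbidden: '
    'User "system:serviceaccount:spark-operator:spark-operator-sparkoperator" '
    'cannot get resource "mutatingwebhookconfigurations" '
    'in API group "admissionregistration.k8s.io" at the cluster scope'
)

denied = parse_forbidden(line)
print(denied["verb"], denied["resource"], denied["group"])
```

For this line the denied verb is `get`, so the operator already fails at reading the webhook configuration at startup, before any create, update or delete is attempted.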
The webhook was gone again; it seems to happen mostly during or after a new release. We have a Helm chart with ~20 Spark jobs; some new pods got mutated and others didn't, because the webhook was already gone.
May 12 15:04:09.417 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:04:09.417490 9 controller.go:113] Stopping the ScheduledSparkApplication controller
May 12 15:04:09.417 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:04:09.417476 9 controller.go:171] Stopping the SparkApplication controller
May 12 15:04:09.417 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:04:09.417449 9 main.go:225] Shutting down the Spark Operator
May 12 15:04:04.177 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:04:04.177502 9 controller.go:164] Syncing ScheduledSparkApplication spark/dam-user-library-batch
May 12 15:04:04.135 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:04:04.135143 9 submission.go:63] spark-submit arguments: [/opt/spark/bin/spark-submit --master k8s://https://172.20.0.1:443 --deploy-mode cluster --conf spark.kubernetes.namespace=spark --conf spark.app.name=dam-arm-outcomes --conf spark.kubernetes.driver.pod.name=dam-arm-outcomes-driver --jars /usr/external_jars/spark_streaming_kafka_assembly.jar --conf spark.kubernetes.container.image=122558522240.dkr.ecr.eu-west-1.amazonaws.com/dam:18854a72f89d69f258426f484b4c200dff553470 --conf spark.kubernetes.container.image.pullPolicy=Always --conf spark.kubernetes.pyspark.pythonVersion=3 --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.streaming.driver.writeAheadLog.batchingTimeout=15000 --conf spark.executor.heartbeatInterval=60s --conf spark.network.timeout=900s --conf spark.streaming.receiver.writeAheadLog.closeFileAfterWrite=true --conf spark.streaming.backpressure.enabled=true --conf spark.kubernetes.driver.secrets.dam=/opt/spark/conf/envs --conf spark
May 12 15:04:04.135 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:04:04.135041 9 controller.go:258] Starting processing key: "spark/dam-arm-outcomes"
May 12 15:04:04.134 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:04:04.134998 9 controller.go:218] SparkApplication spark/dam-arm-outcomes was updated, enqueueing it
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator } map[] 0 1}]
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN Please initialize the log4j system properly.
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator 20/05/12 14:03:19 INFO ShutdownHookManager: Deleting directory /tmp/spark-be44c4fc-516b-499b-8397-5daa0b46b0b6
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator 20/05/12 14:03:19 INFO ShutdownHookManager: Shutdown hook called
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator ... 50 more
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.Okio$2.read(Okio.java:139)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.InputRecord.read(InputRecord.java:503)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at java.net.SocketInputStream.read(SocketInputStream.java:141)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at java.net.SocketInputStream.read(SocketInputStream.java:171)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at java.net.SocketInputStream.socketRead0(Native Method)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator Caused by: java.net.SocketTimeoutException: Read timed out
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator ... 17 more
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:326)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:796)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:234)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:365)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:404)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.RealCall.execute(RealCall.java:69)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:110)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.318 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.Okio$4.newTimeoutException(Okio.java:230)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator Caused by: java.net.SocketTimeoutException: timeout
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:322)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:329)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [spark] failed.
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN Please initialize the log4j system properly.
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN Please initialize the log4j system properly.
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN Please initialize the log4j system properly.
May 12 15:03:41.250 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator } map[] 0 1}]
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator 20/05/12 14:03:19 INFO ShutdownHookManager: Deleting directory /tmp/spark-be44c4fc-516b-499b-8397-5daa0b46b0b6
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator 20/05/12 14:03:19 INFO ShutdownHookManager: Shutdown hook called
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator ... 50 more
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.AsyncTimeout$2.read(AsyncTimeout.java:237)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.Okio$2.read(Okio.java:139)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.InputRecord.read(InputRecord.java:503)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at java.net.SocketInputStream.read(SocketInputStream.java:141)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at java.net.SocketInputStream.read(SocketInputStream.java:171)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at java.net.SocketInputStream.socketRead0(Native Method)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator Caused by: java.net.SocketTimeoutException: Read timed out
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator ... 17 more
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:326)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:796)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:234)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:365)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:404)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.RealCall.execute(RealCall.java:69)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:110)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.AsyncTimeout$2.read(AsyncTimeout.java:241)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.AsyncTimeout.exit(AsyncTimeout.java:285)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at okio.Okio$4.newTimeoutException(Okio.java:230)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator Caused by: java.net.SocketTimeoutException: timeout
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:322)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:329)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
May 12 15:03:41.183 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
May 12 15:03:40.553 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.553998 9 spark_pod_eventhandler.go:77] Pod dam-lp-responses-driver deleted in namespace spark.
May 12 15:03:40.553 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.553992 9 spark_pod_eventhandler.go:58] Pod dam-lp-responses-driver updated in namespace spark.
May 12 15:03:40.553 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.553977 9 spark_pod_eventhandler.go:58] Pod dam-lp-responses-driver updated in namespace spark.
May 12 15:03:40.553 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.553970 9 spark_pod_eventhandler.go:58] Pod dam-lp-responses-driver updated in namespace spark.
May 12 15:03:40.553 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.553948 9 spark_pod_eventhandler.go:47] Pod dam-lp-responses-driver added in namespace spark.
May 12 15:03:40.553 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.553792 9 spark_pod_eventhandler.go:58] Pod dam-redshift-sink-driver updated in namespace spark.
May 12 15:03:40.533 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.533659 9 controller.go:218] SparkApplication spark/dam-event-tag-views was updated, enqueueing it
May 12 15:03:40.532 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.532800 9 spark_pod_eventhandler.go:58] Pod dam-redshift-sink-driver updated in namespace spark.
May 12 15:03:40.532 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.532506 9 controller.go:218] SparkApplication spark/dam-event-tag-views was updated, enqueueing it
May 12 15:03:40.532 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.532281 9 controller.go:218] SparkApplication spark/dam-event-tag-views was updated, enqueueing it
May 12 15:03:40.459 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.459016 9 controller.go:218] SparkApplication spark/dam-lp-responses was updated, enqueueing it
May 12 15:03:40.458 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.458784 9 controller.go:218] SparkApplication spark/dam-lp-responses was updated, enqueueing it
May 12 15:03:40.457 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.457869 9 controller.go:218] SparkApplication spark/dam-lp-responses was updated, enqueueing it
May 12 15:03:40.456 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.456446 9 controller.go:218] SparkApplication spark/dam-lp-responses was updated, enqueueing it
May 12 15:03:40.456 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.456125 9 controller.go:218] SparkApplication spark/dam-lp-responses was updated, enqueueing it
May 12 15:03:40.454 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.454159 9 controller.go:218] SparkApplication spark/dam-redshift-sink was updated, enqueueing it
May 12 15:03:40.453 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.453321 9 controller.go:218] SparkApplication spark/dam-event-tag-views was updated, enqueueing it
May 12 15:03:40.452 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.452434 9 controller.go:218] SparkApplication spark/dam-event-views was updated, enqueueing it
May 12 15:03:40.451 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.451493 9 controller.go:218] SparkApplication spark/dam-event-views was updated, enqueueing it
May 12 15:03:40.445 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.445782 9 spark_pod_eventhandler.go:47] Pod dam-redshift-sink-driver added in namespace spark.
May 12 15:03:40.445 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.445706 9 spark_pod_eventhandler.go:58] Pod dam-analytics-driver updated in namespace spark.
May 12 15:03:40.259 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.259722 9 spark_pod_eventhandler.go:58] Pod dam-analytics-driver updated in namespace spark.
May 12 15:03:40.259 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.259714 9 spark_pod_eventhandler.go:47] Pod dam-analytics-driver added in namespace spark.
May 12 15:03:40.259 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.259707 9 spark_pod_eventhandler.go:95] Enqueuing SparkApplication spark/dam-sessions for app update processing.
May 12 15:03:40.259 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.259700 9 spark_pod_eventhandler.go:58] Pod dam-sessions-driver updated in namespace spark.
May 12 15:03:40.259 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.259684 9 spark_pod_eventhandler.go:95] Enqueuing SparkApplication spark/dam-sessions for app update processing.
May 12 15:03:40.259 prod001static01-313411262324660.ad.dice.fm spark-operator-sparkoperator I0512 14:03:40.259661 9 spark_pod_eventhandler.go:58] Pod dam-sessions-driver updated in namespace spark.
I removed the RBAC permission to delete the webhook from the operator. That "solves" the practical side of the problem, but not the core issue.
@michalzxc Thanks, we also see the same behaviour in our cluster.
Hello :wave: We also hit this from time to time, and we don't know how to reproduce it. In the kube-apiserver logs we saw DELETE requests on mutatingwebhookconfigurations. As far as we understand, the only delete actions in spark-operator happen when the application shuts down. Sometimes, when it shuts down, the pod still appears as "Running" but stays stuck in the shutdown process without restarting (an infinite loop in hook.Stop()?). We have no clue why it shuts down in the first place.
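To narrow down who issues those DELETE requests, the API server audit log can be filtered. The following is only a minimal sketch, assuming audit logging is enabled with the JSON-lines log backend; the function name `find_webhook_deletes` is made up for illustration, while the field names (`verb`, `objectRef`, `user.username`, `requestReceivedTimestamp`) come from the `audit.k8s.io/v1` Event schema.

```python
import json
import sys


def find_webhook_deletes(audit_lines, resource="mutatingwebhookconfigurations"):
    """Return (timestamp, username, object name) for every delete of `resource`.

    `audit_lines` is any iterable of strings, one JSON audit event per line.
    Lines that are empty or not valid JSON objects are skipped.
    """
    hits = []
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # not an audit event, e.g. a plain-text log line
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        if event.get("verb") == "delete" and ref.get("resource") == resource:
            hits.append((
                event.get("requestReceivedTimestamp"),
                (event.get("user") or {}).get("username"),
                ref.get("name"),
            ))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_deletes.py <path to the audit log file>
    with open(sys.argv[1]) as fh:
        for timestamp, username, name in find_webhook_deletes(fh):
            print(timestamp, username, name)
```

If the username turns out to be the operator's own service account, that would support the theory that the delete happens during operator shutdown. Depending on the audit policy, one request may be logged at several stages, so the same delete can show up more than once.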
Hi,
as a temporary fix I've implemented a livenessProbe for the Deployment. It checks whether the CA bundle in the mutating webhook configuration matches the operator's certificate, and on a mismatch it restarts the container so the certificates are refreshed and match again. It seems to be working for now:
livenessProbe:
  initialDelaySeconds: 1
  periodSeconds: 1
  failureThreshold: 1
  exec:
    command:
      - sh
      - -c
      - |
        set -e
        curl -iks -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
          https://kubernetes.default.svc/apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations/{{ include "spark-operator.fullname" . }}-webhook-config \
          | grep -o '"caBundle": "[^"]*"' \
          | awk -F'"' '{print $4}' \
          | base64 -d > /tmp/expected_ca_bundle.crt
        expected_ca_bundle=$(cat /etc/webhook-certs/ca-cert.pem)
        actual_ca_bundle=$(cat /tmp/expected_ca_bundle.crt)
        if [ "$expected_ca_bundle" != "$actual_ca_bundle" ]; then
          exit 1
        fi
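The comparison this probe performs can also be written against the parsed JSON instead of a grep over the raw response, which avoids depending on the exact spacing of `"caBundle": "..."`. This is a rough sketch, not part of the chart: the helper names `ca_bundles` and `ca_matches` are hypothetical, and it assumes the standard MutatingWebhookConfiguration layout in which each entry of `webhooks` carries a base64-encoded `clientConfig.caBundle`.

```python
import base64


def ca_bundles(webhook_config):
    """Decode the caBundle of every webhook entry that has one."""
    bundles = []
    for webhook in webhook_config.get("webhooks") or []:
        encoded = (webhook.get("clientConfig") or {}).get("caBundle")
        if encoded:
            bundles.append(base64.b64decode(encoded))
    return bundles


def ca_matches(webhook_config, local_pem):
    """True if at least one caBundle exists and all of them equal local_pem.

    Surrounding whitespace is ignored, similar to how $(cat file) in the
    shell version drops trailing newlines before comparing.
    """
    bundles = ca_bundles(webhook_config)
    if not bundles:
        return False  # webhook missing or registered without a CA: unhealthy
    return all(bundle.strip() == local_pem.strip() for bundle in bundles)
```

Here `webhook_config` would be the result of `json.loads` on the API response body and `local_pem` the bytes of `/etc/webhook-certs/ca-cert.pem`. Treating a missing webhook as a failure is a design choice of this sketch: it makes the check also cover the case where the configuration has been deleted entirely.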
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hi, I don't know how to really debug it, but our Spark Operator webhook is randomly disappearing. It is there:
Later we see pods crashing in the spark namespace, and when we check, it is gone.
Today it happened after one hour, and I'm not really sure how to debug it. Our helm values.yaml: