Open glitch-k8s opened 3 years ago
@nishantsh77 could you share your sparkapplication manifest.
@shardulsrivastava Here is the SparkApplication manifest. I redacted some data; hope that's fine:
```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: test-aws
  namespace: test
  labels:
    name: ...
spec:
  type: Java
  mode: cluster
  image: "..."
  imagePullSecrets:
```
Please let me know if any more data points are required.
I am facing this problem on AWS only; on non-cloud infrastructure it works fine.
Spark operator version:
@yuchaoran2011 Please help, or should I migrate to the latest version of the Spark operator?
@nishantsh77 I'm not working on a Spark-related project at the moment, so unfortunately I'm not able to help you look into the issue. Do upgrade to the latest operator and see if the problem persists.
@yuchaoran2011 Thanks. I tried even the latest version of the Spark operator, but it still doesn't work on AWS EKS.
@liyinan926 I was wondering if you can help on this.
I am getting an exception in the Spark operator webhook during volume mount:
```
I1214 17:00:00.494155       9 webhook.go:246] Serving admission request
2020/12/14 17:00:00 http: panic serving 10.1.22.119:57744: runtime error: index out of range [1] with length 1
goroutine 149 [running]:
net/http.(*conn).serve.func1(0xc000256000)
	/usr/local/go/src/net/http/server.go:1772 +0x139
panic(0x137a660, 0xc0006234c0)
	/usr/local/go/src/runtime/panic.go:973 +0x396
github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook.addVolumeMount(0xc000a80a80, 0xc000503090, 0xf, 0x0, 0xc000503080, 0xb, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook/patch.go:177 +0x57f
github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook.addVolumes(0xc000a80a80, 0xc0006de000, 0x142ebf9, 0xa, 0xc0002e9ba8)
	/go/src/github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook/patch.go:144 +0x570
github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook.patchSparkPod(0xc000a80a80, 0xc0006de000, 0x17, 0xc0006de000, 0x0)
	/go/src/github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook/patch.go:52 +0xc5
github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook.mutatePods(0xc0004b13b0, 0x160b060, 0xc00032afe0, 0x7fff47d5ea51, 0xa, 0x160c4e0, 0xc0004b13b0, 0x160c4e0)
	/go/src/github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook/webhook.go:554 +0x591
github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook.(*WebHook).serve(0xc00047e000, 0x1625460, 0xc000a62700, 0xc0006c2300)
	/go/src/github.com/GoogleCloudPlatform/spark-on-k8s-operator/pkg/webhook/webhook.go:278 +0xb1a
net/http.HandlerFunc.ServeHTTP(0xc00032aff0, 0x1625460, 0xc000a62700, 0xc0006c2300)
	/usr/local/go/src/net/http/server.go:2012 +0x44
net/http.(*ServeMux).ServeHTTP(0xc0000bce00, 0x1625460, 0xc000a62700, 0xc0006c2300)
	/usr/local/go/src/net/http/server.go:2387 +0x1a5
net/http.serverHandler.ServeHTTP(0xc0002f0380, 0x1625460, 0xc000a62700, 0xc0006c2300)
	/usr/local/go/src/net/http/server.go:2807 +0xa3
net/http.(*conn).serve(0xc000256000, 0x162b7e0, 0xc0001a3a80)
	/usr/local/go/src/net/http/server.go:1895 +0x86c
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:2933 +0x35c
I1214 17:00:00.505935       9 spark_pod_eventhandler.go:47] Pod test-1607965184791-exec-1 added in namespace aws-test.
```
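For what it's worth, `runtime error: index out of range [1] with length 1` is the standard Go panic for reading past the end of a slice, here raised inside the operator's `addVolumeMount`. A minimal sketch of this failure class and the usual guard (purely illustrative — this is NOT the operator's actual `patch.go` code, and the function names are made up):

```go
package main

import (
	"fmt"
	"strings"
)

// unsafeSecondField splits on ":" and blindly indexes element 1.
// If the input contains no ":", strings.Split returns a slice of
// length 1, and parts[1] panics with exactly
// "index out of range [1] with length 1".
func unsafeSecondField(s string) string {
	parts := strings.Split(s, ":")
	return parts[1]
}

// safeSecondField checks the slice length before indexing,
// degrading gracefully instead of crashing the HTTP handler.
func safeSecondField(s string) (string, bool) {
	parts := strings.SplitN(s, ":", 2)
	if len(parts) < 2 {
		return "", false
	}
	return parts[1], true
}

func main() {
	// Well-formed input works in either version.
	v, _ := safeSecondField("efs-volume:/mnt/shared")
	fmt.Println(v) // /mnt/shared

	// Input missing the separator is handled instead of panicking.
	_, ok := safeSecondField("/mnt/shared")
	fmt.Println(ok) // false
}
```

Since the webhook recovers the panic per request (note the pod "added" log right after), the mutation is silently skipped, which would explain mounts appearing on some pods but not others.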
So you are trying to mount the same volume into both the driver and executor pods?
@liyinan926 Yes, I need to access the same path from both the driver and the executor(s). If any more details are required, please let me know.
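For reference, the usual pattern for sharing one volume between driver and executors in a `SparkApplication` looks roughly like this (the volume name, claim name, and mount path below are illustrative placeholders, not taken from the redacted manifest):

```yaml
spec:
  volumes:
    - name: shared-data              # illustrative name
      persistentVolumeClaim:
        claimName: efs-claim         # assumes an existing EFS-backed PVC
  driver:
    volumeMounts:
      - name: shared-data
        mountPath: /mnt/shared
  executor:
    volumeMounts:
      - name: shared-data            # same volume name, mounted in executors too
        mountPath: /mnt/shared
```

Note that the operator applies these mounts through its mutating admission webhook, so the webhook must be enabled and healthy for the executor mounts to appear.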
Observation:
Kindly guide me on how to resolve it. Is this an AWS-specific issue?
Although this isn't an answer to your question, be aware that EFS volumes have a limited IOPS budget depending on the size of the volume. If it is used for more than configuration data, the executors might become constrained by operations to that volume.
@jkleckner Thanks for the suggestion. Finally, a few hours ago it started working: chart version 0.8.2 works now. My observation is that the Spark operator runs seamlessly in some environments and creates a lot of problems in others.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello,
I am running the Spark operator on AWS, and somehow the EFS volume is not getting mounted on the executors, while it mounts perfectly fine on the driver.
I am really stuck on this. Any help/pointers would be appreciated.
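In case it helps with reproduction: the common way to expose EFS to pods on EKS is a statically provisioned PersistentVolume via the AWS EFS CSI driver, roughly like the sketch below (the names and the `fs-...` file system ID are placeholders, and this assumes the EFS CSI driver is installed in the cluster):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv                      # illustrative name
spec:
  capacity:
    storage: 5Gi                    # EFS is elastic; the field is required but nominal
  accessModes:
    - ReadWriteMany                 # EFS supports many-node read/write
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678       # placeholder EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
  namespace: test
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""              # empty string selects static provisioning
  resources:
    requests:
      storage: 5Gi
```

The claim can then be referenced from the `SparkApplication`'s `spec.volumes` and mounted in both driver and executor pods.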
Regards, Nishant