kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Unable to run a job with main file point to s3 bucket #2301

Open nownikhil opened 3 weeks ago

nownikhil commented 3 weeks ago

What question do you want to ask?

Error

Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
  at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
  at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1831)
  at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:727)
  at org.apache.spark.util.DependencyUtils$.downloadFile(DependencyUtils.scala:264)
  at org.apache.spark.deploy.k8s.KubernetesUtils$.loadPodFromTemplate(KubernetesUtils.scala:103)
  ... 18 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2592)
  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)

Additional context

This is happening because the spark-operator image doesn't include the hadoop-aws jar. Is there a recommended way to pull jars from S3? A sketch of the kind of spec we are submitting is below.
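For reference, this is roughly the shape of the SparkApplication involved. The image name, class, bucket path, and jar versions are placeholders, not the real values. Note that `spark.jars.packages` / `hadoopConf` only seem to help the driver and executors; they don't appear to put hadoop-aws on the classpath of the `spark-submit` the operator itself runs, which is where the trace above fails.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: s3-main-file-example
spec:
  type: Scala
  mode: cluster
  image: my-spark-image:latest                        # placeholder app image
  mainClass: com.example.Main                         # placeholder
  mainApplicationFile: s3a://my-bucket/jars/app.jar   # placeholder bucket/key
  hadoopConf:
    fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
    fs.s3a.aws.credentials.provider: com.amazonaws.auth.DefaultAWSCredentialsProviderChain
  sparkConf:
    # versions here are examples; they must match the Hadoop version in the app image
    spark.jars.packages: org.apache.hadoop:hadoop-aws:3.3.4,com.amazonaws:aws-java-sdk-bundle:1.12.262
  driver:
    cores: 1
    memory: 2g
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 2g
```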



ujjawal-khare commented 1 week ago

Facing the same issue. The only solution I found was to bake the hadoop-aws jar into the operator image (sketch below), but I'm not sure this is the right approach, since we will have dynamic jars coming in.
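Roughly what that bake-in looks like. The base image tag, the jar versions, and the `/opt/spark/jars` path are assumptions; they have to match the Spark/Hadoop versions actually bundled in the operator image you run.

```dockerfile
# Assumed base image and tag; use whatever operator image/version you deploy.
FROM docker.io/kubeflow/spark-operator:v2.0.2

USER root
# hadoop-aws must match the bundled Hadoop version; aws-java-sdk-bundle must
# match what that hadoop-aws release was built against.
ADD https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar /opt/spark/jars/
ADD https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.262/aws-java-sdk-bundle-1.12.262.jar /opt/spark/jars/
# ADD from a URL leaves the files non-world-readable; relax that for the Spark user.
RUN chmod 0644 /opt/spark/jars/hadoop-aws-3.3.4.jar /opt/spark/jars/aws-java-sdk-bundle-1.12.262.jar
# Drop back to the non-root user the upstream image uses (assumed UID).
USER 185
```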