Affected version
No response
Current and expected behavior
We set the experimental spark.driver.userClassPathFirst and spark.executor.userClassPathFirst configs to true. This causes classpath issues once you pull in Java dependencies. The problem is that there is no universally right or wrong setting: some jobs need this to be enabled, some need it to be disabled. We might just want to document the current state.
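The underlying mechanism is worth spelling out: with userClassPathFirst=true, Spark resolves user classes in a child-first classloader, so any class that also exists on Spark's own classpath gets loaded twice, once per loader, and the two copies are not interchangeable. Below is a minimal, self-contained Java sketch of that failure mode; the jar path is hypothetical and only serves to illustrate why an implementation can fail an "is not a subtype" check against an interface it plainly implements.

import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderIsolationDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical path to a jar that contains both the implementation
        // class and the interface (hadoop-common ships both).
        URL jar = new URL("file:///tmp/hadoop-common-3.3.4.jar");

        // A loader with no application parent: roughly what a child-first
        // (userClassPathFirst=true) setup produces for duplicated classes.
        try (URLClassLoader isolated = new URLClassLoader(new URL[]{jar}, null)) {
            Class<?> impl = isolated.loadClass(
                "org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback");

            // The same interface, loaded through the application classloader
            // (requires hadoop-common on the demo's own classpath).
            Class<?> iface = Class.forName(
                "org.apache.hadoop.security.GroupMappingServiceProvider");

            // Prints "false": impl implements the isolated loader's copy of
            // the interface, which is a different Class object than iface.
            System.out.println(iface.isAssignableFrom(impl));
        }
    }
}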
Possible solution
Don't set these experimental features.
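For reference, both settings default to false, so leaving them out entirely is equivalent to disabling them explicitly, e.g. in spark-defaults.conf:

spark.driver.userClassPathFirst    false
spark.executor.userClassPathFirst  false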
Additional context
When dynamically loading extensions, like this:
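The exact command was not preserved in this report. As a stand-in, here is a minimal sketch reconstructed from the dependency resolution log below; the application jar name and any other flags are hypothetical, while the --packages coordinates and the two userClassPathFirst settings come from the log and the description above.

spark-submit \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0 \
  my-streaming-job.jar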
An error occurs:
:: resolution report :: resolve 5610ms :: artifacts dl 1730ms
:: modules in use:
com.google.code.findbugs#jsr305;3.0.0 from central in [default]
commons-logging#commons-logging;1.1.3 from central in [default]
org.apache.commons#commons-pool2;2.11.1 from central in [default]
org.apache.hadoop#hadoop-client-api;3.3.4 from central in [default]
org.apache.hadoop#hadoop-client-runtime;3.3.4 from central in [default]
org.apache.iceberg#iceberg-spark-runtime-3.4_2.12;1.4.0 from central in [default]
org.apache.kafka#kafka-clients;3.3.2 from central in [default]
org.apache.spark#spark-sql-kafka-0-10_2.12;3.4.0 from central in [default]
org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.4.0 from central in [default]
org.lz4#lz4-java;1.8.0 from central in [default]
org.slf4j#slf4j-api;2.0.6 from central in [default]
org.xerial.snappy#snappy-java;1.1.9.1 from central in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   12  |   12  |   12  |   0   ||   12  |   12  |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-69012960-2916-4d39-9ea8-c688cb61be81
confs: [default]
12 artifacts copied, 0 already retrieved (84990kB/127ms)
SLF4J: A SLF4J service provider failed to instantiate:
org.slf4j.spi.SLF4JServiceProvider: org.apache.logging.slf4j.SLF4JServiceProvider not a subtype
SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not org.apache.hadoop.security.GroupMappingServiceProvider
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2720)
at org.apache.hadoop.security.Groups.<init>(Groups.java:107)
at org.apache.hadoop.security.Groups.<init>(Groups.java:102)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:451)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:338)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3746)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:3736)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3520)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.spark.util.DependencyUtils$.resolveGlobPath(DependencyUtils.scala:317)
at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2(DependencyUtils.scala:273)
at org.apache.spark.util.DependencyUtils$.$anonfun$resolveGlobPaths$2$adapted(DependencyUtils.scala:271)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
at org.apache.spark.util.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:271)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$4(SparkSubmit.scala:390)
at scala.Option.map(Option.scala:230)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:390)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not org.apache.hadoop.security.GroupMappingServiceProvider
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2714)
... 31 more
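Note that both failures in this log are the same symptom: the SLF4J provider that is "not a subtype" and the JniBasedUnixGroupsMappingWithFallback that is "not org.apache.hadoop.security.GroupMappingServiceProvider" are classes loaded a second time by the user-first classloader, so they no longer match the copies Spark itself loaded, as sketched in the demo above.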
Environment
No response
Would you like to work on fixing this bug?
None