scalapb / sparksql-scalapb

SparkSQL utils for ScalaPB
Apache License 2.0

RemoteClassLoaderError happening inconsistently #383

Open zerohun opened 6 months ago

zerohun commented 6 months ago

Env:
scalaVersion := "2.12.15"
sparkVersion = "3.3.2" // Databricks runtime 12.2
"com.thesamet.scalapb" %% "sparksql33-scalapb0_11" % "1.0.4"

Stack trace:

Job aborted due to stage failure: <root>/package.class
Caused by: RemoteClassLoaderError: <root>/package.class
Caused by: ClosedByInterruptException: 
    at org.apache.spark.repl.ExecutorClassLoader$$anon$1.toClassNotFound(ExecutorClassLoader.scala:156)
    at org.apache.spark.repl.ExecutorClassLoader$$anon$1.read(ExecutorClassLoader.scala:143)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.spark.repl.ExecutorClassLoader.readAndTransformClass(ExecutorClassLoader.scala:263)
    at org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:217)
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:115)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:596)
    at scala.reflect.runtime.JavaMirrors$JavaMirror.tryJavaClass(JavaMirrors.scala:600)
    at scala.reflect.runtime.SymbolLoaders$PackageScope.$anonfun$lookupEntry$1(SymbolLoaders.scala:146)
    at scala.reflect.runtime.SymbolLoaders$PackageScope.syncLockSynchronized(SymbolLoaders.scala:133)
    at scala.reflect.runtime.SymbolLoaders$PackageScope.lookupEntry(SymbolLoaders.scala:135)
    at scala.reflect.internal.tpe.FindMembers$FindMemberBase.walkBaseClasses(FindMembers.scala:110)
    at scala.reflect.internal.tpe.FindMembers$FindMemberBase.searchConcreteThenDeferred(FindMembers.scala:75)
    at scala.reflect.internal.tpe.FindMembers$FindMemberBase.apply(FindMembers.scala:55)
    at scala.reflect.internal.Types$Type.$anonfun$findMember$1(Types.scala:1043)
    at scala.reflect.internal.Types$Type.findMemberInternal$1(Types.scala:1041)
    at scala.reflect.internal.Types$Type.findMember(Types.scala:1046)
    at scala.reflect.internal.Types$Type.memberBasedOnName(Types.scala:672)
    at scala.reflect.internal.Types$Type.member(Types.scala:636)
    at scala.reflect.internal.Types$Type.packageObject(Types.scala:648)
    at scala.reflect.internal.Symbols$Symbol.packageObject(Symbols.scala:859)
    at scala.reflect.internal.SymbolTable.openPackageModule(SymbolTable.scala:405)
    at scala.reflect.runtime.SymbolLoaders$LazyPackageType.$anonfun$complete$3(SymbolLoaders.scala:83)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at scala.reflect.internal.SymbolTable.slowButSafeEnteringPhaseNotLaterThan(SymbolTable.scala:333)
    at scala.reflect.runtime.SymbolLoaders$LazyPackageType.complete(SymbolLoaders.scala:80)
    at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1551)
    at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$$anon$2.scala$reflect$runtime$SynchronizedSymbols$SynchronizedSymbol$$super$info(JavaMirrors.scala:81)
    at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.$anonfun$info$1(SynchronizedSymbols.scala:158)
    at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info(SynchronizedSymbols.scala:149)
    at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info$(SynchronizedSymbols.scala:158)
    at scala.reflect.runtime.JavaMirrors$JavaMirror$$anon$2.info(JavaMirrors.scala:81)
    at scala.reflect.internal.Mirrors$RootsBase.init(Mirrors.scala:258)
    at scala.reflect.runtime.JavaMirrors.createMirror(JavaMirrors.scala:47)
    at scala.reflect.runtime.JavaMirrors.$anonfun$runtimeMirror$1(JavaMirrors.scala:64)
    at scala.reflect.runtime.JavaMirrors.runtimeMirror(JavaMirrors.scala:62)
    at scala.reflect.runtime.JavaMirrors.runtimeMirror$(JavaMirrors.scala:61)
    at scala.reflect.runtime.JavaUniverse.runtimeMirror(JavaUniverse.scala:30)
    at scala.reflect.runtime.JavaUniverse.runtimeMirror(JavaUniverse.scala:30)
    at org.apache.spark.sql.catalyst.ScalaReflection$.mirror(ScalaReflection.scala:86)
    at org.apache.spark.sql.catalyst.ScalaReflection$.mirror(ScalaReflection.scala:58)
    at org.apache.spark.sql.catalyst.ScalaReflection.localTypeOf(ScalaReflection.scala:873)
    at org.apache.spark.sql.catalyst.ScalaReflection.localTypeOf$(ScalaReflection.scala:871)
    at org.apache.spark.sql.catalyst.ScalaReflection$.localTypeOf(ScalaReflection.scala:58)
    at org.apache.spark.sql.reflection.package$.dataTypeFor(package.scala:56)
    at shadeframeless.TypedEncoder$$anon$16.jvmRepr(TypedEncoder.scala:285)
    at shadeframeless.RecordEncoder.$anonfun$toCatalyst$2(RecordEncoder.scala:154)
    at scala.collection.immutable.List.map(List.scala:297)
    at shadeframeless.RecordEncoder.toCatalyst(RecordEncoder.scala:153)
    at shadeframeless.TypedExpressionEncoder$.apply(TypedExpressionEncoder.scala:28)
    at shadescalapb.spark.Implicits.typedEncoderToEncoder(TypedEncoders.scala:119)
    at shadescalapb.spark.Implicits.typedEncoderToEncoder$(TypedEncoders.scala:116)
    at shadescalapb.spark.Implicits$.typedEncoderToEncoder(TypedEncoders.scala:122)
    at com.udemy.models.WrappedPurchaseEvent$.<init>(WrappedPurchaseEvent.scala:49)
    at com.udemy.models.WrappedPurchaseEvent$.<clinit>(WrappedPurchaseEvent.scala)

WrappedPurchaseEvent is a case class that has a protobuf message as one of its fields:

case class WrappedPurchaseEvent(
    purchaseEvent: PurchaseEvent, // Protobuf object
    transactionTime: Timestamp,
    partitionValue: String
)

This error happens when reading a Hive table as a Dataset[WrappedPurchaseEvent], for example:

spark.table(PURCHASE_EVENT_TABLE_NAME).as(typedEncoderToEncoder[WrappedPurchaseEvent])

I'm seeing this error when running the job on Databricks. Has anybody experienced this issue? It happens inconsistently: sometimes the job runs successfully.

thesamet commented 6 months ago

This seems related to your environment rather than to ScalaPB, and is potentially the result of some other problem in the Spark job. I haven't come across similar bug reports before. Can you create a minimal project that reproduces this problem, even if not consistently?