Closed sridhar closed 2 years ago
The issue goes away when I manually copy the spark-avro jar to the executors. This should have happened automatically, since I'm already listing it the same way as the main jar:
.config("spark.jars","lib/isolation-forest_2.4.3_2.11-2.0.6.jar,lib/spark-avro_2.11-2.4.3.jar")
The spark-avro module is external and isn't included in Spark by default.
https://spark.apache.org/docs/latest/sql-data-sources-avro.html
As you point out, you can add the jar manually or specify the coordinates.
spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.3,com.linkedin.isolation-forest:isolation-forest_2.4.3_2.11:2.0.4
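The same coordinates can also be set from application code instead of the command line. A minimal PySpark sketch, using the versions cited in this thread (adjust them to your build); `spark.jars.packages` resolves Maven coordinates and ships the jars to the executors, unlike `spark.jars`, which only distributes local files you list explicitly:

```python
from pyspark.sql import SparkSession

# Equivalent of the --packages flag above, expressed as a session config.
# The coordinates below are the ones quoted in this thread.
spark = (
    SparkSession.builder
    .appName("isolation-forest-demo")
    .config(
        "spark.jars.packages",
        "org.apache.spark:spark-avro_2.11:2.4.3,"
        "com.linkedin.isolation-forest:isolation-forest_2.4.3_2.11:2.0.4",
    )
    .getOrCreate()
)
```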
./gradlew clean build -x test -PsparkVersion=2.4.3 -PscalaVersion=2.11.12
spark-2.4.3-hadoop2.6/sbin/start-all.sh
The above run fails at the model save stage with the following error:
Now, since spark-avro has already been compiled into the LinkedIn jar, I shouldn't have to include it again. But I added it anyway and tried multiple things.
I'm using the top-of-tree source. Am I missing something here? Do I need to do something specific? Is there a bug in the library's packaging script?
Note that the above code works without the save invocation.
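One way to check the assumption that spark-avro classes are actually bundled into the isolation-forest jar is to list the jar's entries — a jar is an ordinary zip archive, so no Java tooling is needed. A minimal sketch; the jar path in the usage comment is taken from this thread and may differ locally:

```python
import zipfile


def jar_contains(jar_path, needle):
    """Return the entries in a jar whose path contains `needle`.

    A jar file is a plain zip archive, so zipfile can enumerate it.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if needle in name]


# Usage (the path is an assumption -- point it at your local build output):
#   jar_contains("lib/isolation-forest_2.4.3_2.11-2.0.6.jar", "avro")
# An empty result would mean the spark-avro classes are NOT bundled, so the
# jar has to be supplied to the executors separately.
```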