grajee-everest opened this issue 2 years ago
I would really try to not download and add the JARs manually, but use Maven's package resolution. What was the error you got there?
I got the same error when I first tried it with Maven Coordinates as in the screenshot below. Seeing this error, I went through the dependencies at the link and manually loaded the jar files hoping that it would help. But it did not.
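For context, the read call is cut off in the traceback; a minimal sketch of what it looks like (the file path is illustrative, and this assumes the spark-excel library is installed on the cluster):

```python
# Minimal sketch of the failing read on Databricks; the path is illustrative.
# `spark` is the SparkSession provided by the notebook environment.
sampleDataFilePath = "dbfs:/FileStore/tables/users.xls"  # hypothetical path

df = (
    spark.read.format("excel")    # V2 data source registered by spark-excel
    .option("header", True)       # first row contains column names
    .option("inferSchema", True)  # scan the sheet to infer column types
    .load(sampleDataFilePath)
)
```

The error is thrown during `load()`, while the library infers the schema by opening the workbook.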
Here is the error that got generated:
Py4JJavaError Traceback (most recent call last)
in
      1 #sampleDataFilePath = "dbfs:/FileStore/tables/users.xls"
      2
----> 3 df = spark.read.format("excel") \
      4     .option("header", True) \
      5     .option("inferSchema", True) \

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    202         self.options(**options)
    203         if isinstance(path, str):
--> 204             return self._df(self._jreader.load(path))
    205         elif path is not None:
    206             if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    115     def deco(*a, **kw):
    116         try:
--> 117             return f(*a, **kw)
    118         except py4j.protocol.Py4JJavaError as e:
    119             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o307.load.
: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
  at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
  at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.<init>(UnsynchronizedByteArrayOutputStream.java:51)
  at shadeio.poi.util.IOUtils.peekFirstNBytes(IOUtils.java:110)
  at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
  at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
  at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
  at com.crealytics.spark.v2.excel.ExcelHelper.getWorkbook(ExcelHelper.scala:107)
  at com.crealytics.spark.v2.excel.ExcelHelper.getRows(ExcelHelper.scala:122)
  at com.crealytics.spark.v2.excel.ExcelTable.infer(ExcelTable.scala:69)
  at com.crealytics.spark.v2.excel.ExcelTable.inferSchema(ExcelTable.scala:42)
  at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$4(FileTable.scala:69)
  at scala.Option.orElse(Option.scala:447)
  at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:69)
  at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:63)
  at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:82)
  at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:80)
  at com.crealytics.spark.v2.excel.ExcelDataSource.inferSchema(ExcelDataSource.scala:85)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:388)
  at scala.Option.map(Option.scala:230)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:287)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
  at py4j.Gateway.invoke(Gateway.java:295)
  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  at py4j.commands.CallCommand.execute(CallCommand.java:79)
  at py4j.GatewayConnection.run(GatewayConnection.java:251)
  at java.lang.Thread.run(Thread.java:748)
If it helps these are the jar files that I see running for the session:
%scala
spark.sparkContext.listJars.foreach(println)
spark://xx.xxx.xx.xx:40525/jars/addedFile2184961893124998763poi_shared_strings_2_2_3-55036.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile8325949175049880530poi_5_1_0-6eaa4.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3805277380370442712spoiwo_2_12_2_0_0-5d426.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile4821020784640732815commons_text_1_9-9ec33.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile6096385456097086834commons_collections4_4_4-86bd5.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile5503460718089690954poi_ooxml_5_1_0-dcd47.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile1801717094295843813commons_compress_1_21-ae1b7.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3469926387869248457h2_1_4_200-17cf6.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile7124418099051517404curvesapi_1_6-ef037.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3524630059114379065slf4j_api_1_7_32-db310.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile621063403924903495SparseBitSet_1_2-c8237.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile5513775878198382075commons_io_2_11_0-998c5.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile6021795642522535665spark_excel_2_12_3_1_2_0_15_1-54852.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile2561448775843921624poi_ooxml_lite_5_1_0-a9fef.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile1605810761903966851commons_lang3_3_11-82b59.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile2616706435049414994commons_codec_1_15-3e3d3.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3670030969644712160log4j_api_2_14_1-5a13d.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile6859359805595503404xmlbeans_5_0_2-db545.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile5420236778608197626scala_xml_2_12_2_0_0-e8c94.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile2437294818883127996commons_math3_3_6_1-876c0.jar
spark://xx.xxx.xx.xx:40525/jars/addedFile3167668463888463121excel_streaming_reader_3_2_3-b4c68.jar
FYI - I was able to get the elastacloud module working
Wow, I didn't even know that project. Thanks for pointing it out!
I would still like to get this working on Databricks and SQL Server BDC. Note that it is not working in Databricks for a few others as well.
I am also facing a similar issue. Has it been resolved? java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
FWIW I'm getting the exact same error: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
Azure Databricks: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12). Maven Install from Databricks: com.crealytics:spark-excel_2.12:3.1.2_0.15.1
Any help would be greatly appreciated @nightscape
Hello guys, I am also facing the same issue. Is there any solution, or has anyone found an alternative method?
@grajee-everest @KanakVantaku @soleyjh @MounikaKolisetty could one of you check whether changing https://github.com/crealytics/spark-excel/blob/main/build.sbt#L32-L37 to
shadeRenames ++= Seq(
"org.apache.poi.**" -> "shadeio.poi.@1",
"spoiwo.**" -> "shadeio.spoiwo.@1",
"com.github.pjfanning.**" -> "shadeio.pjfanning.@1",
"org.apache.commons.io.**" -> "shadeio.commons.io.@1",
"org.apache.commons.compress.**" -> "shadeio.commons.compress.@1"
)
resolves this problem?
You would then need to sbt publishLocal a version of the JAR and upload it to Databricks.
Hello @nightscape, what does sbt publishLocal mean? Can you please explain it in detail?
Hi @MounikaKolisetty,
SBT is the Scala (or Simple) Build Tool. You can get instructions on how to install it here: https://www.scala-sbt.org/ Once you have it installed, you should be able to run
cd /path/to/spark-excel
# Make the changes from above
sbt publishLocal
This should build the project and copy the generated JAR files to a path like ~/.ivy2/.../spark-excel...jar.
You can take the JAR from there and try to upload and use it in Databricks.
Same error here: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
Azure Databricks: 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12). Maven Install from Databricks: com.crealytics:spark-excel_2.12:3.1.2_0.15.1
Any solution yet?
Same error.
Azure Databricks: 9.0 (includes Apache Spark 3.1.2, Scala 2.12). Maven Install from Databricks: com.crealytics:spark-excel_2.12:3.1.2_0.16.0
@ciaeric @Dunehub if possible, please try my proposal in https://github.com/crealytics/spark-excel/issues/467#issuecomment-984506825
I am also facing the same issue on Azure Databricks and am looking for possible solutions. I am adding dependencies as per the attachment, but it is not working.
Once the build here finishes successfully, you can try version 0.16.1-pre1:
https://github.com/crealytics/spark-excel/actions/runs/1607777770
@ciaeric @Dunehub @spaw6065 @MounikaKolisetty @grajee-everest Please provide feedback here if the shading worked. The change is still on a branch, so if you don't provide feedback, it won't get merged and won't be part of the next release.
For me it still didn't work.
Steps followed:
- Added the jar from DBFS
- Executed the sample code

Error trace:
NoClassDefFoundError: shadeio/commons/io/output/UnsynchronizedByteArrayOutputStream
Caused by: ClassNotFoundException: shadeio.commons.io.output.UnsynchronizedByteArrayOutputStream
  at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
  at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
  at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
  at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:55)
  at scala.Option.fold(Option.scala:251)
  at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:55)
  at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:16)
  at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:15)
  at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:50)
  at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
  at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
  at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
  at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
  at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
  at scala.Option.getOrElse(Option.scala:189)
  at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
  at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36)
  at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
  at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
  at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:390)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:346)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:346)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:3)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:47)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:49)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw.<init>(command-2541019815824441:51)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw.<init>(command-2541019815824441:53)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw.<init>(command-2541019815824441:55)
  at $line03e3e0503061413eab90de3bf6be643427.$read.<init>(command-2541019815824441:57)
  at $line03e3e0503061413eab90de3bf6be643427.$read$.<init>(command-2541019815824441:61)
  at $line03e3e0503061413eab90de3bf6be643427.$read$.<clinit>(command-2541019815824441)
  at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print$lzycompute(<notebook>:7)
  at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print(<notebook>:6)
  at $line03e3e0503061413eab90de3bf6be643427.$eval.$print(<notebook>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747)
  at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
  at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568)
  at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
  at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
  at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
  at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564)
  at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:219)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:235)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:902)
  at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:855)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:235)
  at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$13(DriverLocal.scala:541)
  at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
  at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261)
  at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258)
  at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:50)
  at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305)
  at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297)
  at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:50)
  at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:518)
  at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:689)
  at scala.util.Try$.apply(Try.scala:213)
  at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:681)
  at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:522)
  at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:634)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:427)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:370)
  at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:221)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: shadeio.commons.io.output.UnsynchronizedByteArrayOutputStream
  at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
  at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
  at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
  at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:206)
  at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:172)
  at com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:55)
  at scala.Option.fold(Option.scala:251)
  at com.crealytics.spark.excel.DefaultWorkbookReader.openWorkbook(WorkbookReader.scala:55)
  at com.crealytics.spark.excel.WorkbookReader.withWorkbook(WorkbookReader.scala:16)
  at com.crealytics.spark.excel.WorkbookReader.withWorkbook$(WorkbookReader.scala:15)
  at com.crealytics.spark.excel.DefaultWorkbookReader.withWorkbook(WorkbookReader.scala:50)
  at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:32)
  at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:32)
  at com.crealytics.spark.excel.ExcelRelation.headerColumns$lzycompute(ExcelRelation.scala:104)
  at com.crealytics.spark.excel.ExcelRelation.headerColumns(ExcelRelation.scala:103)
  at com.crealytics.spark.excel.ExcelRelation.$anonfun$inferSchema$1(ExcelRelation.scala:172)
  at scala.Option.getOrElse(Option.scala:189)
  at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:171)
  at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:36)
  at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:36)
  at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:13)
  at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:385)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:390)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:346)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:346)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:237)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:3)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:47)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw$$iw.<init>(command-2541019815824441:49)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw$$iw.<init>(command-2541019815824441:51)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw$$iw.<init>(command-2541019815824441:53)
  at $line03e3e0503061413eab90de3bf6be643427.$read$$iw.<init>(command-2541019815824441:55)
  at $line03e3e0503061413eab90de3bf6be643427.$read.<init>(command-2541019815824441:57)
  at $line03e3e0503061413eab90de3bf6be643427.$read$.<init>(command-2541019815824441:61)
  at $line03e3e0503061413eab90de3bf6be643427.$read$.<clinit>(command-2541019815824441)
  at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print$lzycompute(<notebook>:7)
  at $line03e3e0503061413eab90de3bf6be643427.$eval$.$print(<notebook>:6)
  at $line03e3e0503061413eab90de3bf6be643427.$eval.$print(<notebook>)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:747)
  at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1020)
  at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:568)
  at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:36)
  at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:116)
  at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
  at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:567)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:594)
  at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:564)
  at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:219)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:235)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:902)
  at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:855)
  at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:235)
  at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$13(DriverLocal.scala:541)
  at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:266)
  at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
  at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:261)
  at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:258)
  at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:50)
  at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:305)
  at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:297)
  at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:50)
  at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:518)
  at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:689)
  at scala.util.Try$.apply(Try.scala:213)
  at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:681)
  at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:522)
  at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:634)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:427)
  at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:370)
  at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:221)
  at java.lang.Thread.run(Thread.java:748)
Any luck? I am facing the same issue with the latest jar com.crealytics:spark-excel_2.13:3.2.0_0.16.1-pre1. Py4JJavaError Traceback (most recent call last)
I've also faced this error. Is there any solution? Is there another way to read an xls file on Databricks?
Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.
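For anyone who wants to pin that older coordinate when creating the session themselves, a minimal sketch (the app name is illustrative; on Databricks you would instead install the coordinate through the cluster's Libraries UI):

```python
from pyspark.sql import SparkSession

# Sketch: have Spark resolve the older spark-excel coordinate at startup.
# The version number comes from the comment above; `spark.jars.packages`
# only takes effect if it is set before the session is created.
spark = (
    SparkSession.builder
    .appName("excel-read")  # illustrative app name
    .config("spark.jars.packages", "com.crealytics:spark-excel_2.12:0.14.0")
    .getOrCreate()
)
```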
NoClassDefFoundError: shadeio/commons/io/output/UnsynchronizedByteArrayOutputStream Caused by: ClassNotFoundException
Getting the same error in AWS Glue (Glue 3.0, Spark 3.1) using PySpark.
I'm looking for the new version as I want to utilize the dynamic partitioning to create multiple excel files in parallel
FYI, getting the same error even with Scala: AWS Glue 3.0, Scala 2, Spark 3.1.
Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.
This clue works for me!!
Thanks
But I need the dynamic partitioning feature, which is not available in the older version.
As I don't have time to look into this, your best option is to try different versions of shading deps here. Once you have found a version that works on AWS / Azure Databricks, I'd be happy to do another pre-release and get it merged if it works for everyone.
This
Faced the same error with the 0.16 and 0.16.1 versions of this library. But then I tried an older version (com.crealytics:spark-excel_2.12:0.14.0) and it is working like a charm now.
I was facing the same issue and this worked. Thank you
This worked for me. Thank you.
Can you give v0.16.1-pre2 a try?
Seems the GitHub Action for v0.16.1-pre2 has failed.
Right. I'm not sure where the GPG problem is coming from and whether it has anything to do with the rather small changes I made to build.sbt.
I had a lot of trouble running spark-excel because of incompatible dependencies. Finally, with these library versions I was able to write and read Excel files! Sharing them for you.
I'm experiencing the same issue. I'm using the following versions: Scala 2.1.2, spark-excel 0.14.0, Spark 3.1.2.
I've tried different versions such as 0.13.1, 0.13.4, 0.14.0, 0.15.0, and 0.16.1-pre1.
Has anybody found a solution to this?
As I don't have time to look into this, your best option is to try different versions of shading deps here. Once you have found a version that works on AWS / Azure Databricks, I'd be happy to do another pre-release and get it merged if it works for everyone.
Hi,
I took the "pre2" code, built a local jar file, and deployed it in Databricks. Now I am not getting the "commons-io" errors, but I am getting a different error.
IOException: Your InputStream was neither an OLE2 stream, nor an OOXML stream or you haven't provide the poi-ooxml*.jar in the classpath/modulepath - FileMagic: OOXML, having providers: []
  at shadeio.poi.ss.usermodel.WorkbookFactory.wp(WorkbookFactory.java:334)
  at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:224)
  at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:185)
It seems the class loader is not loading the providers?
Any tips?
Vinod
This might be fixed by https://github.com/crealytics/spark-excel/pull/513. Can you merge that into your local branch and give it a try? Unfortunately Github Actions are failing because of some GPG error and I haven't had time to look into it...
While all the other versions were failing, I was able to install "com.crealytics:spark-excel_2.13:3.2.0_0.16.1-pre1"
I get the following error when I try to read an excel file: ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider com.crealytics.spark.v2.excel.ExcelDataSource could not be instantiated
Any ideas?
@ymopurpg yes, that's probably what was fixed in #513. Unfortunately the build is currently broken, and I don't have time to look into that...
I experienced the same issue as @grajee-everest mentioned. The following package versions helped me:
spark-submit ... --packages com.crealytics:spark-excel_2.12:0.14.0,commons-io:commons-io:2.11.0 ...
I gave it another try in 0.16.5-pre1. Can someone try that out?
@nightscape - I would really appreciate it if you could provide the full version (com.crealytics:spark-excel_2.12:0.16.5-pre1 or something else?). I tried so many versions in the past few days and don't know which one to use.
Tried v0.16.5-pre1 this morning and getting the following error...
Py4JJavaError: An error occurred while calling o228.load.
: java.lang.NoClassDefFoundError: Could not initialize class shadeio.poi.util.IOUtils
at shadeio.poi.poifs.filesystem.FileMagic.valueOf(FileMagic.java:209)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:222)
at shadeio.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:185)
at com.crealytics.spark.v2.excel.ExcelHelper.getWorkbook(ExcelHelper.scala:111)
at com.crealytics.spark.v2.excel.ExcelHelper.getRows(ExcelHelper.scala:127)
at com.crealytics.spark.v2.excel.ExcelTable.infer(ExcelTable.scala:72)
at com.crealytics.spark.v2.excel.ExcelTable.inferSchema(ExcelTable.scala:43)
at org.apache.spark.sql.execution.datasources.v2.FileTable.$anonfun$dataSchema$4(FileTable.scala:69)
at scala.Option.orElse(Option.scala:447)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema$lzycompute(FileTable.scala:69)
at org.apache.spark.sql.execution.datasources.v2.FileTable.dataSchema(FileTable.scala:63)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema$lzycompute(FileTable.scala:82)
at org.apache.spark.sql.execution.datasources.v2.FileTable.schema(FileTable.scala:80)
at com.crealytics.spark.v2.excel.ExcelDataSource.inferSchema(ExcelDataSource.scala:85)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:296)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:266)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
I gave it another try in 0.16.5-pre1. Can someone try that out?
I had the same issue described above, but upgrading to com.crealytics:spark-excel_2.12:3.1.2_0.16.5-pre1 resolved it and it's now working for me. I have no other custom libraries installed on the Databricks cluster besides com.crealytics:spark-excel_2.12:3.1.2_0.16.5-pre1.
Can everybody try 0.16.5-pre2 and report back here please?
I just tried com.crealytics:spark-excel_2.12:3.2.1_0.16.5-pre2 on 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12) and get the following error:
val existing = spark.read.format("excel").option("header", "true").load("example.xlsx")
display(existing)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.75.73.234 executor 0): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser.<init>(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V
at com.crealytics.spark.v2.excel.ExcelParser$.parseIterator(ExcelParser.scala:423)
at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.readFile(ExcelPartitionReaderFactory.scala:75)
at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.buildReader(ExcelPartitionReaderFactory.scala:61)
at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.$anonfun$createReader$1(FilePartitionReaderFactory.scala:30)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.getNextReader(FilePartitionReader.scala:99)
at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:43)
at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:94)
at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:131)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:825)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1644)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:828)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:683)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2984)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2931)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2925)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2925)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1345)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1345)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1345)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3193)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3134)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3122)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1107)
at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2561)
at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:266)
at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:276)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:81)
at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:87)
at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:75)
at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:62)
at org.apache.spark.sql.execution.ResultCacheManager.collectResult$1(ResultCacheManager.scala:587)
at org.apache.spark.sql.execution.ResultCacheManager.computeResult(ResultCacheManager.scala:596)
at org.apache.spark.sql.execution.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:542)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:541)
at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:438)
at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:417)
at org.apache.spark.sql.execution.SparkPlan.executeCollectResult(SparkPlan.scala:422)
at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3132)
at org.apache.spark.sql.Dataset.$anonfun$collectResult$1(Dataset.scala:3123)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3930)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$6(SQLExecution.scala:195)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:342)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:153)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:852)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:115)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:292)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3928)
at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3122)
at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:268)
at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:102)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$getResultBufferInternal$3(ScalaDriverLocal.scala:345)
at scala.Option.map(Option.scala:230)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$getResultBufferInternal$1(ScalaDriverLocal.scala:325)
at scala.Option.map(Option.scala:230)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.getResultBufferInternal(ScalaDriverLocal.scala:289)
at com.databricks.backend.daemon.driver.DriverLocal.getResultBuffer(DriverLocal.scala:712)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:267)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$11(DriverLocal.scala:602)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$2(UsageLogging.scala:232)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:230)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:212)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:60)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:261)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:60)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:579)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:615)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:607)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommandAndGetError(DriverWrapper.scala:526)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:561)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:431)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:374)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:225)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.util.FailureSafeParser.<init>(Lscala/Function1;Lorg/apache/spark/sql/catalyst/util/ParseMode;Lorg/apache/spark/sql/types/StructType;Ljava/lang/String;)V
at com.crealytics.spark.v2.excel.ExcelParser$.parseIterator(ExcelParser.scala:423)
at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.readFile(ExcelPartitionReaderFactory.scala:75)
at com.crealytics.spark.v2.excel.ExcelPartitionReaderFactory.buildReader(ExcelPartitionReaderFactory.scala:61)
at org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.$anonfun$createReader$1(FilePartitionReaderFactory.scala:30)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.getNextReader(FilePartitionReader.scala:99)
at org.apache.spark.sql.execution.datasources.v2.FilePartitionReader.next(FilePartitionReader.scala:43)
at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:94)
at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:131)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:155)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:75)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:55)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:156)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:125)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:825)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1644)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:828)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:683)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
@alexjbush strange, that looks like the Spark version doesn't match the one from spark-excel, but from your description it clearly should...
Maybe there is something wrong with our cross-publishing build.sbt.
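To illustrate the version mismatch being discussed: the spark-excel artifact version embeds the Spark version it was compiled against (e.g. `3.2.1_0.16.5-pre2`), so picking a coordinate built for a different Spark version loads classes compiled against a different `FailureSafeParser` constructor and fails with `NoSuchMethodError` at runtime. A minimal sketch of the coordinate scheme (the helper function name is my own, not part of spark-excel):

```python
def spark_excel_coordinate(spark_version: str, plugin_version: str,
                           scala_version: str = "2.12") -> str:
    """Build a spark-excel Maven coordinate for a given Spark version.

    The artifact version string is "<spark-version>_<plugin-version>", so the
    coordinate must match the Spark version of the cluster runtime exactly.
    """
    return f"com.crealytics:spark-excel_{scala_version}:{spark_version}_{plugin_version}"

# The coordinate used in this thread, for DBR 10.4 LTS (Spark 3.2.1):
print(spark_excel_coordinate("3.2.1", "0.16.5-pre2"))
# com.crealytics:spark-excel_2.12:3.2.1_0.16.5-pre2
```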
Adding spark-excel as a Maven dependency instead of a JAR resolved my issue.
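Attaching the library by Maven coordinates (rather than uploading JARs) can also be scripted. A sketch of the payload shape for the Databricks Libraries API is below; the exact endpoint and fields should be verified against the Databricks REST API reference, and `"<cluster-id>"` is a placeholder:

```python
import json

# Assumed payload shape for POST /api/2.0/libraries/install (verify against
# the Databricks REST API docs before use).
payload = {
    "cluster_id": "<cluster-id>",
    "libraries": [
        # Maven coordinates let Databricks resolve transitive dependencies,
        # avoiding the manually-downloaded-JAR conflicts described above.
        {"maven": {"coordinates": "com.crealytics:spark-excel_2.12:3.2.1_0.16.5-pre2"}}
    ],
}
print(json.dumps(payload, indent=2))
```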
I tried to use spark-excel in Azure Databricks, but I am running into an error. I earlier tried the same with SQL Server Big Data Cluster but was unable to get it working.
Current Behavior
I'm getting an error: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
I first loaded the Maven coordinates and got the error. I then followed the link, loaded the JAR files manually, and still got the same error shown in the screenshot.
Steps to Reproduce (for bugs)
Your Environment
Azure Databricks
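Since the root cause in both reports is a runtime/library version mismatch, it may help to map the Databricks Runtime to its bundled Spark version before picking a coordinate. The 10.4 LTS entry comes from this thread; the other entries are assumptions taken from the Databricks runtime release notes and should be double-checked there:

```python
# Illustrative mapping: Databricks Runtime -> bundled Apache Spark version.
# Only "10.4 LTS" is confirmed in this thread; verify the rest against the
# Databricks runtime release notes.
DBR_TO_SPARK = {
    "9.1 LTS": "3.1.2",
    "10.4 LTS": "3.2.1",
    "11.3 LTS": "3.3.0",
}

def pick_spark_version(runtime: str) -> str:
    """Return the Spark version for a runtime, so the matching
    spark-excel artifact (e.g. 3.2.1_<plugin-version>) can be chosen."""
    try:
        return DBR_TO_SPARK[runtime]
    except KeyError:
        raise ValueError(f"unknown runtime {runtime!r}; check the DBR release notes")

print(pick_spark_version("10.4 LTS"))  # 3.2.1
```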