Closed. jwooden1 closed this issue 5 years ago.
I had the same problem a few days ago, but haven't found a proper solution.
The problem is that Spark comes bundled with a rather outdated version of commons-compress
and POI needs a newer version. In principle it should be possible to override the JARs bundled with Spark with user-provided ones, but I haven't yet managed to successfully do so.
In case you find a solution, please post it here 👍
In the meantime, you could try older versions of spark-excel; maybe the pre-0.10 versions work with the older version of commons-compress.
I had the same issue (not with spark-excel, but with another piece of software). You need to shade the dependency on commons-compress so that your Spark application uses the new version of commons-compress. You can do this in Java with the Maven Shade Plugin or in Scala with SBT's assembly plugin (https://github.com/sbt/sbt-assembly). Then you can define a rule in your build.sbt to shade commons-compress (https://github.com/sbt/sbt-assembly#shading).
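For example, in build.sbt a rename rule could look roughly like this (just a sketch; the shaded package name and plugin version are arbitrary):

```scala
// project/plugins.sbt -- the plugin version here is only an example
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.9")

// build.sbt -- rename commons-compress classes inside the fat JAR so they
// cannot collide with the old copy that Spark ships on its own classpath
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.commons.compress.**" -> "shaded.commons.compress.@1").inAll
)
```

`sbt assembly` then rewrites both the classes and all references to them, so the application loads its own copy instead of Spark's.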
If you want to use R and Python, then maybe @nightscape needs to shade it directly in the spark-excel module that is published on Maven.
The other approach, overriding the JARs bundled with Spark, is not possible in this case, because commons-compress is a core part of Spark. However, shading is not so bad here. I also recommend creating a JIRA issue with the Spark project to update commons-compress (the old version is vulnerable to several attacks).
I just released 0.10.1 and 0.11.0-beta2, which shade commons-compress and should hopefully fix this problem. Can you give it a try and tell me if it worked?
Hi @nightscape, I'm using 0.11.0-beta2 and I still have the same error. When I use a dependency on commons-compress, I get this message:
diagnostics: User class threw exception: java.lang.IllegalArgumentException: InputStream of class class org.apache.commons.compress.archivers.zip.ZipArchiveInputStream is not implementing InputStreamStatistics.
When I don't use the dependency, I get this:
diagnostics: User class threw exception: java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics
As a reminder, I'm trying to write the contents of several DataFrames into several sheets of the same Excel file.
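For reference, my write calls look roughly like this (a sketch; paths, sheet names, and the exact option set depend on the spark-excel version):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// two placeholder DataFrames standing in for the real ones
val df1 = Seq((1, "a")).toDF("id", "value")
val df2 = Seq((2, "b")).toDF("id", "value")

// one write per sheet into the same workbook, selecting the target sheet
// via dataAddress (available from 0.11 on) and appending to the file
Seq(("Sheet1", df1), ("Sheet2", df2)).foreach { case (sheet, df) =>
  df.write
    .format("com.crealytics.spark.excel")
    .option("dataAddress", s"'$sheet'!A1") // sheet names are placeholders
    .option("useHeader", "true")
    .mode("append") // append so the second write adds a sheet instead of replacing the file
    .save("/unique.xlsx")
}
```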
@nightscape I think you don't include commons-compress explicitly in the resulting JAR of the spark-excel module. In that case the shading rules will not apply. See fat JAR: https://github.com/sbt/sbt-assembly.
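In other words, a shade rule only rewrites classes that actually end up inside the assembled JAR, so commons-compress has to be a bundled compile-scope dependency, roughly like this (the version is illustrative):

```scala
// build.sbt -- bundle commons-compress so the rename rule has something to rewrite;
// a "provided" scope would keep it out of the fat JAR and the rule would be a no-op
libraryDependencies += "org.apache.commons" % "commons-compress" % "1.18"
```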
Just trying another approach. Can someone check 0.11.0-beta3?
@nightscape : it's OK :) Thanks
Ok, then I'll backport this to 0.10 and release 0.11 from the beta version.
Fixed in 0.10.2 and 0.11.0-beta3.
The fix is working in 0.10.2, but not in 0.11.0-beta3. I get this error in 0.11.0-beta3:
scala.MatchError: Map(treatemptyvaluesasnulls -> false, path -> /unique.xlsx, useheader -> true, endcolumn -> 8, inferschema -> true, startcolumn -> 0, sheetname -> input) (of class org.apache.spark.sql.catalyst.util.CaseInsensitiveMap)
at com.crealytics.spark.excel.DataLocator$.apply(DataLocator.scala:52)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:29)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:18)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:12)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
... 53 elided
Looking at the code, it seems to be due to making dataAddress a mandatory field? What is it anyway? Also, I think it is creating a side effect: if I pass null when reading, there is no error on read, but it does not read the specified sheet; it looks like it just reads the first sheet.
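For anyone else hitting this, my reading of DataLocator.scala is that sheet selection moved from the sheetName option into dataAddress, so the equivalent read would look something like this (a guess from the code, not confirmed):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// sketch: select the "input" sheet via dataAddress instead of the old sheetName option;
// the sheet name plus start cell replace the former sheetname/startcolumn options
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'input'!A1")
  .option("useHeader", "true")
  .option("inferSchema", "true")
  .load("/unique.xlsx")
```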
I am facing the same error in 0.11.0. Any update on this?
Exception in thread "main" scala.MatchError: Map(treatemptyvaluesasnulls -> true, location -> hdfs://nameservice1/flatfiles/raw/500a_map_e.xlsx, useheader -> true, inferschema -> true, addcolorcolumns -> false, sheetname -> _500a_map_e) (of class org.apache.spark.sql.catalyst.util.CaseInsensitiveMap)
I am facing the above issue with the dependencies used in my pom. Can anyone help?
Solved the issue: used --packages com.crealytics:spark-excel_2.11:0.10.2 and it worked fine.
I can reproduce this locally now. The problem seems to be that despite shading org.apache.commons.compress, this line is calling the constructor of the unshaded ZipArchiveInputStream.
Trying to find out what's happening...
I don't understand it yet... The exception says the following:
java.lang.IllegalArgumentException: InputStream of class class org.apache.commons.compress.archivers.zip.ZipArchiveInputStream is not implementing InputStreamStatistics.
org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:63)
org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipStream(ZipHelper.java:180)
org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:104)
org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:298)
org.apache.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:129)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.apache.poi.ss.usermodel.WorkbookFactory.createWorkbook(WorkbookFactory.java:314)
org.apache.poi.ss.usermodel.WorkbookFactory.createXSSFWorkbook(WorkbookFactory.java:296)
org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:214)
org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:180)
com.crealytics.spark.excel.DefaultWorkbookReader.$anonfun$openWorkbook$1(WorkbookReader.scala:42)
On the other hand, when I download and unzip the spark-excel JAR and run
javap -verbose com/crealytics/spark-excel_2.12/0.11.2/org/apache/poi/openxml4j/opc/internal/ZipHelper.class
it clearly shows that the above method is using the shaded classes:
public static org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream openZipStream(java.io.InputStream) throws java.io.IOException;
descriptor: (Ljava/io/InputStream;)Lorg/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream;
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=5, locals=2, args_size=1
0: aload_0
1: invokestatic #108 // Method org/apache/poi/poifs/filesystem/FileMagic.prepareToCheckMagic:(Ljava/io/InputStream;)Ljava/io/InputStream;
4: astore_1
5: aload_1
6: invokestatic #139 // Method verifyZipHeader:(Ljava/io/InputStream;)V
9: new #141 // class org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream
12: dup
13: new #143 // class shadeio/commons/compress/archivers/zip/ZipArchiveInputStream
16: dup
17: aload_1
18: invokespecial #145 // Method shadeio/commons/compress/archivers/zip/ZipArchiveInputStream."<init>":(Ljava/io/InputStream;)V
21: invokespecial #146 // Method org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream."<init>":(Ljava/io/InputStream;)V
24: areturn
Maybe some of your dependencies have POI as a dependency, and that dependency then does not use the shaded commons-compress.
@jornfranke That was exactly the problem. spark-excel itself still adds POI as a dependency (see https://github.com/hammerlab/sbt-parent/issues/32). I'm now bundling and shading all dependencies that require commons-compress.
I just released 0.12.0 with this fix (and Scala 2.12 compatibility); it should appear on Maven Central in the next few hours. Please go ahead and try it.
I'll close this issue until there are reports of the problem occurring again.
Confirmed 0.12.0 working in AWS Glue now - thanks for the quick response!
@jlscott3 Hi, do you mind sharing how you got this to work in Glue? Did you just add spark-excel_2.12-0.12.0.jar to the JAR lib path in the Glue job? Do you need to set anything else? I tried spark-excel_2.12-0.12.0.jar, spark-excel_2.11-0.12.0.jar, and spark-excel_2.11-0.11.1.jar, but all throw errors... Thanks in advance.
Update:
I finally got it working in AWS Glue. Below are the JARs I used:
ooxml-schemas-1.4.jar
poi-4.0.0.jar
spark-excel_2.11-0.12.0.jar
xmlbeans-3.1.0.jar
Hope it helps.
It turns out something went wrong while publishing spark-excel_2.12-0.12.0.jar, so that version actually still had this problem. In case anyone wants to try with Scala 2.12, it should work with spark-excel 0.12.1.
Do we need to import it in the Spark code? Can you please provide some sample code?
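No extra import should be needed when you go through the DataFrame reader with the fully qualified data source name. A minimal read sketch, based on the 0.12-era option names (the path, sheet, and cell are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// referring to the data source by its fully qualified name avoids any extra import;
// option names follow the 0.12-era README and may differ in other versions
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("dataAddress", "'Sheet1'!A1") // placeholder sheet and start cell
  .option("useHeader", "true")
  .option("inferSchema", "true")
  .load("s3://my-bucket/my-file.xlsx")  // placeholder path

df.show()
```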
Did anyone find a solution to this problem? I am facing the same problem with the latest version of spark-excel, 0.13.5:
scala> val file = new File("/Users/vinodsharma/Documents/Spark-Excel/People.xlsx")
file: java.io.File = /Users/vinodsharma/Documents/Spark-Excel/People.xlsx

scala> val fIP = new FileInputStream(file)
fIP: java.io.FileInputStream = java.io.FileInputStream@236ec69

scala> val wb = new XSSFWorkbook(fIP)
java.lang.IllegalArgumentException: InputStream of class class org.apache.commons.compress.archivers.zip.ZipArchiveInputStream is not implementing InputStreamStatistics.
at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(...)
How do I go about changing the classpath for the commons-compress JAR? In my case, the version of the compress JAR is org.apache.commons#commons-compress;1.20.
You might have to manually exclude commons-compress from the dependencies due to this problem, which I don't yet know how to fix: https://github.com/hammerlab/sbt-parent/issues/32
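In sbt, that exclusion would look roughly like this (a sketch; the version shown is illustrative):

```scala
// build.sbt -- drop the transitive commons-compress that spark-excel pulls in,
// so only one copy of the library ends up on the classpath
libraryDependencies += "com.crealytics" %% "spark-excel" % "0.13.5" excludeAll (
  ExclusionRule(organization = "org.apache.commons", name = "commons-compress")
)
```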
@nightscape: In my case, I tried all the versions from 0.12.1 to 0.13.5; none worked. I downloaded the latest version of commons-compress (1.20) manually. spark-shell claimed to have downloaded it when launched with the packages option, but it actually had not (I could not find it anywhere in the Maven repo dir where it said it was downloaded). Then I explicitly put the JAR on the driver's classpath as shown below:
$ spark-shell --driver-class-path /home/xvinosh/.m2/repository/org/apache/commons/commons-compress/1.20/commons-compress-1.20.jar
This worked. Hope it helps others.
@nightscape Hi, I tried the 0.9.0 version with Spark 2.3.1 (local and cluster mode). It worked, but when I use a large Excel file, Spark cannot process it.
Then I tried higher versions of your library, from 0.10 upward:
Exception in thread "main" java.lang.IllegalArgumentException: InputStream of class class org.apache.commons.compress.archivers.zip.ZipFile$1 is not implementing InputStreamStatistics.
at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:63)
at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:258)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238)
at etl.io.XlsxReader.open(XlsxReader.scala:135)
at etl.io.XlsxReader.<init>(XlsxReader.scala:153)
at etl.connectors.excel.ExcelConnector.readXlsx(ExcelConnector.scala:194)
at etl.connectors.excel.ExcelConnector.read(ExcelConnector.scala:119)
at etl.io.DatasetReader$.read(DatasetReader.scala:47)
at etl.DatasetResolver$.byModel(DatasetResolver.scala:58)
at etl.App$.processTask(App.scala:105)
at etl.App$.main(App.scala:65)
at etl.App.main(App.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
@sjahongir can you try the recommendation from @xvinosh?
@nightscape I still see issues with spark-excel compatibility with 2.12.
Using 0.13.4, I face java.lang.IllegalArgumentException: InputStream of class class org.apache.commons.compress.archivers.zip.ZipArchiveInputStream is not implementing InputStreamStatistics.
at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(...)
Using 0.12.0 or 0.12.1, I get useHeader errors as well as the above. Nothing is working out. I tried using commons-compress-1.20.jar along with other JARs in my spark-submit. No use.
We are currently migrating to Scala 2.12; could you please suggest a spark-excel version for it without these issues?
Hi @SwapnaRavi21, I would recommend always using the latest version available for your Spark and Scala version. @quanghgx and I will try to figure out a way to build against multiple versions of Spark. Unfortunately, I'm under quite some deadline pressure at the moment and will probably only get to this in the second week of November. If you have experience with SBT, we'd be happy about any contributions!
@nightscape Yes, we are on the latest Scala, 2.12. But this fix is available only for 2.11 and not for 2.12, right? Sure, thanks. Meanwhile, is there any alternative to this dependency that we can use with 2.12 until the fix is provided in this version?
Currently seeing this behavior in Databricks on multiple runtime versions (14.3 LTS, 15.4 LTS); Scala 2.12, Spark 3.5.0.
Version: com.crealytics:spark-excel_2.12:3.5.0_0.20.3
Caused by: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream.putArchiveEntry(Lorg/apache/commons/compress/archivers/zip/ZipArchiveEntry;)V
at org.apache.poi.openxml4j.opc.internal.ZipContentTypeManager.saveImpl(ZipContentTypeManager.java:65)
at org.apache.poi.openxml4j.opc.internal.ContentTypeManager.save(ContentTypeManager.java:450)
at org.apache.poi.openxml4j.opc.ZipPackage.saveImpl(ZipPackage.java:608)
at org.apache.poi.openxml4j.opc.OPCPackage.save(OPCPackage.java:1532)
at org.apache.poi.ooxml.POIXMLDocument.write(POIXMLDocument.java:227)
at com.crealytics.spark.excel.v2.ExcelGenerator.close(ExcelGenerator.scala:177)
at com.crealytics.spark.excel.v2.ExcelOutputWriter.close(ExcelOutputWriter.scala:34)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseCurrentWriter(FileFormatDataWriter.scala:71)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:82)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.$anonfun$commit$2(FileFormatDataWriter.scala:141)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.enrichWriteError(FileFormatDataWriter.scala:97)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:140)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:560)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1560)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:566)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:125)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:938)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:938)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.$anonfun$computeOrReadCheckpoint$1(RDD.scala:413)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:377)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$3(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.$anonfun$runTask$1(ResultTask.scala:82)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:211)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:199)
at org.apache.spark.scheduler.Task.$anonfun$run$5(Task.scala:161)
at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
at com.databricks.unity.HandleImpl.$anonfun$runWithAndClose$1(UCSHandle.scala:109)
at scala.util.Using$.resource(Using.scala:269)
at com.databricks.unity.HandleImpl.runWithAndClose(UCSHandle.scala:108)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:155)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:102)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$10(Executor.scala:1036)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:1039)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:926)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Excluding org.apache.commons:commons-compress when building our Spark application JAR did not help. Adding an explicit dependency on commons-compress did not help either.
Are there any recommendations for workarounds?
@neontty It looks like Spark defaults to an out-of-date, CVE-ridden version of commons-compress:
https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/3.5.3
POI uses a newer version of commons-compress and must rely on methods that were added or changed recently.
Can you try to upgrade the commons-compress JAR that Spark uses? It may be best to ask on the Spark mailing lists or forums if you don't know how to do this.
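If you control the Spark configuration, one possible workaround is to prepend a newer commons-compress to the driver and executor classpaths; a sketch (the JAR path and version are placeholders, and classpath overrides can introduce other conflicts):

```scala
import org.apache.spark.sql.SparkSession

// extraClassPath entries are prepended to Spark's own classpath, so the newer
// commons-compress wins the lookup; these must be set before the JVMs start
val spark = SparkSession.builder
  .config("spark.driver.extraClassPath", "/dbfs/jars/commons-compress-1.26.1.jar")
  .config("spark.executor.extraClassPath", "/dbfs/jars/commons-compress-1.26.1.jar")
  .getOrCreate()
```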
I can use the library when I run Spark on my local Windows machine and read Excel files on the same machine. However, when I upload the files to WASB on Azure and use an HDInsight cluster for running Spark jobs (in either local or cluster mode), I get the following error:
java.lang.IllegalArgumentException: InputStream of class class org.apache.commons.compress.archivers.zip.ZipArchiveInputStream is not implementing InputStreamStatistics.
at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:63)
at org.apache.poi.openxml4j.opc.internal.ZipHelper.openZipStream(ZipHelper.java:180)
at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:104)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:298)
at org.apache.poi.xssf.usermodel.XSSFWorkbookFactory.createWorkbook(XSSFWorkbookFactory.java:129)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.poi.ss.usermodel.WorkbookFactory.createWorkbook(WorkbookFactory.java:314)
at org.apache.poi.ss.usermodel.WorkbookFactory.createXSSFWorkbook(WorkbookFactory.java:296)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:214)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:180)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$openWorkbook$2$$anonfun$apply$4.apply(ExcelRelation.scala:66)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$openWorkbook$2$$anonfun$apply$4.apply(ExcelRelation.scala:66)
at scala.Option.fold(Option.scala:158)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$openWorkbook$2.apply(ExcelRelation.scala:66)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$openWorkbook$2.apply(ExcelRelation.scala:66)
at scala.Option.getOrElse(Option.scala:121)
at com.crealytics.spark.excel.ExcelRelation.openWorkbook(ExcelRelation.scala:64)
at com.crealytics.spark.excel.ExcelRelation.excerpt$lzycompute(ExcelRelation.scala:71)
at com.crealytics.spark.excel.ExcelRelation.excerpt(ExcelRelation.scala:70)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:264)
at com.crealytics.spark.excel.ExcelRelation$$anonfun$inferSchema$1.apply(ExcelRelation.scala:263)
at scala.Option.getOrElse(Option.scala:121)
at com.crealytics.spark.excel.ExcelRelation.inferSchema(ExcelRelation.scala:263)
at com.crealytics.spark.excel.ExcelRelation.<init>(ExcelRelation.scala:91)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:39)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:14)
at com.crealytics.spark.excel.DefaultSource.createRelation(DefaultSource.scala:8)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:309)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:156)
... 53 elided