nightscape / spark-excel

A Spark plugin for reading and writing Excel files
Apache License 2.0

[BUG] Cannot read files into dataframe in Databricks 13.3 LTS Runtime 3.3.0 Spark #853

Closed dinesh1512 closed 7 months ago

dinesh1512 commented 7 months ago

Is there an existing issue for this?

Current Behavior

When running the v2 Excel PySpark code below on Databricks 13.3 LTS Runtime:

    df = spark.read.format("excel") \
        .option("header", True) \
        .option("inferSchema", True) \
        .load(fr"{folderpath}//.xlsx")
    display(df)

I receive the following error upon attempting to display or use the resulting dataframe:

AbstractMethodError: org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.options()Lorg/apache/spark/sql/catalyst/FileSourceOptions;

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.178.42.202 executor 0): java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.options()Lorg/apache/spark/sql/catalyst/FileSourceOptions;

This issue is the same as https://github.com/crealytics/spark-excel/issues/682, which was addressed for older versions.

Expected Behavior

The resulting dataframe should display the data.

Steps To Reproduce

Set the folderpath variable to a location containing Excel files, then run the Python code below on the latest Databricks runtime:

    df = spark.read.format("excel") \
        .option("header", True) \
        .option("inferSchema", True) \
        .load(fr"{folderpath}//.xlsx")
    display(df)

Environment

- Spark version: 3.4.1
- Spark-Excel version: 0.18.7
- OS: N/A
- Cluster environment
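
A java.lang.AbstractMethodError generally means a class was compiled against a different version of an API than the one present at runtime, so it may help to confirm that the installed plugin build targets the cluster's Spark version. The sketch below is only an illustration: it assumes the plugin's artifact version follows a `<spark version>_<plugin version>` pattern (for example `3.4.1_0.18.7`), and the helper names `spark_prefix` and `matches_runtime` are hypothetical, not part of spark-excel or Spark.

```python
from typing import Optional


def spark_prefix(artifact_version: str) -> Optional[str]:
    """Return the part before the first '_' in a version string.

    Assumption: versions look like '<spark version>_<plugin version>'.
    Returns None when there is no '_' and so no prefix to compare.
    """
    head, sep, _ = artifact_version.partition("_")
    return head if sep else None


def matches_runtime(artifact_version: str, runtime_spark_version: str) -> Optional[bool]:
    """Compare the artifact's Spark prefix with the running Spark version.

    Returns True or False when a prefix is present, and None when the
    version string carries no Spark prefix (nothing can be concluded).
    """
    prefix = spark_prefix(artifact_version)
    if prefix is None:
        return None
    return prefix == runtime_spark_version
```

On a cluster, the runtime value could be taken from `spark.version`, for example `matches_runtime("3.4.1_0.18.7", spark.version)`. A False result would suggest that the plugin was built for another Spark release, which is one possible cause of the error reported here.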

Anything else?

No response

github-actions[bot] commented 7 months ago

Please check these potential duplicates:

nightscape commented 7 months ago

Please always try the newest version before creating issues. Closing this until the issue is reproduced with the newest version.