Closed: dinesh1512 closed this issue 7 months ago.
Please check these potential duplicates:
Please always try the newest version before creating issues. Closing this until the issue is reproduced with the newest version.
Is there an existing issue for this?
Current Behavior
When running the v2 Excel PySpark code below on the Databricks 13.3 LTS runtime:
df = (
    spark.read.format("excel")
    .option("header", True)
    .option("inferSchema", True)
    .load(fr"{folderpath}//.xlsx")
)
display(df)
I receive the following error upon attempting to display or use the resulting dataframe:
AbstractMethodError: org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.options()Lorg/apache/spark/sql/catalyst/FileSourceOptions;
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.178.42.202 executor 0): java.lang.AbstractMethodError: org.apache.spark.sql.execution.datasources.v2.FilePartitionReaderFactory.options()Lorg/apache/spark/sql/catalyst/FileSourceOptions;
This issue is the same as https://github.com/crealytics/spark-excel/issues/682, which was addressed for older versions.
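As a minimal diagnostic sketch (assuming a Databricks notebook, and that this is the same Spark/spark-excel binary mismatch as in issue 682), checking the runtime's Spark version against the installed spark-excel build would narrow this down:

# Print the runtime's Spark version; Databricks 13.3 LTS documents Spark 3.4.x.
print(spark.version)

# spark-excel releases are versioned with the Spark version they were built
# against as a prefix. If the installed artifact targets a different Spark
# version than the runtime, an AbstractMethodError like the one above is the
# typical symptom. Installing the build whose prefix matches the runtime is
# the usual remedy, e.g. a coordinate of the form (suffix is illustrative,
# not verified against this cluster):
#   com.crealytics:spark-excel_2.12:3.4.1_<library-version>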
Expected Behavior
The resulting dataframe should display the data.
Steps To Reproduce
Set the folderpath variable to a location containing Excel files, and run the Python code below on the latest Databricks runtime:
df = (
    spark.read.format("excel")
    .option("header", True)
    .option("inferSchema", True)
    .load(fr"{folderpath}//.xlsx")
)
display(df)
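As a possible cross-check (a sketch, not verified on this cluster), the older v1 data source name from the spark-excel README can be tried with the same options; it goes through a different code path than the v2 FilePartitionReaderFactory shown in the stack trace:

# Hypothetical cross-check using the v1 reader; folderpath is the same variable as above.
df_v1 = (
    spark.read.format("com.crealytics.spark.excel")
    .option("header", True)
    .option("inferSchema", True)
    .load(fr"{folderpath}//.xlsx")
)
display(df_v1)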
Environment
Anything else?
No response