nightscape / spark-excel

A Spark plugin for reading and writing Excel files
Apache License 2.0

[BUG] Cannot read excel files using the V2 API #896

Open massazan opened 4 days ago

massazan commented 4 days ago

Am I using the newest version of the library?

Is there an existing issue for this?

Current Behavior

With the V2 API in version 0.20.4, reading fails with the following error: ClassCastException: scala.Some cannot be cast to [Lorg.apache.spark.sql.catalyst.InternalRow; The error occurs when the end boundary cell is omitted from the dataAddress option, e.g. "'0'!A5".

The error occurs in both Scala and PySpark.

Expected Behavior

Spark DataReader should return a DataFrame with no errors

Steps To Reproduce

Omit the end boundary cell from the dataAddress option, e.g. "'0'!A5":

val configs = Map(
  "inferSchema" -> "false",
  "dataAddress" -> "'0'!A5",
  "header" -> "false"
)

// Ensure you're using the spark-excel package
val df = spark.read.format("excel")
  .option("header", configs("header"))
  .option("inferSchema", configs("inferSchema"))
  .option("dataAddress", configs("dataAddress"))
  .load(s3_path)

df.show()

Environment

- Spark version: Databricks Runtime 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
- Spark-Excel version: com.crealytics:spark-excel_2.12:3.4.1_0.20.4
- OS:
- Cluster environment

Anything else?

API V1 works fine.
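
For comparison, here is a minimal sketch of the same read routed through the V1 data source. It assumes that the fully qualified format name `com.crealytics.spark.excel` selects V1 (the short name `excel` selects V2), as described in the project README. The object name `ExcelV1Fallback` and the `path` parameter are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExcelV1Fallback {
  // Same options as in the reproduction above.
  val configs: Map[String, String] = Map(
    "inferSchema" -> "false",
    "dataAddress" -> "'0'!A5",
    "header" -> "false"
  )

  // Assumption: "com.crealytics.spark.excel" selects the V1 data source,
  // whereas the short name "excel" selects the V2 data source.
  def read(spark: SparkSession, path: String): DataFrame =
    spark.read
      .format("com.crealytics.spark.excel")
      .options(configs)
      .load(path)
}
```

Calling `ExcelV1Fallback.read(spark, s3_path).show()` would then exercise the V1 code path with the same open-ended data address.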

github-actions[bot] commented 4 days ago

Please check these potential duplicates:

nightscape commented 3 days ago

@massazan looks like this one: https://github.com/crealytics/spark-excel/issues/808

massazan commented 2 days ago

Hi @nightscape, yes, it is the same issue. I tried installing the 3.4.2 artifact as mentioned, but I still had problems on Databricks Runtime 13.3. I tried Runtime 14.3 LTS and it works there. Are there any plans to fix the problem for Runtime 13.3?

Thanks