nightscape / spark-excel

A Spark plugin for reading and writing Excel files
Apache License 2.0

Excel-formatted dates come out with a 2-digit year even when the Excel date format uses a 4-digit year #879

Open tamilselvanyes opened 4 months ago

tamilselvanyes commented 4 months ago

Am I using the newest version of the library?

Is there an existing issue for this?

Current Behavior

When reading an Excel file that contains dates in the format MM/DD/YYYY with the code below, the dates come back as strings with a 2-digit year:

data_frame = (
    spark.read.format("excel")
    .option("header", "true")
    .option("maxByteArraySize", 2147483647)
    .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
    .option("setErrorCellsToFallbackValues", "true")
    .option("maxRowsInMemory", 200)
    .load('ExcelReaderProblemExcel')
)

image

Data frame result:

[Row(Date MM/DD/YYYY='3/29/20'),
 Row(Date MM/DD/YYYY='3/14/21'),
 Row(Date MM/DD/YYYY='3/15/12'),
 Row(Date MM/DD/YYYY='3/16/00'),
 Row(Date MM/DD/YYYY='3/29/04'),
 Row(Date MM/DD/YYYY='3/29/04')]

[Row(UTF-8 strings='Portégé'),
 Row(UTF-8 strings='Portégé'),
 Row(UTF-8 strings='Portégé'),
 Row(UTF-8 strings='Portégé'),
 Row(UTF-8 strings='Portégé'),
 Row(UTF-8 strings='Portégé')]

Since the data includes dates from the year 2100, the 2-digit years are ambiguous, so the results could be incorrect if I use the above dates directly.

Sample file: ExcelReaderProblemExcel.xlsx
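To illustrate why the 2-digit strings cannot simply be parsed back downstream, here is a minimal sketch in plain Python (no Spark involved). It assumes the returned values are parsed with `datetime.strptime` and the `%y` directive; the helper name `parse_two_digit_year` is made up for this example. Python maps 2-digit years 00 to 68 onto 2000 to 2068, so a cell that originally held a date in 2100 cannot be recovered from the string alone.

```python
from datetime import datetime

def parse_two_digit_year(text: str) -> datetime:
    """Parse an M/D/YY string; the century is guessed by Python, not taken from Excel."""
    return datetime.strptime(text, "%m/%d/%y")

# The strings returned in the data frame above.
values = ["3/29/20", "3/14/21", "3/15/12", "3/16/00", "3/29/04"]
parsed = [parse_two_digit_year(v) for v in values]

for raw, dt in zip(values, parsed):
    # Every value lands in the 2000s, whatever the century was in the workbook.
    print(raw, "->", dt.date().isoformat())
```

For example, '3/16/00' parses as 2000-03-16, even if the cell in the workbook actually contains a date in 2100.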

Expected Behavior

The date string should come out the same as it is shown in Excel, with a 4-digit year. Instead, it comes out with only a 2-digit year.

Steps To Reproduce

No response

Environment

- Spark version: 3.5.0
- Spark-Excel version: spark-excel_2.12-3.5.0_0.20.3
- OS: Windows
- Cluster environment: -

Anything else?

There is a similar issue, which is still open:

https://github.com/crealytics/spark-excel/issues/351