Open tamilselvanyes opened 4 months ago
Am I using the newest version of the library?
Is there an existing issue for this?
Current Behavior
When reading an Excel file that contains dates in the format MM/DD/YYYY using the code below:
data_frame = (
    spark.read.format("excel")
    .option("header", "true")
    .option("maxByteArraySize", 2147483647)
    .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
    .option("setErrorCellsToFallbackValues", "true")
    .option("maxRowsInMemory", 200)
    .load('ExcelReaderProblemExcel')
)
Data frame result:
[
    Row(Date MM/DD/YYYY='3/29/20'),
    Row(Date MM/DD/YYYY='3/14/21'),
    Row(Date MM/DD/YYYY='3/15/12'),
    Row(Date MM/DD/YYYY='3/16/00'),
    Row(Date MM/DD/YYYY='3/29/04'),
    Row(Date MM/DD/YYYY='3/29/04'),
]
[
    Row(UTF-8 strings='Portégé'),
    Row(UTF-8 strings='Portégé'),
    Row(UTF-8 strings='Portégé'),
    Row(UTF-8 strings='Portégé'),
    Row(UTF-8 strings='Portégé'),
    Row(UTF-8 strings='Portégé'),
]
Since the file contains dates from the year 2100, using the two-digit-year strings above directly could give incorrect results.
Attachment: ExcelReaderProblemExcel.xlsx
Expected Behavior
The date strings should come out the same as shown in Excel, with four-digit years. Instead, they come out with two-digit years.
Steps To Reproduce
No response
Environment
- Spark version: 3.5.0
- Spark-Excel version: spark-excel_2.12-3.5.0_0.20.3
- OS: Windows
- Cluster environment: -
Anything else?
There is a similar issue, which is still open:
https://github.com/crealytics/spark-excel/issues/351
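To illustrate why the two-digit years in the data frame result above cannot simply be parsed back into dates, here is a minimal sketch using only the Python standard library (it does not involve Spark or spark-excel). The sample strings are copied from the reported result; the helper names `parse_two_digit` and `render_two_digit` are hypothetical, and the 2000/2100 pair at the end is an assumed example, not a value taken from the attachment.

```python
from datetime import date, datetime

# Strings exactly as they appear in the data frame result reported in this issue.
SAMPLES = ["3/29/20", "3/14/21", "3/15/12", "3/16/00", "3/29/04"]


def parse_two_digit(text: str) -> date:
    """Hypothetical helper: parse an M/D/YY string with the standard library.

    %y follows the POSIX convention: 00-68 -> 2000-2068, 69-99 -> 1969-1999.
    """
    return datetime.strptime(text, "%m/%d/%y").date()


def render_two_digit(value: date) -> str:
    """Hypothetical helper: render a date in the M/D/YY shape shown above."""
    return f"{value.month}/{value.day}/{value.year % 100:02d}"


if __name__ == "__main__":
    for text in SAMPLES:
        print(text, "->", parse_two_digit(text).isoformat())

    # Assumed example values (not from the attachment): two dates a century
    # apart collapse to the same two-digit-year string.
    print(render_two_digit(date(2000, 3, 16)), render_two_digit(date(2100, 3, 16)))
```

Because both dates in the last line render identically, any parser has to guess the century, and `strptime` never guesses 2100. This is why a four-digit year in the returned string matters for this file.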