Closed alimulondo closed 1 year ago
Could you please specify which filetype you are using? xls
or xlsx
? Or even better provide a sample (failing test case with an data file).
If you are reading an xls
file you can use the datesAsIso
property to have dates returned as ISO-8501 strings instead of the internal representation (that way it is consistent). See https://github.com/spring-projects/spring-batch-extensions/tree/main/spring-batch-excel#configuration-properties. When using an xlsx
this is, sadly, not possible due to that information (field type, and formatting) not being readily available with Apache POI.
Hi Marten,
It is xlsx
format!
It's a real production app I may not be able to provide sample code.
However I understand the limitation you just stated. Can I force it to just read the cell as string and not convert it to date, I then convert that string to date format of my choice.
It returns string array so I wonder why it labors to convert a given cell to date.
Thanks
It doesn't convert to a date, it just shows the String
representation of the value and it should use the formatting as used/defined in the Excel document.
For the test can anonimize (or make-up) data and just have a test with a simple configuration or the reader.
I did some checking and I might be able to re-use the datesAsIso
(which will give you a fixed format) for the xlsx
case as well. I need to do some more investigation and testing for that.
Let me provide some sample data and code in a short while.
Here is the link to the demo : demo
It contains both the file and the working demo. Thanks.
FIrst of all thanks for the files and code. That is a starting point for me. Could you tell me one more thing, which Locale
are you using in Excel? I wonder if it would make sense to set it for parsing the files (currently it will use the system default).
Looking at the javadoc from DataFormatter
this might be more involved then I thought.
Some formats are automatically "localized" by Excel, eg show as mm/dd/yyyy when loaded in Excel in some Locales but as dd/mm/yyyy in others. These are always returned in the "default" (US) format, as stored in the file. Some format strings request an alternate locale, eg [$-809]d/m/yy h:mm AM/PM which explicitly requests UK locale. These locale directives are (currently) ignored. You can use DateFormatConverter to do some of this localisation if you need it.
Note the empasized text (mine). Apparently everything will use the format as defined, unless it is a date/time field it will use the US locale by default (which is basically how it is stored internally).
I need some time to ponder this on how to best solve this, my first gut feeling is to always convert date/time to ISO formatted values instead of what is in the file. That way it is at least consistent. Or to configure a Locale
and use the default date/time formatting for that (but that could lead to surprises when not explicitly setting it and using a system default).
Locale in excel is : English(Uganda) and Calendar type is : Gregorian.
I added Locale:Armenian in the second column of the sample data and it picks as expected! The issue should be with how the using a default locale in the lib. (I pushed it to the repo as well)
I'm not sure I understand your last comment? But what I get is that you explicitly set a locale/formatting on the date field and that that is now used (which would argue against what the javadoc states :) ). So apparently the specified locale/formatting is used at least as long as it is explicitly set, when using the default it will use the default for the platform language (at least that is what I suspect).
Did you get the updated sample data? There are two columns in the sheet. First column contains date in Ugandan locale which when extracted picks only the last digits of the year. The second column contains date in Armenian locale which when extracted from the sheet picks all the digits of the year as expected. So the point am trying to make is that some locales result to the right extracted data and others don't.
So can you give us the flexibility to set locale type expected before extraction. Am assuming it is using some default.
Well there are 2 things to consider. 1 is what is in the XLSX and the other is what Apache POI uses. But we can configure the DataFormatter
with a Locale
to use for formatting, so that is a possibility. I could also port that to be useful for both XLS and XLSX.
I'll have a go see what I can do, including updating the XLSX to allow for ISO date formatting.
Closing in favor of the #100 PR.
Thanks a lot! Let me check it out.
Where these changes omitted in the current version : `
<artifactId>spring-batch-core</artifactId>
<version>5.0.1</version>
`
These changes aren't part of Spring Batch but rather this specific Spring Batch extension. I haven't yet released a version as I was actually awaiting your confirmation. There currently isn't (yet) a version that has been tested against Spring Batch 5.x (that would probably become 0.2). Although I did some testing and the project appears to work just fine (but that was against a milestone release).
So there is no maven coordinate I can use to access these changes? How do I test them I my application then.
You can checkout the code and build the artifact yourself.
Sounds good. Let me do that.
sample code: