spring-projects / spring-batch-extensions

Spring Batch Extensions
238 stars 255 forks source link

When extracting data from the RowSet (rs) year in date cell is shortened #98

Closed alimulondo closed 1 year ago

alimulondo commented 1 year ago

sample code:

var result = rs.getCurrentRow();
var actualDate = result[5]; // returns "5/29/19" year has been shorted from "2019" to "19"
var expected = "5/29/2019";
assertEquals(expected, actualDate);
mdeinum commented 1 year ago

Could you please specify which filetype you are using? xls or xlsx? Or even better provide a sample (failing test case with an data file).

If you are reading an xls file you can use the datesAsIso property to have dates returned as ISO-8501 strings instead of the internal representation (that way it is consistent). See https://github.com/spring-projects/spring-batch-extensions/tree/main/spring-batch-excel#configuration-properties. When using an xlsx this is, sadly, not possible due to that information (field type, and formatting) not being readily available with Apache POI.

alimulondo commented 1 year ago

Hi Marten, It is xlsx format! It's a real production app I may not be able to provide sample code. However I understand the limitation you just stated. Can I force it to just read the cell as string and not convert it to date, I then convert that string to date format of my choice. It returns string array so I wonder why it labors to convert a given cell to date. Thanks

mdeinum commented 1 year ago

It doesn't convert to a date, it just shows the String representation of the value and it should use the formatting as used/defined in the Excel document.

For the test can anonimize (or make-up) data and just have a test with a simple configuration or the reader.

I did some checking and I might be able to re-use the datesAsIso (which will give you a fixed format) for the xlsx case as well. I need to do some more investigation and testing for that.

alimulondo commented 1 year ago

Let me provide some sample data and code in a short while.

alimulondo commented 1 year ago

Here is the link to the demo : demo

alimulondo commented 1 year ago

It contains both the file and the working demo. Thanks.

mdeinum commented 1 year ago

FIrst of all thanks for the files and code. That is a starting point for me. Could you tell me one more thing, which Locale are you using in Excel? I wonder if it would make sense to set it for parsing the files (currently it will use the system default).

Looking at the javadoc from DataFormatter this might be more involved then I thought.

Some formats are automatically "localized" by Excel, eg show as mm/dd/yyyy when loaded in Excel in some Locales but as dd/mm/yyyy in others. These are always returned in the "default" (US) format, as stored in the file. Some format strings request an alternate locale, eg [$-809]d/m/yy h:mm AM/PM which explicitly requests UK locale. These locale directives are (currently) ignored. You can use DateFormatConverter to do some of this localisation if you need it.

Note the empasized text (mine). Apparently everything will use the format as defined, unless it is a date/time field it will use the US locale by default (which is basically how it is stored internally).

I need some time to ponder this on how to best solve this, my first gut feeling is to always convert date/time to ISO formatted values instead of what is in the file. That way it is at least consistent. Or to configure a Locale and use the default date/time formatting for that (but that could lead to surprises when not explicitly setting it and using a system default).

alimulondo commented 1 year ago

Locale in excel is : English(Uganda) and Calendar type is : Gregorian.

alimulondo commented 1 year ago

I added Locale:Armenian in the second column of the sample data and it picks as expected! The issue should be with how the using a default locale in the lib. (I pushed it to the repo as well)

mdeinum commented 1 year ago

I'm not sure I understand your last comment? But what I get is that you explicitly set a locale/formatting on the date field and that that is now used (which would argue against what the javadoc states :) ). So apparently the specified locale/formatting is used at least as long as it is explicitly set, when using the default it will use the default for the platform language (at least that is what I suspect).

alimulondo commented 1 year ago

Did you get the updated sample data? There are two columns in the sheet. First column contains date in Ugandan locale which when extracted picks only the last digits of the year. The second column contains date in Armenian locale which when extracted from the sheet picks all the digits of the year as expected. So the point am trying to make is that some locales result to the right extracted data and others don't.

alimulondo commented 1 year ago

So can you give us the flexibility to set locale type expected before extraction. Am assuming it is using some default.

mdeinum commented 1 year ago

Well there are 2 things to consider. 1 is what is in the XLSX and the other is what Apache POI uses. But we can configure the DataFormatter with a Locale to use for formatting, so that is a possibility. I could also port that to be useful for both XLS and XLSX.

I'll have a go see what I can do, including updating the XLSX to allow for ISO date formatting.

mdeinum commented 1 year ago

Closing in favor of the #100 PR.

alimulondo commented 1 year ago

Thanks a lot! Let me check it out.

alimulondo commented 1 year ago

Where these changes omitted in the current version : `

org.springframework.batch
<artifactId>spring-batch-core</artifactId>
<version>5.0.1</version>

`

mdeinum commented 1 year ago

These changes aren't part of Spring Batch but rather this specific Spring Batch extension. I haven't yet released a version as I was actually awaiting your confirmation. There currently isn't (yet) a version that has been tested against Spring Batch 5.x (that would probably become 0.2). Although I did some testing and the project appears to work just fine (but that was against a milestone release).

alimulondo commented 1 year ago

So there is no maven coordinate I can use to access these changes? How do I test them I my application then.

mdeinum commented 1 year ago

You can checkout the code and build the artifact yourself.

alimulondo commented 1 year ago

Sounds good. Let me do that.