mjul / docjure

Read and write Office documents from Clojure
MIT License
622 stars 129 forks source link

read-cell date formulas as numeric #24

Closed nidu closed 9 years ago

nidu commented 9 years ago

Hello. When you issue a (read-cell cell) for cell which contains formula that returns date - read-cell returns a number but not a date. Probably a check should be added in read-cell for CELL_TYPE_FORMULA.

mjul commented 9 years ago

Thanks, it would be very helpful if you could send a pull-request with the test cases and better yet also implementation of the proposed changes.

nidu commented 9 years ago

Could you please tell me how do you save .xlsx files? I just opened datatypes.xlsx, saved it and closed. File size is now 10KB (5 times smaller) and date tests are failing (probably because of 1900-1904 date window)

lein test dk.ative.docjure.spreadsheet-test

lein test :only dk.ative.docjure.spreadsheet-test/read-cell-value-test

FAIL in (read-cell-value-test) (spreadsheet_test.clj:103)
Should read all cell types
expected: (= date (read-cell-value (.. workbook getCreationHelper createFormulaEvaluator (evaluate date-cell)) true))
  actual: (not (= #inst "3910-07-31T18:00:00.000-00:00" #inst "3914-08-01T18:00:00.000-00:00"))

lein test dk.ative.docjure.xls-test

lein test :only dk.ative.docjure.xls-test/read-cell-value-test

FAIL in (read-cell-value-test) (xls_test.clj:86)
Should read all cell types
expected: (= date (read-cell-value (.. workbook getCreationHelper createFormulaEvaluator (evaluate date-cell)) true))
  actual: (not (= #inst "3910-07-31T18:00:00.000-00:00" #inst "3914-08-01T18:00:00.000-00:00"))
mjul commented 9 years ago

As far as I remember, I used Excel for Mac. Maybe you have uncovered a date-handling bug.

Excel has two different date systems, based on the years 1900 and 1904, so off-by-four is possible. It would be interesting to have spreadsheets based on both of these and check that we can correctly read dates from them.

You can find information on Excel's 1900-based and 1904-based date systems here: http://support2.microsoft.com/kb/180162

nidu commented 9 years ago

I don't know why, but i can't reproduce test fail. Probably it was because Win and Mac have different date base by default and my Win Excel changed it on saving, however i can't reproduce it now (btw size difference is caused by throwing away thumbnail).

So it seems DateUtil/getJavaDate(double) uses 1900 system and doesn't take into account workbook's date base itself. Moreover XSSFWorkbook.isDate1904 method is protected. So i can inherit and open this method.

However i'm trying now different options and simple Cell.getDateCellValue returns proper value even for formula fields. Looking at XSSFCell.getDateCellValue implementation we can see that it calls .getNumericCellValue which itself handles formula cells. So don't you know if we could just read date formula with .getDateCellValue or there are any drawbacks?

mjul commented 9 years ago

First, thanks a lot for your work on improving the library.

I guess you could create a function that uses DateUtil/getJavaDate(double) and XSSFWorkbook.isDate1904 in combination (function call composition) rather than inheritance, possibly currying its workbook argument to create a single-arg date-getting function.

Regarding .getDateCellValue it sounds like a good choice, assuming that we don't accidentally convert non-date cells to dates.

As I am not up to date on the POI implementation details,I advice you to use your best judgement - I am sure you will make a good decision.