mjul / docjure

Read and write Office documents from Clojure
MIT License
622 stars 129 forks source link

Incorrect value retrieving cell value when COUNTIFS is used #60

Open jonneale opened 7 years ago

jonneale commented 7 years ago

Hi Martin

We have a cell containing a formula which returns a weird result whenread-cell is called on it. I suspect this is a known POI issue, although I couldn't find any mention of it. We can work around it by using the underlying getNumericCellValue which I appreciate is probably not a great catch-all solution particularly for non-numeric formulae, but wonder whether you happen to know what's going on and whether you could point towards a fix.

We have seen this issue specifically in cases when the COUNTIFS function is being called with multiple predicates.

Given the sheet below: workbook.xlsx

Which for reference, looks like this:

screen shot 2017-01-05 at 16 34 29

I would expect the below code snippet to return the numerical value in cell B2, which is 0:

(ns docjure-test.core
  (:require [dk.ative.docjure.spreadsheet :as docjure]))

(defn read-file [path]
  (let [sheet (->> path 
                   (docjure/load-workbook)
                   (docjure/sheet-seq)
                   first)
        cell  (docjure/select-cell "B2" sheet)]
    (docjure/read-cell cell)))

Instead, when I evaluate this code. I get 2.0

If we replace the call to read-cell with the raw getNumericCellValue, as below:

(defn read-file [path]
  (let [sheet (->> path 
                   (docjure/load-workbook)
                   (docjure/sheet-seq)
                   first)
        cell  (docjure/select-cell "B2" sheet)]
    (. cell getNumericCellValue)))

The value 0.0 is returned as expected.

If there's no simple way to fix this issue, is there any way to throw an exception in this case? It would be better for us to retrieve no value instead of the wrong value.

Thanks

jonneale commented 7 years ago

Now I look a bit harder, it could well be an issue resulting from this bug, which does indeed point to a problem with POI's implementation of COUNTIFS, which is presumably why the formula evaluator used by read-cell is returning the wrong answer. It's weird that getNumericCellValue returns the correct value, though.

mjul commented 7 years ago

I agree that it sounds like a POI problem. Maybe the values of the formula are persisted with the sheet so getNumericCellValue works as long as the sheet has not been changed?

manuelherzog commented 5 years ago

it indeed is persisted. The Spreadsheets contains the latest formula results, so you don't need to calculate the formula if you just want to display it to the user. I noticed it while implementing the SXSSF Interface for stream-reading the Spreadsheets, because my Excelfiles i have to import would need more RAM then even a good server has.

I think the bug can be closed as it is resolved in POI some time now