mjul / docjure

Read and write Office documents from Clojure
MIT License
622 stars 129 forks source link

Reading Cell Values #32

Closed oliverholworthy closed 9 years ago

oliverholworthy commented 9 years ago

Hi Martin,

Before I submit a pull request I'd like to get your thoughts on some changes so that the pull request can be merged easily.

There are two changes I'd like to propose both to fix behaviour around reading cells. From error cells and formula cells. The changes should allow you close Pull Requests #7 #8 #11 #31 and issues #21 #27.

Reading Cell Values

Error Cells

Pull requests #7, #8, and issues #21, #27 indicate that reading error cells has been a problem for some people (myself now included) in the form of an unhandled dispatch value in the multimethod read-cell.

Unhandled java.lang.IllegalArgumentException
No method in multimethod 'read-cell' for dispatch value: 5

Dispatch value 5 corresponds to the cell value CELL_TYPE_ERROR, the only value currently unhandled by the multimethod of the six cell types:

CELL_TYPE_BLANK
CELL_TYPE_BOOLEAN
CELL_TYPE_ERROR
CELL_TYPE_FORMULA
CELL_TYPE_NUMERIC
CELL_TYPE_STRING

I think it would be desirable to be able to call read-cell on any valid cell type without throwing an exception.

I would like to propose that the keyword value of the FormulaError Enumeration is returned:

i.e. one of the following:

:VALUE :DIV0 :CIRCULAR_REF :REF :NUM :NULL :FUNCTION_NOT_IMPLEMENTED :NAME :NA

Reading the value of an error cell returns a byte which can be used to find the corresponding formula error.

(defmethod read-cell Cell/CELL_TYPE_ERROR [^Cell cell]
  (keyword (.name (FormulaError/forInt (.getErrorCellValue cell)))))

(keyword (.name FormulaError/NA))
=> :NA

Formula Cells

Currently calling read-cell on a formula cell that evaluates to anything other than a numeric value will raise an exception.

Pull request #11 attempts to solve this by wrapping the body in a try-catch. Which helps with the exception, but that doesn't address the actual problem of not being able to read non-numeric formula cells. (e.g. text or error formula cells) without resulting in an IllegalStateException:

IllegalStateException Cannot get a numeric value from a error formula cell
IllegalStateException Cannot get a numeric value from a text formula cell

The DateUtil/isCellDateFormatted function is only valid for numeric type cells. So need to check this after evaluation of the formula when we know the type of cell value.

i.e. the following:

(defmethod read-cell Cell/CELL_TYPE_FORMULA   [^Cell cell]
  (let [evaluator (.. cell getSheet getWorkbook
                      getCreationHelper createFormulaEvaluator)
        cv (.evaluate evaluator cell)]
    (if (and (= Cell/CELL_TYPE_NUMERIC (.getCellType cv))
             (DateUtil/isCellDateFormatted cell))
      (.getDateCellValue cell)
      (read-cell-value cv false))))

This turns out to be the exact change someone has proposed with #31. However, that commit is littered with trailing whitespace changes in addition to the code change.

Whitespace

Pull request: #15 #16 #31 all include trailing whitespace changes that clutter the commits.

I'd like to propose a single commit to remove all existing trailing whitespace to improve clarity of future commits from contributors that forget to turn off their save/commit hooks that remove the whitespace.

— Oliver

mjul commented 9 years ago

This is an excellent proposal, thanks for taking the time.

Historically, I did not originally include a CELL_TYPE_ERROR overload for the multi-method, since there is no obvious error handling strategy that would fit all users. However, as a lot of people want it I would happily accept your contribution.

For extra points, it would be extra nice if you could add a section to the README showing how to change the error handling strategy for uses who prefer something other than keywords.

rakhra commented 9 years ago

+1

I agree that there should be a single commit that removes all existing trailing whitespace. Many Clojure projects and this popular Clojure style guide - https://github.com/bbatsov/clojure-style-guide#no-trailing-whitespace - suggests it is best practice to omit trailing whitespace.

This is a well formulated proposal and look forward to seeing many of the pull requests closed.

mjul commented 9 years ago

Merged. Thank you for the contribution!