mjul / docjure

Read and write Office documents from Clojure
MIT License
622 stars 129 forks source link

Automatically use header row to read tabular sheet #84

Open dwaldhalm opened 5 years ago

dwaldhalm commented 5 years ago

I want to parse all the data from a tabular worksheet (one that has a header row) without having to type the column map for select-columns--like this:

(->> (load-workbook "file.xls")
     (select-sheet "Sheet1")
     (read-tabular-sheet))

...and get a list of maps like this:

({:arbitrary-col-name "a1" :another-header "b1"}
 {:arbitrary-col-name "a2" :another-header "b2"})

Please consider including the following read-tabular-sheet convenience function (or something like it) in docjure. I'll be happy to make a pull request if the idea suits you.

(ns read-tabular-sheet-proposal
  (:require [dk.ative.docjure.spreadsheet :refer :all]
            [camel-snake-kebab.core :as csk]))

(defn excel-col
  "Calculates Excel name for the given column index: 1 -> A, 26 -> Z, 27 -> AA"
  [i]
  (loop [d i
         c ""]
    (let [m (rem (- d 1) 26)]
      (if (> d 0)
        (recur (int(/ (- d m) 26))
               (str (char (+ 65 m)) c))
        c))))

(defn ->keys
  "Idiomatic keys from text in header row mapped to Excel col names."
  [names]
  (map #(hash-map (keyword (excel-col %1))
                  (keyword %2) )
       (take (count names) (rest (range)))
       (map csk/->kebab-case names)))

(defn col-headers
  "List of strings from the header row"
  [^org.apache.poi.hssf.usermodel.HSSFSheet ws]
  (->> (row-seq ws)
       first
       (.iterator)
       (iterator-seq)
       (map #(.toString %))))

(defn col-keys
  "Generate the key map for 'select-columns' based on the header row."
  [^org.apache.poi.hssf.usermodel.HSSFSheet ws]
  (->> (col-headers ws)
       (->keys)
       (apply merge)))

(defn read-tabular-sheet
  "Returns a Clojure map with  keys based on first row."
  [^org.apache.poi.hssf.usermodel.HSSFSheet ws]
  (rest (select-columns (col-keys ws) ws)))
kimim commented 3 years ago

I also want this feature, so I created a PR here: #93