techascent / tech.ml.dataset

A Clojure high performance data processing system
Eclipse Public License 1.0

.xlsx loading fails if a column has a number for a name #408

Closed: kirahowe closed this issue 6 months ago

kirahowe commented 6 months ago

Loading an .xlsx spreadsheet fails if a column has a number for a name. For example, this spreadsheet throws "Don't know how to create ISeq from: java.lang.Double": example_XLSX.xlsx

harold commented 6 months ago

Reproduced by downloading the file and...

user> (require '[tech.v3.dataset :as ds])
nil
user> (require '[tech.v3.libs.fastexcel])
Reflection warning, tech/v3/libs/fastexcel.clj:111:22 - reference to field getStableId on org.dhatim.fastexcel.reader.Sheet can't be resolved.
nil
user> (ds/->dataset "/home/harold/Downloads/example_XLSX.xlsx")
Execution error (IllegalArgumentException) at tech.v3.dataset.io.context$options$reify__31940/apply (context.clj:93).
Don't know how to create ISeq from: java.lang.Double

Suspicious: https://github.com/techascent/tech.ml.dataset/blob/b0896cc6116ad6aa049fb7f1b955e9fe49b07ae8/src/tech/v3/dataset/io/context.clj#L93

Similar:

user> (empty? "foo")
false
user> (empty? 0)
Execution error (IllegalArgumentException) at user/eval59523 (form-init3289276633768178084.clj:26).
Don't know how to create ISeq from: java.lang.Long
user> (empty? 0.0)
Execution error (IllegalArgumentException) at user/eval59525 (form-init3289276633768178084.clj:29).
Don't know how to create ISeq from: java.lang.Double

Maybe if it's a number, call str on it? Hard to know what the cleanest fix might be. hth
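
A minimal sketch of that idea, assuming the failure really does come from calling empty? on a header value that is not a string. The name safe-column-name is hypothetical and not part of tech.ml.dataset; whatever fix lands in context.clj may look quite different.

```clojure
;; Hypothetical helper, not part of tech.ml.dataset.
;; Only strings are checked for emptiness; any other non-nil header
;; value (e.g. a Double read from a spreadsheet cell) is stringified
;; instead of being passed to empty?, which would throw.
(defn safe-column-name
  "Returns header as a non-empty string, or nil if it is nil or an empty string."
  [header]
  (cond
    (nil? header)    nil
    (string? header) (not-empty header)
    :else            (str header)))

(safe-column-name "foo")   ;; => "foo"
(safe-column-name "")      ;; => nil
(safe-column-name 2020.0)  ;; => "2020.0"
```

One trade-off is that a numeric header comes back as a string such as "2020.0" rather than the original number, so callers selecting that column by name would have to use the string form.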

kirahowe commented 6 months ago

I assume you noticed already, but this is also the case for .xls files, e.g. example_XLS.xls