scicloj / tablecloth

Dataset manipulation library built on top of tech.ml.dataset
https://scicloj.github.io/tablecloth
MIT License

document better how to read xlsx files (and support multisheet xlsx files) #80

Open behrica opened 2 years ago

behrica commented 2 years ago

Maybe point here:

https://techascent.github.io/tech.ml.dataset/tech.v3.libs.fastexcel.html

But in any case "pure tablecloth" will then only read xlsx files with one sheet.

tech.ml.dataset supports multi-sheet files. For example, this reads the first sheet:

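(ns xxxx
  (:require [tablecloth.api :as tc]
            [tech.v3.libs.fastexcel :as fastexcel]
            [tech.v3.dataset.io.spreadsheet :as spreadsheet]))

;; Open the workbook, take its first sheet, and convert that sheet to a dataset.
(-> (fastexcel/input->workbook "my-file-with-multiple-sheets.xlsx")
    first
    (spreadsheet/sheet->dataset {}))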
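
To read every sheet rather than just the first, something like the following might work. It is only a sketch: it assumes `tech.v3.libs.fastexcel/workbook->datasets` returns one dataset per sheet, each named after its sheet, as the linked docs describe. The namespace `example.multisheet` and the helper `sheets-by-name` are hypothetical names used for illustration.

```clojure
(ns example.multisheet
  (:require [tablecloth.api :as tc]
            [tech.v3.libs.fastexcel :as fastexcel]))

(defn sheets-by-name
  "Hypothetical helper: returns a map of dataset name -> dataset,
  with one entry per sheet of the xlsx file at `path`.
  Assumes each dataset is named after the sheet it came from."
  [path]
  (into {}
        (map (juxt tc/dataset-name identity))
        (fastexcel/workbook->datasets path)))

;; Usage (the file name and the sheet name "Sheet1" are placeholders):
;; (get (sheets-by-name "my-file-with-multiple-sheets.xlsx") "Sheet1")
```

The result is an ordinary map, so a single sheet can be picked by name and passed straight into tablecloth functions.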

genmeblog commented 2 years ago

Maybe we can add an additional small namespace to cover this case? I'm thinking of adding two functions, workbook and sheet->dataset, plus an alias in deps.edn with the required dependency. Is there anything else worth importing?
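
A rough sketch of what such a namespace could look like, as thin wrappers over the tech.ml.dataset functions used above. The namespace name `tablecloth.xlsx`, the alias name `:xlsx`, and the dependency version in the comment are all assumptions for illustration, not an agreed design.

```clojure
;; deps.edn alias (hypothetical alias name; the version shown is an
;; assumption, check the tech.v3.libs.fastexcel docs for the current one):
;; {:aliases
;;  {:xlsx {:extra-deps {org.dhatim/fastexcel-reader {:mvn/version "0.12.8"}}}}}

(ns tablecloth.xlsx ;; hypothetical namespace name
  (:require [tech.v3.libs.fastexcel :as fastexcel]
            [tech.v3.dataset.io.spreadsheet :as spreadsheet]))

(defn workbook
  "Opens `input` (for example a path to an xlsx file) as a workbook,
  which can be treated as a sequence of sheets."
  [input]
  (fastexcel/input->workbook input))

(defn sheet->dataset
  "Converts a single sheet of a workbook to a dataset.
  `options` is passed through to tech.ml.dataset unchanged."
  ([sheet] (sheet->dataset sheet {}))
  ([sheet options] (spreadsheet/sheet->dataset sheet options)))

;; Usage, reading the first sheet:
;; (-> (workbook "my-file-with-multiple-sheets.xlsx") first sheet->dataset)
```

Keeping the dependency behind an alias would leave the default tablecloth install unchanged; users who need xlsx support would opt in with the alias.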

behrica commented 2 years ago

Maybe we want "tablecloth" to have Excel and Arrow import working out of the box. But then it would need to declare dependencies on them, which tech.ml.dataset avoided doing.

I think it is a "showstopper" for a beginner not to be able to load an Excel file without additional dependencies (same for Arrow).