Open deanm0000 opened 2 months ago
Hmm. If you just want the sheet names, I don't think that really belongs in Polars 🤔
import polars as pl
import fastexcel
sheet_names = fastexcel.read_excel("test.xlsx").sheet_names
df = pl.DataFrame({"sheets": sheet_names})
# shape: (3, 1)
# ┌────────┐
# │ sheets │
# │ --- │
# │ str │
# ╞════════╡
# │ Sheet1 │
# │ Sheet2 │
# │ Sheet3 │
# └────────┘
And if you want the sheet names associated with table/frame data, that's already possible:
pl.read_excel("test.xlsx", sheet_id=0)
# {"Sheet1": ..., "Sheet2": ..., "Sheet3": ...}
@alexander-beedie Just for reference, it was asked by a user on SO:
But it seems like the sheet_id=0
approach is what they really want.
The case for loading the sheet names only is if you have a giant excel file that you only need one sheet from. You know roughly what the name is but it's got names like "1 apple", "2 banana", "3 carrot", etc. Sometimes the source of the file skips a tab so you get "1 apple", "2 carrot", etc. If you only want "x rutabaga", you can't reliably just do "18 rutabaga", you have to get the sheet names and find the one that contains rutabaga.
Of course, the easiest way to do that is importing fastexcel to get the sheet names but, to me, it seems like if we're going to have pl.read_excel
rather than telling people to load fastexcel and use its to_polars
method then having a way to list the sheet names without the extra import seems in scope too.
@cmdlineluser I did see that one. I really dislike how pandas makes an ExcelFile class so I certainly wouldn't want to go that route.
Description
I'm not sure if it's better to add a method or just a parameter to
read_excel
but add a means of getting a DataFrame/List/Dict/whatever of sheet names.Something like