tidyverse / duckplyr

A drop-in replacement for dplyr, powered by DuckDB for performance.
https://duckplyr.tidyverse.org/
Other
289 stars 19 forks source link

Any plans to incorporate .xlsx files? #272

Open prasundutta87 opened 1 month ago

prasundutta87 commented 1 month ago

Hi,

I was wondering if there are any plans to incorporate to read *.xlsx files or any excel related files?

Regards, Prasun

krlmlr commented 1 month ago

Oh, that's a neat idea. @cboettig: also something for duckdbfs?

cboettig commented 1 month ago

duckdbfs is just a thin wrapper that doesn't really do anything itself. However duckdb already supports this (currently by leveraging GDAL via the spatial extension!) so just use open_dataset() with format="sf" and you should be good to go.

https://duckdb.org/docs/guides/file_formats/excel_import.html

prasundutta87 commented 1 month ago

Thanks for this.. How can this be implemented in R specifically?

cboettig commented 1 month ago
df <- duckdbfs::open_dataset("my_excel.xlsx", format="sf")
prasundutta87 commented 1 month ago

Oh cool! Didn't know duckdbfs was an R package as well! Thanks for this!

johnmackintosh commented 1 month ago

Is there a way to specify a specific sheet using the R approach?

cboettig commented 1 month ago

@johnmackintosh that's a good question. According to https://gdal.org/en/latest/drivers/vector/xls.html it should be opening each sheet as a layer. st_read takes a layer as an argument: https://duckdb.org/docs/extensions/spatial/functions.html#st_read.

But it looks like duckdbfs isn't letting you pass additional arguments to the various read methods, which it definitely ought to do. https://github.com/cboettig/duckdbfs/issues/28

johnmackintosh commented 1 month ago

@cboettig Thanks. Yes, I did try passing a layer argument, based on the duckdb excel_import docs you linked further up, but no joy. Thanks for looking into this, would be a total game-changer to be able to import a specific sheet.