polydbms / sheetreader-duckdb

MIT License

feature request: reading remote files from ftp/sftp servers #58

Open gregorywaynepower opened 1 week ago

gregorywaynepower commented 1 week ago

Hello Folks,

I really enjoy the work y'all have done to speed up the reading of local spreadsheets. Is there any way y'all are able to read files from SFTP or FTP servers, perhaps working with the httpfs extension?

freddie-freeloader commented 1 week ago

Hey @gregorywaynepower :wave:

Thank you for opening this issue. :relaxed:

Just yesterday we were talking about getting XLSX files from Google Sheets via HTTPS, so this is related. @harrygav might take a look at this.

gregorywaynepower commented 1 week ago

@freddie-freeloader If you're talking about reading Google Sheets, the DuckDB community extension gsheets does support that functionality.

harrygav commented 6 days ago

Hi @gregorywaynepower, thanks for your interest in SheetReader. Just to get a better understanding of your feature request: Do you want to directly load .xlsx files from FTP locations/paths, without downloading them first, i.e., giving an FTP path (and potentially credentials) to SheetReader?

gregorywaynepower commented 6 days ago

@harrygav After looking at my use case, it may not be an FTP or SFTP path after all.

I'm trying to speed up reading a remote .xlsx file from https://services.wake.gov/realdata_extracts/. I don't want to download it completely unless absolutely necessary. I've been a bit disappointed with DuckDB's ability to read remote xlsx files, since it relies on GDAL through the spatial extension for that functionality.
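
For context on why avoiding a full download is plausible in principle: an .xlsx file is a ZIP container, and a ZIP reader only needs the central directory at the end of the file plus the members it actually opens. The sketch below is a local simulation in Python, not anything SheetReader or httpfs is confirmed to do. `CountingReader`, `build_fake_xlsx`, and `read_one_member` are hypothetical names, the archive is a toy ZIP rather than a valid workbook, and a real remote read would additionally depend on the server honoring HTTP range requests.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """Hypothetical stand-in for a remote file; tallies bytes actually read."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_fake_xlsx() -> bytes:
    """Build a ZIP with one small and one large member (not a valid workbook)."""
    buf = io.BytesIO()
    # ZIP_STORED keeps the archive size predictable; real .xlsx files use deflate.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("xl/worksheets/sheet1.xml", b"<worksheet>small</worksheet>")
        zf.writestr("xl/worksheets/sheet2.xml", b"x" * 1_000_000)
    return buf.getvalue()


def read_one_member(archive: bytes, name: str):
    """Return (member contents, bytes read from the simulated remote file)."""
    remote = CountingReader(archive)
    with zipfile.ZipFile(remote) as zf:
        content = zf.read(name)
    return content, remote.bytes_read


archive = build_fake_xlsx()
content, bytes_read = read_one_member(archive, "xl/worksheets/sheet1.xml")
print(f"archive size: {len(archive)} bytes, bytes read: {bytes_read}")
```

Reading the small sheet touches only a tiny fraction of the archive. In a real workbook a reader would usually also need shared members such as the shared strings table, so the savings depend on how the file is laid out.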