slingdata-io / sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
https://docs.slingdata.io
GNU General Public License v3.0

Cannot use wildcard with CSV files in Azure Blob Storage #314

Closed kristianandre closed 6 days ago

kristianandre commented 3 weeks ago

Issue Description

I tried to follow the examples here, using Azure Blob Storage as the source and DuckDB as the target. However, when I use the *.csv glob pattern, I get the following message:

WRN Did not match any streams. Exiting.

I have double-checked that it works when I reference a specific CSV file, e.g., container_name/folder_name/file.csv, or a "directory" such as container_name/folder_name/. I have also made sure that the SAS token has read and list permissions.

I know fsspec had a pretty recent change in how it treats glob (here is the changelog for adlfs). I am guessing that is unrelated.

source: AZURE_BLOB_STORAGE
target: DUCKDB

defaults:
  mode: full-refresh
  object: 'target_schema.{stream_file_folder}_{stream_file_name}'
  source_options:
    format: csv

streams:
  "folder_name/*.csv":

WRN Did not match any streams. Exiting.

flarco commented 3 weeks ago

Hi, if you run sling conns discover AZURE_BLOB_STORAGE -p 'folder_name/*.csv', what happens?

kristianandre commented 3 weeks ago

Hi, thanks for taking a look! With that command I actually get a list of csv files printed in the terminal, but the replication file still fails with the same warning.
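
For anyone reading along, here is a minimal sketch of the behavior I expected from the replication file: the same pattern that discover resolves should select the CSV files as streams. This is not Sling's implementation (Sling is written in Go); it uses Python's standard fnmatch module, and the helper name match_streams and the blob names are hypothetical, made up purely for illustration.

```python
from fnmatch import fnmatchcase


def match_streams(pattern: str, keys: list[str]) -> list[str]:
    """Return the keys that match a shell-style glob pattern (case-sensitive)."""
    return [key for key in keys if fnmatchcase(key, pattern)]


# Hypothetical blob names, relative to the container root.
keys = [
    "folder_name/a.csv",
    "folder_name/b.csv",
    "folder_name/notes.txt",
    "other_folder/c.csv",
]

# The stream key from the replication file above.
matched = match_streams("folder_name/*.csv", keys)
print(matched)  # ['folder_name/a.csv', 'folder_name/b.csv']
```

Note that fnmatch lets * cross "/" boundaries, so a nested key such as folder_name/sub/d.csv would also match here. Whether Sling treats nested paths the same way is something I have not verified.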

flarco commented 6 days ago

Fixed for next release: https://github.com/slingdata-io/sling-cli/pull/318/commits/03d57b6cde4b165ae622f91ff713db202f412ceb Closing.