Open VincentSaelzlerFRA opened 1 month ago
We don't have support for inlining the authentication into the path - please pass the authentication information in the storage options - see https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html#variants for the keys.
@nameexhaustion can we add a section in the user-guide on authentication of Polars? Better to save the indirection (object-store) and directly show it on our side.
@nameexhaustion thanks for the prompt reply!
please pass the authentication information in the storage options
Unfortunately, none of the following authentication details are available to pass.
That's because "use_azure_cli": "True" is the authentication information. Per the documentation link you sent, it specifies that Polars should "Use azure cli for acquiring access token".
Passing extra parameters about the environment would be possible, if that helps. For example things like
@VincentSaelzlerFRA, could you try using scan_csv(..).collect() instead of read_csv? I had a look and it seems read_csv currently does not go through our native cloud downloading code path.
@nameexhaustion using scan_csv(..).collect() succeeded. Thanks for the workaround!
Updated minimal working example:
import polars as pl

# Azure Data Lake Storage Gen2
STORAGE_ACCOUNT = "myaccount"
CONTAINER = "mycontainer"
# Authenticate with the token acquired by the Azure CLI
STORAGE_OPTIONS = {"use_azure_cli": "True"}

# scan_csv goes through the native cloud download path; read_csv currently does not
lf = pl.scan_csv(
    source=f"abfss://{CONTAINER}@{STORAGE_ACCOUNT}.dfs.core.windows.net/example.csv",
    storage_options=STORAGE_OPTIONS,
)
df = lf.collect()
Checks
Reproducible example
Log output
Issue description
The failure happens because a call is made to get blob properties without passing any credentials.
Specifically, it is a HEAD request to https://STORAGE_ACCOUNT.blob.core.windows.net/CONTAINER/example.csv
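To make the relationship between the two URLs concrete, here is a minimal sketch of how an abfss:// URI of the form used in this issue corresponds to the blob endpoint URL seen in the failing HEAD request. The helper name abfss_to_blob_url is hypothetical, and this is only an illustration of the URL shapes reported here, not how Polars or object_store actually resolve paths.

```python
from urllib.parse import urlsplit


def abfss_to_blob_url(uri: str) -> str:
    """Map abfss://CONTAINER@ACCOUNT.dfs.core.windows.net/PATH to the
    https://ACCOUNT.blob.core.windows.net/CONTAINER/PATH form.

    Hypothetical helper for illustration only; it assumes the URI has
    exactly the shape used in this issue.
    """
    parts = urlsplit(uri)
    if parts.scheme != "abfss":
        raise ValueError(f"expected an abfss:// URI, got: {uri!r}")

    container = parts.username  # the part before '@'
    host = parts.hostname  # e.g. myaccount.dfs.core.windows.net
    if not container or not host:
        raise ValueError(f"URI is missing container or account: {uri!r}")

    account = host.split(".")[0]  # storage account name
    return f"https://{account}.blob.core.windows.net/{container}{parts.path}"


if __name__ == "__main__":
    print(abfss_to_blob_url(
        "abfss://mycontainer@myaccount.dfs.core.windows.net/example.csv"
    ))
```

Comparing the printed URL with the one in the log makes it easier to confirm that the unauthenticated request targets the same file.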
I am sure that Azure CLI credentials are working in my environment, because replacing read_csv with read_parquet results in a successful file download. Also, I have been successfully using read_parquet on parquet files in the same storage container using the same credentials without issue.
Expected behavior
The CSV file contents would be loaded into a dataframe.
Installed versions
Also, adlfs==2024.4.1