jeppe742 / DeltaLakeReader

Read Delta tables without any Spark
Apache License 2.0

Azure Function - Error Reading Delta Table from Azure Storage Account #47

Open rafael-gomez-61 opened 1 year ago

rafael-gomez-61 commented 1 year ago

Azure Function: when I try to access a Delta table in our Azure storage account, I get an error as soon as I instantiate the DeltaTable class. The same code runs in PyCharm without any error.

I need HELP!! Am I missing something???

I have the following defined in requirements.txt:

azure-functions
azure-identity
pyodbc
delta-lake-reader[azure]

pip reports:

Requirement already satisfied: delta-lake-reader[azure] in c:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages (from -r requirements.txt (line 8)) (0.2.13)

def readDeltaTableSchema(container_name: str, schema_name: str, delta_table: str) -> str:
    from deltalake import DeltaTable
    from adlfs import AzureBlobFileSystem

    VZNTDIRECTORYID = os.getenv('AZ_STORAGE_VZNTDIRECTORYID')
    VZNTID = os.getenv('AZ_STORAGE_VZNTID')
    VZNTSECRET = os.getenv('AZ_STORAGE_VZNTSECRET')

    az_account_name = os.getenv('AZ_STORAGE_STARBURST_ACCT')
    vznt_tenant_storage_account = os.getenv('AZ_STORAGE_STARBURST_ACCT')
    Storage_URL = "https://{vznt_tenant_storage_account}.dfs.core.windows.net"

    az_container = container_name
    az_schema = schema_name
    az_delta_table = delta_table

    url = f"abfss://{az_container}@{az_account_name}.dfs.core.windows.net/delta/{az_schema}/{az_delta_table}"

    fs = AzureBlobFileSystem(
        account_name=az_account_name, account_url=Storage_URL,
        client_id=VZNTID, client_secret=VZNTSECRET, tenant_id=VZNTDIRECTORYID
    )
--> deltaTableSchemaMeta = DeltaTable(url, file_system=fs)

Error:

[2022-12-20T18:27:16.855Z] Executed 'Functions.register-data-sourceHTTPTrigger' (Failed, Id=bbfa069d-d7c9-475c-860b-3e593a0e0378, Duration=16026ms)
[2022-12-20T18:27:16.858Z] System.Private.CoreLib: Exception while executing function: Functions.register-data-sourceHTTPTrigger. System.Private.CoreLib: Result: Failure
Exception: HttpResponseError: Operation returned an invalid status 'The specifed resource name contains invalid characters.'
ErrorCode:InvalidResourceName
Stack:
  File "C:\ProgramData\chocolatey\lib\azure-functions-core-tools-3\tools\workers\python\3.9/WINDOWS/X64\azure_functions_worker\dispatcher.py", line 402, in _handle__invocation_request
    call_result = await self._loop.run_in_executor(
  File "C:\Program Files\Python39\lib\concurrent\futures\thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\ProgramData\chocolatey\lib\azure-functions-core-tools-3\tools\workers\python\3.9/WINDOWS/X64\azure_functions_worker\dispatcher.py", line 606, in _run_sync_func
    return ExtensionManager.get_sync_invocation_wrapper(context,
  File "C:\ProgramData\chocolatey\lib\azure-functions-core-tools-3\tools\workers\python\3.9/WINDOWS/X64\azure_functions_worker\extension.py", line 215, in _raw_invocation_wrapper
    result = function(**args)
  File "C:\working-folder\az-func\http-register-dta-source\register-data-sourceHTTPTrigger\__init__.py", line 30, in main
    storage_access()
  File "C:\working-folder\az-func\http-register-dta-source\register-data-sourceHTTPTrigger\__init__.py", line 62, in storage_access
    val = readDeltaTableSchema(az_container, az_schema, az_delta_table)
  File "C:\working-folder\az-func\http-register-dta-source\register-data-sourceHTTPTrigger\__init__.py", line 91, in readDeltaTableSchema
    deltaTableSchemaMeta = DeltaTable(url, file_system=fs)
  File "C:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages\deltalake\deltatable.py", line 40, in __init__
    if not self._is_delta_table():
  File "C:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages\deltalake\deltatable.py", line 62, in _is_delta_table
    return self.filesystem.exists(f"{self.log_path}")
  File "C:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages\adlfs\spec.py", line 1292, in exists
    return sync(self.loop, self._exists, path)
  File "C:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages\fsspec\asyn.py", line 71, in sync
    raise return_result
  File "C:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages\fsspec\asyn.py", line 25, in _runner
    result[0] = await coro
  File "C:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages\adlfs\spec.py", line 1314, in _exists
    if await bc.exists():
  File "C:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages\azure\core\tracing\decorator_async.py", line 79, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "C:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages\azure\storage\blob\aio\_blob_client_async.py", line 652, in exists
    process_storage_error(error)
  File "C:\working-folder\az-func\http-register-dta-source.venv\lib\site-packages\azure\storage\blob\_shared\response_handlers.py", line 185, in process_storage_error
    exec("raise error from None") # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>

jeppe742 commented 1 year ago

Hey @rafael-gomez-61

It's a bit hard for me to troubleshoot the issue, especially since I haven't worked much with Azure Functions.

I'm actually a bit surprised that this code works locally. As the readme shows, the idea is that the path you pass to DeltaTable shouldn't contain all the Azure-specific metadata (the abfss:// scheme and the storage account host).

So what happens if you change your url to something like this?

url = f"{az_container}/delta/{az_schema}/{az_delta_table}"

Otherwise, you could check that the environment variables are set correctly by logging the url.
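
The two suggestions above could be sketched roughly as follows. This is a minimal illustration, not part of delta-lake-reader: the helper names `require_env`, `build_table_path` and `build_account_url` are hypothetical, and the `delta/<schema>/<table>` layout is taken from the original post. One thing that may be worth ruling out while logging values: in the posted snippet, `Storage_URL` is a plain string without the `f` prefix, so the literal text `{vznt_tenant_storage_account}` would be passed as the account URL.

```python
import logging
import os

logger = logging.getLogger(__name__)


def require_env(name: str) -> str:
    """Return an environment variable's value, failing loudly if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def build_account_url(account_name: str) -> str:
    """Build the storage account URL with an f-string so the name is substituted."""
    return f"https://{account_name}.dfs.core.windows.net"


def build_table_path(container: str, schema: str, table: str) -> str:
    """Build the container-relative table path (no abfss:// scheme, no account host)."""
    path = f"{container}/delta/{schema}/{table}"
    # Leftover braces mean a placeholder was never substituted,
    # for example a string literal that is missing its f prefix.
    if "{" in path or "}" in path:
        raise ValueError(f"Unformatted placeholder in path: {path!r}")
    # Log the final value so it shows up in the function's logs.
    logger.info("Delta table path: %s", path)
    return path
```

The value returned by `build_table_path` would then be passed as the first argument to `DeltaTable(..., file_system=fs)`, with `require_env("AZ_STORAGE_STARBURST_ACCT")` supplying the account name. Comparing the logged path between PyCharm and the Functions host should show whether the two environments resolve the same values.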