solliancenet / microsoft-learning-paths-databricks-notebooks

Contains notebooks used in the Microsoft Azure Databricks Learning Paths modules.

Databricks: Server Failed to Authenticate the request. Make sure the value of Authorization header is formed correctly including the signature #11

Open FedGlz opened 1 year ago

FedGlz commented 1 year ago

Hi,

I am taking the course 'Perform data science with Azure Databricks' on Coursera and followed the instructions to create a workspace and cluster so I could run the learning notebooks imported from

https://github.com/solliancenet/microsoft-learning-paths-databricks-notebooks/blob/master/data-engineering/DBC/03-Reading-and-writing-data-in-Azure-Databricks.dbc?raw=true

When trying to run the notebooks as per the course instructions, I get the following error: 'Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.'

This is the first time I am using Databricks and I have no clue how to troubleshoot this. I've read many similar posts, but I don't know which one is relevant to my case, as I believe the authentication is done via Azure credentials?

Thanks for your help in advance :)

AndrzejMachura commented 1 year ago

Hi, I'd like to second this question. Same problem here.

Cheers :)

redperiabras commented 1 year ago

Hi, I am encountering the problem as well.

joelhulen commented 1 year ago

I've reached out to Databricks to hopefully get a fresh set of SAS tokens for those storage accounts, or perhaps new accounts if needed.
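
For context, the course notebooks appear to mount the shared training storage with a pre-generated SAS token, which would explain why everything breaks at once when that token expires. Below is a minimal sketch of that mount pattern; the container, account, mount point, and token are illustrative placeholders, not the actual course values:

# Hypothetical illustration of a SAS-based mount; the real classroom-setup
# notebook supplies these values for you. An expired SAS token produces exactly
# the "Server failed to authenticate the request" error reported above.
sas_token = "<sas-token>"
dbutils.fs.mount(
    source="wasbs://<container>@<account>.blob.core.windows.net/",
    mount_point="/mnt/training",
    extra_configs={"fs.azure.sas.<container>.<account>.blob.core.windows.net": sas_token},
)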

dafteaux commented 11 months ago

Having the same issue; it seems they haven't resolved it yet.

rastar-sanders commented 10 months ago

Any update on this? I also have the same blocker. This was working fine a few weeks ago; I returned to the course and am now blocked on this security issue.

Thanks

Yaroslav-Lyutvinskiy commented 10 months ago

I have the same issue starting from the first lab onward. With such an issue, the full course becomes completely useless. As I see it, the issue is already 4 months old, and I am not sure there is any hope of it being resolved in a reasonable time. It means that everyone who bought this course has wasted their money and effort.

mistermaxx commented 10 months ago

I'm taking the same "Perform data science with Azure Databricks" online class through Coursera and encountered the same Databricks error in the notebook "1. Reading Data - CSV":

AzureException: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
Caused by: StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.

After a LOT of research, trial and more errors, I was able to find and employ the following workaround:

  1. Create a storage account in the Azure Portal. Then, in Overview > Properties > Blob Service, disable Blob soft delete.
  2. Create a container for the storage account. I then found the file "pageviews-by-second-tsv.gz" on the web, downloaded it to my Windows 10 laptop and decompressed it with 7-Zip. I then uploaded the file to my container.
  3. In Databricks, I created a compute cluster with the Advanced option "Enable credential passthrough for user-level data access" enabled.
  4. In the notebook "1. Reading Data - CSV", I skipped down to the cell "Step 1 - Read The CSV File" and inserted the following code:
    
    csvFile = "abfss://<my-container>@<my-storage-account>.dfs.core.windows.net/pageviews_by_second.tsv"

tempDF = (spark.read .format("csv") .option("sep", "\t") .load(csvFile) )


This ran without errors, and I was able to execute the rest of the cells in the notebook.
(I have to say I really love that "Diagnose Error" button in the bottom left corner of the notebook cell.)
My only challenge now is finding copies of the files (JSON, Parquet, etc.) in the rest of the notebooks. @joelhulen, is there an FTP site where the rest of these tutorial files are available?

mistermaxx commented 10 months ago

Update: Azure Data Lake Storage Gen1 will be retired in 2024.

https://learn.microsoft.com/en-us/answers/questions/281107/retirement-announcement-azure-data-lake-storage-ge

Here are the steps I took to implement Databricks connectivity to Azure Data Lake using a Gen2 storage account:

  1. In the Azure Portal, upgrade your storage account to Azure Data Lake Storage Gen2 (the Data Lake Gen2 upgrade blade). Before the upgrade, be sure to disable Blob and container soft delete: https://ganeshchandrasekaran.com/azure-databricks-configure-your-storage-container-to-load-and-write-data-to-azure-object-storage-3db8cd506a25

  2. Also in the portal, create a service principal and secret, then assign the following role to the service principal in PowerShell:

New-AzRoleAssignment -RoleDefinitionName "Storage Blob Data Contributor" `
    -ApplicationId "571278d3-4bef-44b7-a5c0-449899d57105" `
    -Scope "/subscriptions/6647ccdf-614c-49a5-89d1-a1237ec5b08e/resourceGroups/rg-AzureFunctionQuickstart/providers/Microsoft.Storage/storageAccounts/azfuncqsstorage241"
  3. In Databricks, paste and run the following code in the '1. Reading Data - CSV' notebook cell:
# Use OAuth2 Credentials to access ADLS Gen2
spark.conf.set("fs.azure.account.auth.type.mldeltalake.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.mldeltalake.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.mldeltalake.dfs.core.windows.net", "<service principal ID>")
spark.conf.set("fs.azure.account.oauth2.client.secret.mldeltalake.dfs.core.windows.net", "<service principal secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.mldeltalake.dfs.core.windows.net", "https://login.microsoftonline.com/<tenant ID>/oauth2/token")

csvFile = "abfss://<container>@<storage-account>.dfs.core.windows.net/pageviews_by_second.tsv"

tempDF = (spark.read
          .format("csv")
          .option("sep", "\t") 
          #.option("header", True)  # If the file has headers
          #.option("inferSchema", True)  # Infer data types
          .load(csvFile)
         )
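
If the configuration above is accepted, a quick sanity check confirms the read (a minimal follow-up, assuming the cluster can reach the storage account configured above):

# Verify the DataFrame loaded from ADLS Gen2
tempDF.printSchema()
print(tempDF.count())
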
Wang22004K commented 7 months ago

It looks like the data might be here:

dbfs:/databricks-datasets/wikipedia-datasets/data-001/pageviews/raw/pageviews_by_second.tsv

Replace the /mnt/blah/blah path with that one, and it might just work.
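
For example, the read in the '1. Reading Data - CSV' notebook would become something like this sketch (untested; options such as header handling may need adjusting for this copy of the file):

# Read the copy bundled with the Databricks sample datasets instead of the
# retired course storage account
csvFile = "dbfs:/databricks-datasets/wikipedia-datasets/data-001/pageviews/raw/pageviews_by_second.tsv"

tempDF = (spark.read
          .format("csv")
          .option("sep", "\t")
          .load(csvFile)
         )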

This is frustrating. My opinion of Coursera is going down.

lfsalasg commented 3 weeks ago

The solution from @Wang22004K also works for me. Sadly, it seems the repo is no longer maintained.