Open FedGlz opened 1 year ago
Hi, I am joining the question. Same problem here
Cheers :)
Hi, I am encountering the problem as well.
I've reached out to Databricks to hopefully get a fresh set of SAS tokens for those storage accounts, or perhaps new accounts if needed.
Having the same issue; it seems they haven't resolved it yet.
Any update on this? I'm also blocked by the same issue. It was working fine a few weeks ago; I returned to the course and am now blocked on this security error.
Thanks
I have the same issue starting from the 1st lab onward. With an issue like this, the entire course becomes useless. As I can see, the issue is already 4 months old, so I'm not sure there is any hope of it being resolved in a reasonable time. It means everyone who bought this course wasted their money and effort.
I'm taking the same "Perform data science with Azure Databricks" online class through Coursera and encountered the same Databricks error in the notebook "1. Reading Data - CSV":
AzureException: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
Caused by: StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
After a LOT of research, trial and more errors, I was able to find and employ the following workaround:
csvFile = "abfss://<my-container>@<my-storage-account>.dfs.core.windows.net/pageviews_by_second.tsv"
tempDF = (spark.read
  .format("csv")
  .option("sep", "\t")
  .load(csvFile)
)
Which ran without errors, and I was able to execute the rest of the cells in the notebook.
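For anyone adapting this workaround to the other files, the abfss URI follows a fixed pattern, so a small helper can assemble it. This is just my own sketch; the container, account, and file names are placeholders you have to substitute:

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an ADLS Gen2 abfss:// URI from its parts."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

# e.g. (placeholder names, substitute your own):
csvFile = abfss_uri("<my-container>", "<my-storage-account>", "pageviews_by_second.tsv")
```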
(I have to say I really love that "Diagnose Error" button in the bottom left corner of the notebook cell.)
My only challenge now is finding copies of the files (JSON, Parquet, etc.) in the rest of the notebooks. @joelhulen, is there an FTP site where the rest of these tutorial files are available?
Update: Azure Data Lake Storage Gen1 will be retired in 2024.
Here are the steps I took to implement Databricks connectivity to Azure Data Lake using a Gen2 storage account:
In the Azure Portal, upgrade your storage account to Azure Data Lake Storage Gen2 (enable the hierarchical namespace).
Also in the portal, create a service principal and secret, then assign the following role to the service principal in PowerShell:
New-AzRoleAssignment -RoleDefinitionName "Storage Blob Data Contributor" `
-ApplicationId "571278d3-4bef-44b7-a5c0-449899d57105" `
-Scope "/subscriptions/6647ccdf-614c-49a5-89d1-a1237ec5b08e/resourceGroups/rg-AzureFunctionQuickstart/providers/Microsoft.Storage/storageAccounts/azfuncqsstorage241"
# Use OAuth2 Credentials to access ADLS Gen2
spark.conf.set("fs.azure.account.auth.type.mldeltalake.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.mldeltalake.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.mldeltalake.dfs.core.windows.net", "<service principal ID>")
spark.conf.set("fs.azure.account.oauth2.client.secret.mldeltalake.dfs.core.windows.net", "<service principal secret>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.mldeltalake.dfs.core.windows.net", "https://login.microsoftonline.com/<tenant ID>/oauth2/token")
csvFile = "abfss://<container>@<storage-account>.dfs.core.windows.net/pageviews_by_second.tsv"
tempDF = (spark.read
.format("csv")
.option("sep", "\t")
#.option("header", True) # If the file has headers
#.option("inferSchema", True) # Infer data types
.load(csvFile)
)
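Since the five `spark.conf.set` calls above differ only in the storage-account suffix, they can be generated from a single function. A sketch (the function and parameter names are my own; on a real cluster you would pull the secret from a Databricks secret scope via `dbutils.secrets.get` rather than hardcoding it):

```python
def adls_oauth_conf(account: str, client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Return the Spark configs for OAuth2 access to an ADLS Gen2 account."""
    sfx = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{sfx}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{sfx}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{sfx}": client_id,
        f"fs.azure.account.oauth2.client.secret.{sfx}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{sfx}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# On a live cluster you would then apply them:
# for key, value in adls_oauth_conf("mldeltalake", sp_id, sp_secret, tenant).items():
#     spark.conf.set(key, value)
```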
It looks like the data might be here:
dbfs:/databricks-datasets/wikipedia-datasets/data-001/pageviews/raw/pageviews_by_second.tsv
Replace the /mnt/blah/blah path with that one, and it might just work.
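If you'd rather patch the paths programmatically instead of editing each cell, a tiny helper (my own sketch, not verified against every notebook) can swap an old `/mnt/...` path for the public dataset location, keeping just the file name:

```python
DATASET_ROOT = "dbfs:/databricks-datasets/wikipedia-datasets/data-001/pageviews/raw"

def redirect_to_dataset(old_path: str) -> str:
    """Point an old /mnt/... path at the public databricks-datasets copy."""
    filename = old_path.rsplit("/", 1)[-1]
    return f"{DATASET_ROOT}/{filename}"
```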
This is frustrating. My opinion of Coursera is going down.
The solution from @Wang22004K also works for me. Sadly, it seems the repo is no longer maintained.
Hi,
I am taking the course 'Perform data science with Azure Databricks' on Coursera and followed the instructions to create a workspace and cluster to be able to run the learning notebooks imported from
https://github.com/solliancenet/microsoft-learning-paths-databricks-notebooks/blob/master/data-engineering/DBC/03-Reading-and-writing-data-in-Azure-Databricks.dbc?raw=true
When trying to run the notebooks as per the course instructions, I am getting the following error: 'Server failed to authenticate the request. Make sure the value of authorization header is formed correctly including signature.'
This is the first time I am using Databricks and I have no clue how to troubleshoot it. I've read many similar posts, but I don't know which one is relevant to my case, as I believe the authentication is done via Azure credentials?
Thanks for your help in advance :)