microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

Speech to text example is not working in Databricks #1568

Open MichalSzopinski00 opened 2 years ago

MichalSzopinski00 commented 2 years ago

Hello Dear Product group,

I tried to re-run this example on my Databricks notebook: https://microsoft.github.io/SynapseML/docs/features/cognitive_services/CognitiveServices%20-%20Overview/#speech-to-text-sample

I used DBR 10.4 LTS (with Spark 3.2.1 and Scala 2.12) and followed the installation guide here:

https://github.com/microsoft/SynapseML#databricks

I installed the library com.microsoft.azure:synapseml_2.12:0.10.0 with the resolver: https://mmlspark.azureedge.net/maven

When I try to use this library, the example runs indefinitely and never finishes; I'm not sure why.

I would expect it to finish within 5 minutes at most, since this is just an example.

On the Cognitive Services side, I used a Speech to Text resource. I also tried a Cognitive Services multi-service resource, and the issue persisted. I tested this on several subscriptions and couldn't make it work. I also changed the DBR to 10.1, but the issue persisted :(

Could you please help me resolve this issue? I believe that this is a bug.

AB#1886117

MichalSzopinski00 commented 2 years ago

Note 1: you won't be able to use the service key, since the resource no longer exists :)

mhamilton723 commented 2 years ago

Could not repro this on my side; will try with a fresh cluster next.

MichalSzopinski00 commented 2 years ago

I hit this issue on a VNet-injected Azure Databricks workspace in West Europe. I also deployed a non-VNet-injected workspace, and there it worked fine. This is strange, since the VNet-injected workspace has no custom DNS, firewall, or route table that could block traffic to the internet. From the VNet-injected workspace I can telnet to 8.8.8.8 on port 443 and download the entire library from Maven without any issues. I will try to establish a connection using the SDK from the non-working Databricks workspace and let you know the results :)
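
For anyone comparing the two workspaces, a quick reachability check from a notebook cell may help narrow things down before involving the SDK. The sketch below is only an illustration: `speech_hosts` and `can_connect` are hypothetical helper names, and the two hostname patterns are assumptions about the regional Speech endpoints, not something confirmed in this thread. It only tests whether a TCP connection on port 443 opens, so a success does not prove that the TLS or WebSocket traffic used by the SDK gets through.

```python
import socket


def speech_hosts(region):
    """Return hostnames ASSUMED to serve the Speech service for a region."""
    return [
        f"{region}.stt.speech.microsoft.com",     # assumed speech-to-text endpoint
        f"{region}.api.cognitive.microsoft.com",  # assumed token / REST endpoint
    ]


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers DNS failures, refused connections, and timeouts.
        return False


# Example use in a notebook cell (runs on the driver node only):
# for host in speech_hosts("westeurope"):
#     print(host, "reachable" if can_connect(host) else "NOT reachable")
```

If the hosts are reachable from the non-VNet-injected workspace but not from the VNet-injected one, that would point at the network path rather than at SynapseML. Note that the check runs on the driver; the executors may sit behind different network rules.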