Closed aero-girl closed 1 year ago
Hey @aero-girl :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from SynapseML Team will be looking to triage this issue soon. We appreciate your patience.
I encountered the same issue on Databricks Runtime 11.3, but the install succeeded after I downgraded to Databricks Runtime 10.4 LTS ML.
You're right @nakany15 , I downgraded and it worked like a treat! Thank you 👍🏾
I'm seeing this too. I am wondering about this line in the SBT build, which asks for the assembly artifact of onnx-protobuf: https://github.com/microsoft/SynapseML/commit/f2e88fdea7c1010118913eecf0457b5daf881d25#diff-5634c415cd8c8504fdb973a3ed092300b43c4b8fc1e184f7249eb29a55511f91R405
I'm not sure how this is used later: https://github.com/microsoft/SynapseML/commit/f2e88fdea7c1010118913eecf0457b5daf881d25#diff-57f58593df9640766bdcfd4510a6ff307ce2056eb6832a52bc8ebc38d8a078e4R25 ... but there it seems to be looking for the main JAR rather than the assembly artifact (?)
I ask because the build seems to obtain the assembly artifact of this library fine, but the install then isn't looking for that artifact.
We had the same issue starting a few weeks ago, downgrading to DBR 10.4 LTS also worked.
Yeah, I'm not sure why that 'works' other than DBR 10.4 LTS is on Spark 3.2. But the error is about a missing com.microsoft.azure_onnx-protobuf_2.12-0.9.1.jar, not something particularly to do with Spark. My guess is something in DBR 10.4 also manages to pull in this dependency (?) It may be fine with Spark 3.3; it just depends on whether the JARs happen to include this too.
DBR just uses Ivy/SBT for dependency resolution, nothing fancy. This could be down to a change in SBT resolution across versions or something. I was pointing out above that the build seems to depend on the -assembly artifact, but then something references the normal non-assembly artifact. Not sure that is a real issue, but it looked like a potential conflict.
@srowen @aero-girl @nakany15 This is a known issue on Databricks spark 3.3 which we are actively trying to fix (@KeerthiYandaOS leading the charge here). For now the workaround is to use a spark 3.2 cluster. We will report back with a new version to use when we get to the bottom of this. Thanks for your patience
@srowen @aero-girl @nakany15 We are seeing this issue because of the Maven resolution change introduced in DBR 11.0. Can you set the Spark configuration spark.databricks.libraries.enableMavenResolution false on the Spark 3.3 cluster? That should resolve the issue temporarily until our new version of SynapseML is available.
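For anyone applying this workaround through a cluster definition rather than the UI, here is a minimal sketch of where the property would go. It assumes the cluster is created from a JSON spec with a `spark_conf` map, as in the Databricks Clusters API; the helper name `build_cluster_spec`, the cluster name, the runtime version string, and the node type are placeholders chosen for illustration, not values from this thread.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id, num_workers=1):
    """Hypothetical helper: build a cluster spec that disables Maven resolution."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # The temporary workaround suggested above for Spark 3.3 clusters.
        "spark_conf": {
            "spark.databricks.libraries.enableMavenResolution": "false",
        },
    }


# Placeholder values (assumptions): adjust to your workspace and cloud.
spec = build_cluster_spec(
    name="synapseml-test",
    spark_version="11.3.x-cpu-ml-scala2.12",
    node_type_id="Standard_DS3_v2",
)
print(json.dumps(spec, indent=2))
```

In the cluster UI, the same setting is a single line in the Spark config box: the key, a space, then `false`.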
Ah, I knew it was something like that. SBT vs Maven resolution is an ancient enemy of mine. Thanks for the tip!
@srowen @aero-girl @nakany15 Looks like DBR 11.0 no longer requires spark.databricks.libraries.enableMavenResolution false; I could install the JAR without that property. We also have a new SynapseML version for Spark 3.3 that removes the need for the Maven resolution property: com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT. Either way, we should be good with Spark 3.3 on DBR 11.
Closing this issue as the solution is posted. Please feel free to reopen it if you are still facing errors. Thank you.
Hi @KeerthiYandaOS , just opening this post up again as I am still facing errors. Trying to get this working on 11.3 LTS ML (includes Apache Spark 3.3.0, Scala 2.12) using com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT as maven coordinates.
The following is the error:
Tested with the following maven coordinates: com.microsoft.azure:synapseml_2.12:0.10.2, and it works! 🙌🏽🚀
@aero-girl can you please try installing again with the Spark 3.3 Maven coordinates com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT and see if you still see the error? I was able to install the same version on the 11.3 LTS ML (includes Apache Spark 3.3.0, Scala 2.12) cluster
spark.databricks.libraries.enableMavenResolution false
Yes, sure @KeerthiYandaOS
Tested and failed with maven coordinates: com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT
However as shown below, it works with maven coordinates: com.microsoft.azure:synapseml_2.12:0.10.2
@aero-girl can you please provide https://mmlspark.azureedge.net/maven as the repository URL? com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT is a SNAPSHOT version and is not published to the default Maven Central repository.
com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT
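For scripted installs, the repository URL can travel with the coordinates. The sketch below builds a request body in the shape used by the Databricks Libraries API install endpoint, where a Maven library carries `coordinates` and an optional `repo`. The helper name `build_install_request` and the cluster ID placeholder are illustrative assumptions; the coordinates and repository URL are the ones quoted in this thread.

```python
import json

SYNAPSEML_SNAPSHOT = "com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT"
SYNAPSEML_REPO = "https://mmlspark.azureedge.net/maven"


def build_install_request(cluster_id, coordinates, repo=None):
    """Hypothetical helper: build a Maven library install request body."""
    maven = {"coordinates": coordinates}
    if repo:
        # SNAPSHOT builds are not on Maven Central, so name the repository.
        maven["repo"] = repo
    return {"cluster_id": cluster_id, "libraries": [{"maven": maven}]}


# "<your-cluster-id>" is a placeholder (assumption), not a real cluster.
request = build_install_request("<your-cluster-id>", SYNAPSEML_SNAPSHOT, SYNAPSEML_REPO)
print(json.dumps(request, indent=2))
```

For a release such as 0.10.2 that is published to Maven Central, the `repo` argument can be left out.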
@KeerthiYandaOS thank you, that has worked! 🙌🏽😃
Thank you for the confirmation @aero-girl. Closing the thread as the issue is resolved.
SynapseML version
com.microsoft.azure:synapseml_2.12:0.10.2
System information
Describe the problem
Trying to install cognitive services on Databricks following the instructions provided.
Have tried the following Maven coordinates:
They all fail with the following error message: "Library installation attempted on the driver node of cluster 0110-113436-jrinjmhu and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/local_disk0/tmp/clusterWideResolutionDir/maven/ivy/jars/com.microsoft.azure_onnx-protobuf_2.12-0.9.1.jar does not exist"
I did have this working on 14/01/2023 using the com.microsoft.azure:synapseml_2.12:0.10.2 Maven coordinates - see screenshot below.
Thanks!
Code to reproduce issue
    import pyspark

    spark = (
        pyspark.sql.SparkSession.builder.appName("MyApp")
        # Please use 0.10.2 version for Spark3.2 and 0.9.5-13-d1b51517-SNAPSHOT version for Spark3.1
        .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.10.2")
        .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
        .getOrCreate()
    )
Other info / logs
No response
What component(s) does this bug affect?
- area/cognitive: Cognitive project
- area/core: Core project
- area/deep-learning: DeepLearning project
- area/lightgbm: Lightgbm project
- area/opencv: Opencv project
- area/vw: VW project
- area/website: Website
- area/build: Project build system
- area/notebooks: Samples under notebooks folder
- area/docker: Docker usage
- area/models: models related issue

What language(s) does this bug affect?
- language/scala: Scala source code
- language/python: Pyspark APIs
- language/r: R APIs
- language/csharp: .NET APIs
- language/new: Proposals for new client languages

What integration(s) does this bug affect?
- integrations/synapse: Azure Synapse integrations
- integrations/azureml: Azure ML integrations
- integrations/databricks: Databricks integrations