microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

ONNX assembly jar not installable on Databricks Spark 3.3 clusters [BUG] #1817

Closed aero-girl closed 1 year ago

aero-girl commented 1 year ago

SynapseML version

com.microsoft.azure:synapseml_2.12:0.10.2

System information

Describe the problem

Trying to install Cognitive Services on Databricks, following the instructions provided.

Have tried the following Maven coordinates:

They all fail with the following error message: "Library installation attempted on the driver node of cluster 0110-113436-jrinjmhu and failed. Please refer to the following error message to fix the library or contact Databricks support. Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File file:/local_disk0/tmp/clusterWideResolutionDir/maven/ivy/jars/com.microsoft.azure_onnx-protobuf_2.12-0.9.1.jar does not exist"
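For orientation, the missing file in that error is the cached jar for the Maven coordinate com.microsoft.azure:onnx-protobuf_2.12:0.9.1; the file name appears to follow the pattern `<groupId>_<artifactId>-<version>.jar`. A small illustrative helper (not part of SynapseML or Databricks, and assuming the groupId contains no underscore) shows the mapping:

```python
def parse_ivy_jar_name(jar_name: str) -> tuple[str, str, str]:
    """Split a flattened jar file name of the form
    <groupId>_<artifactId>-<version>.jar into its Maven coordinate parts.
    Assumes the groupId contains no underscore (true here) and that the
    version follows the last hyphen."""
    stem = jar_name[:-len(".jar")] if jar_name.endswith(".jar") else jar_name
    group, _, rest = stem.partition("_")         # groupId ends at first "_"
    artifact, _, version = rest.rpartition("-")  # version comes after last "-"
    return group, artifact, version

print(parse_ivy_jar_name("com.microsoft.azure_onnx-protobuf_2.12-0.9.1.jar"))
# → ('com.microsoft.azure', 'onnx-protobuf_2.12', '0.9.1')
```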

I did have this working on 14/01/2023 using the com.microsoft.azure:synapseml_2.12:0.10.2 Maven coordinates (screenshot omitted).

Thanks!


Code to reproduce issue

```python
import pyspark

spark = (
    pyspark.sql.SparkSession.builder.appName("MyApp")
    # Please use 0.10.2 for Spark 3.2 and 0.9.5-13-d1b51517-SNAPSHOT for Spark 3.1
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:0.10.2")
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
    .getOrCreate()
)
```

Other info / logs

No response

What component(s) does this bug affect?

What language(s) does this bug affect?

What integration(s) does this bug affect?

github-actions[bot] commented 1 year ago

Hey @aero-girl 👋! Thank you so much for reporting the issue/feature request 🚨. Someone from the SynapseML team will triage this issue soon. We appreciate your patience.

nakany15 commented 1 year ago

I encountered the same issue on Databricks Runtime 11.3, but the library installed successfully when I downgraded to Databricks Runtime 10.4 LTS ML.

aero-girl commented 1 year ago

> I encountered the same issue on Databricks Runtime 11.3, but the library installed successfully when I downgraded to Databricks Runtime 10.4 LTS ML.

You're right @nakany15 , I downgraded and it worked like a treat! Thank you 👍🏾

srowen commented 1 year ago

I'm seeing this too. I am wondering about this in the SBT build, which asks for the assembly artifact of onnx-protobuf: https://github.com/microsoft/SynapseML/commit/f2e88fdea7c1010118913eecf0457b5daf881d25#diff-5634c415cd8c8504fdb973a3ed092300b43c4b8fc1e184f7249eb29a55511f91R405

I'm not sure how this is used later: https://github.com/microsoft/SynapseML/commit/f2e88fdea7c1010118913eecf0457b5daf881d25#diff-57f58593df9640766bdcfd4510a6ff307ce2056eb6832a52bc8ebc38d8a078e4R25

... but it seems to be looking there not for the assembly artifact, but for the main JAR (?)

That is, the build seems to obtain the assembly artifact of this library fine, but then isn't looking for it later in the install.

datashift24 commented 1 year ago

We had the same issue starting a few weeks ago, downgrading to DBR 10.4 LTS also worked.

srowen commented 1 year ago

Yeah, I'm not sure why that 'works' other than DBR 10.4 LTS is on Spark 3.2. But the error is about missing com.microsoft.azure_onnx-protobuf_2.12-0.9.1.jar, not something particularly to do with Spark. My guess is something in DBR 10.4 also manages to pull in this dependency (?) It may be fine with Spark 3.3, just depends on whether the JARs happen to include this too.

DBR just uses Ivy/SBT for dependency resolution, nothing fancy. This could be down to a change in SBT resolution across versions or something. I was pointing out above that the build seems to depend on the -assembly artifact, but then something references the normal non-assembly artifact. Not sure that is a real issue, but it looked like a potential conflict.

mhamilton723 commented 1 year ago

@srowen @aero-girl @nakany15 This is a known issue on Databricks Spark 3.3 which we are actively trying to fix (@KeerthiYandaOS is leading the charge here). For now, the workaround is to use a Spark 3.2 cluster. We will report back with a new version to use when we get to the bottom of this. Thanks for your patience.

KeerthiYandaOS commented 1 year ago

@srowen @aero-girl @nakany15 We are seeing this issue because of the Maven resolution change introduced in DBR 11.0. Can you set the Spark configuration spark.databricks.libraries.enableMavenResolution false on the Spark 3.3 cluster to resolve the issue temporarily, until our new version of SynapseML is available?
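(For reference, a cluster-level setting like this is typically entered in the cluster's Advanced options > Spark config box, one space-separated key-value pair per line; that location is standard Databricks behavior, not anything specific to SynapseML:)

```
spark.databricks.libraries.enableMavenResolution false
```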

srowen commented 1 year ago

Ah, I knew it was something like that. SBT vs Maven resolution is an ancient enemy of mine. Thanks for the tip!

KeerthiYandaOS commented 1 year ago

@srowen @aero-girl @nakany15 Looks like DBR 11.0 doesn't require spark.databricks.libraries.enableMavenResolution false anymore; I could install the jar without that property. We also have a new SynapseML version for Spark 3.3 which removes the need for the Maven resolution property: com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT. Either way, we should be good with Spark 3.3 on DBR 11.

Closing this issue as the solution is posted. Please feel free to open it if you are still facing errors. Thank you.

aero-girl commented 1 year ago

> @srowen @aero-girl @nakany15 Looks like DBR 11.0 doesn't require spark.databricks.libraries.enableMavenResolution false anymore; I could install the jar without that property. We also have a new SynapseML version for Spark 3.3 which removes the need for the Maven resolution property: com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT. Either way, we should be good with Spark 3.3 on DBR 11.
>
> Closing this issue as the solution is posted. Please feel free to open it if you are still facing errors. Thank you.

Hi @KeerthiYandaOS, just opening this post up again as I am still facing errors. I am trying to get this working on 11.3 LTS ML (includes Apache Spark 3.3.0, Scala 2.12) using com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT as the Maven coordinates.

The following is the error (screenshot omitted).

aero-girl commented 1 year ago

Tested with the following Maven coordinates: com.microsoft.azure:synapseml_2.12:0.10.2, and it works! 🙌🏽🚀

KeerthiYandaOS commented 1 year ago

@aero-girl can you please try installing again with the Spark 3.3 Maven coordinates com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT and see if you still see the error? I was able to install the same version on an 11.3 LTS ML (includes Apache Spark 3.3.0, Scala 2.12) cluster (screenshots omitted).

aero-girl commented 1 year ago

> spark.databricks.libraries.enableMavenResolution false

Yes, sure @KeerthiYandaOS. Tested and it failed with Maven coordinates com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT. However, it works with Maven coordinates com.microsoft.azure:synapseml_2.12:0.10.2 (screenshots omitted).


KeerthiYandaOS commented 1 year ago

@aero-girl can you please provide https://mmlspark.azureedge.net/maven as the repository URL? com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT is a SNAPSHOT version and is not published to the default Maven Central repository (screenshot omitted).
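In the Databricks Libraries UI, that corresponds to filling in two fields when installing a Maven library (an illustrative sketch; both values come from this thread, and the field labels are the standard Databricks ones):

```
Coordinates: com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT
Repository:  https://mmlspark.azureedge.net/maven
```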

aero-girl commented 1 year ago

> com.microsoft.azure:synapseml_2.12:0.11.0-32-6085190e-SNAPSHOT

@KeerthiYandaOS thank you, that has worked! 🙌🏽😃 (screenshot omitted)

KeerthiYandaOS commented 1 year ago

Thank you for the confirmation @aero-girl. Closing the thread as the issue is resolved.