microsoft / SynapseML

Simple and Distributed Machine Learning
http://aka.ms/spark
MIT License

Spark 3.0 build #813

Open w1nk opened 4 years ago

w1nk commented 4 years ago

Hello!

We're currently using mmlspark on our Spark 2.4 clusters to awesome effect (training on 3.4 billion rows, ~600 GB of data). Thanks for all the work!

There is a desire within our organization to migrate these clusters to Spark 3.0. We attempted to build mmlspark against Spark 3, but enough things have been renamed or relocated that the build fails with a pile of errors [1].

Has anyone successfully built mmlspark against Spark 3.0? If not, we may be able to get it patched up to build.

Thanks!

[1] - mmlspark/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/HTTPSourceV2.scala:26:37: object v2 is not a member of package org.apache.spark.sql.sources
[error] import org.apache.spark.sql.sources.v2._

This entire v2 package has been renamed / relocated, with the bulk of it seemingly in this patch: https://github.com/apache/spark/commit/053dd858d38e6107bc71e0aa3a4954291b74f8c8
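
For anyone sizing up the port, here is a minimal sketch for listing the Scala files that still import from the old package. Only the package prefix `org.apache.spark.sql.sources.v2` comes from the error above; the function names and the `src/main/scala` root in the usage note are hypothetical, and the regex only catches plain single-line `import` statements.

```python
import re
from pathlib import Path

# Matches single-line Scala imports from the package named in the build error.
OLD_V2_IMPORT = re.compile(
    r"^[ \t]*import[ \t]+org\.apache\.spark\.sql\.sources\.v2\b.*$",
    re.MULTILINE,
)


def old_v2_imports(source_text):
    """Return the import lines in source_text that reference sources.v2."""
    return [m.group(0).strip() for m in OLD_V2_IMPORT.finditer(source_text)]


def scan_tree(root):
    """Map each .scala file under root to its sources.v2 import lines.

    Files with no matching imports are left out of the result.
    """
    hits = {}
    for path in sorted(Path(root).rglob("*.scala")):
        text = path.read_text(encoding="utf-8", errors="replace")
        lines = old_v2_imports(text)
        if lines:
            hits[str(path)] = lines
    return hits
```

Calling `scan_tree("src/main/scala")` from the repository root would give a rough count of the files that need their imports updated. It does not say what each import should become.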

w1nk commented 4 years ago

Hello! Spark 3.0.0 was released about a week ago.

https://spark.apache.org/news/spark-3-0-0-released.html

Thanks!

fengkehh commented 4 years ago

Does the MMLSpark team plan to migrate to (Py)Spark 3.0 in the near future?

brunocous commented 4 years ago

I also would be very interested in (Py)Spark 3.0.0 and Scala 2.12 support. How can we help?

itechbear commented 4 years ago

Just saw a related pending PR: https://github.com/Azure/mmlspark/pull/912

juanpaulo commented 3 years ago

Resolved by #970, I suppose?

saikiranvadhi commented 3 years ago

mmlspark doesn't work on Spark 3 yet; it throws this error: https://github.com/Azure/mmlspark/issues/891

imatiach-msft commented 3 years ago

The latest mmlspark on master supports Spark 3.0.

rgordon commented 3 years ago

Is it possible to get it pushed to the Maven repo so that we can install it into other clusters easily? (I created issue 1031 with this in mind but now see this one too.)

avanunts commented 3 years ago

Hi @imatiach-msft, I'm trying to get the latest mmlspark-master-build as you suggested, but I'm getting an error:

Could not find com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-169-80889120-SNAPSHOT.
     Searched in the following locations:
     ...
       - https://mmlspark.azureedge.net/maven/com/microsoft/ml/spark/mmlspark_2.12/1.0.0-rc3-169-80889120-SNAPSHOT/maven-metadata.xml
       - https://mmlspark.azureedge.net/maven/com/microsoft/ml/spark/mmlspark_2.12/1.0.0-rc3-169-80889120-SNAPSHOT/mmlspark_2.12-1.0.0-rc3-169-80889120-SNAPSHOT.pom

I also tried the version from this comment, com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-59-bf337941-SNAPSHOT, but got the same error.

Can you please clarify which version I should use?
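
The "Searched in the following locations" entries in the error above follow the usual Maven repository layout: the dots in the group ID become slashes, followed by the artifact ID, the version, and then the file name. Here is a minimal sketch that rebuilds those two URLs from a coordinate, so it is easy to check by hand whether a given build was published. The function name `maven_urls` is hypothetical; the repository base and the coordinate are taken from the error output.

```python
def maven_urls(repo_base, coordinate):
    """Build the metadata and POM URLs a resolver would try for group:artifact:version."""
    group, artifact, version = coordinate.split(":")
    base = "/".join([
        repo_base.rstrip("/"),
        group.replace(".", "/"),  # com.microsoft.ml.spark -> com/microsoft/ml/spark
        artifact,
        version,
    ])
    return {
        "metadata": f"{base}/maven-metadata.xml",
        "pom": f"{base}/{artifact}-{version}.pom",
    }


urls = maven_urls(
    "https://mmlspark.azureedge.net/maven",
    "com.microsoft.ml.spark:mmlspark_2.12:1.0.0-rc3-169-80889120-SNAPSHOT",
)
for name, url in urls.items():
    print(name, url)
```

Opening either URL in a browser (or with `curl -I`) shows whether the artifact exists at that location. If neither URL resolves, that version was not published to this repository.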