oap-project / oap-mllib

Optimized Spark package to accelerate machine learning algorithms in Apache Spark MLlib.
Apache License 2.0

[Cloud][Databricks] OAP MLlib jar cannot be loaded by Databricks runtime #138

Open yao531441 opened 2 years ago

yao531441 commented 2 years ago

We want to get the OAP MLlib performance gain on Databricks, but it seems the OAP MLlib jar cannot be loaded by the Databricks runtime. The error log is below: [image]

We use the KMeans demo to test:

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator

spark.sparkContext.setLogLevel("INFO")

val dataset = spark.read.format("libsvm").load("/FileStore/mllib_data/sample_kmeans_data.txt")

// Trains a k-means model.
val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(dataset)

// Make predictions
val predictions = model.transform(dataset)

// Evaluate clustering by computing Silhouette score
val evaluator = new ClusteringEvaluator()

val silhouette = evaluator.evaluate(predictions)
println(s"Silhouette with squared euclidean distance = $silhouette")

// Shows the result.
println("Cluster Centers: ")
model.clusterCenters.foreach(println)
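
Before running the demo, it may help to check whether any class from the jar is visible to the driver at all. The snippet below is only a sketch: `classVisible` is a hypothetical helper, and the second class name is a placeholder that has to be replaced with a class that actually ships in the OAP MLlib jar. It only probes the driver's context class loader, so a `true` result does not prove the executors can load the native libraries.

```scala
// Hypothetical helper: returns true when the named class can be located by
// the current thread's context class loader (static initializers are not run).
def classVisible(className: String): Boolean = {
  val loader = Option(Thread.currentThread().getContextClassLoader)
    .getOrElse(ClassLoader.getSystemClassLoader)
  try {
    Class.forName(className, false, loader)
    true
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
}

// Spark's own KMeans, as a baseline for what a visible class looks like on a driver.
println(classVisible("org.apache.spark.ml.clustering.KMeans"))

// Placeholder name (assumption): replace with a real class from the OAP MLlib jar.
println(classVisible("replace.with.a.class.from.the.OapMllibJar"))
```

If the baseline class is found but the class from the jar is not, the jar most likely never reached the driver classpath, which would point to how the library was attached rather than to KMeans itself.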

xwu99 commented 2 years ago

Could you tell me whether the Databricks runtime uses K8s as its cluster manager?

yao531441 commented 2 years ago

Could you tell me whether the Databricks runtime uses K8s as its cluster manager?

I tried to dig this information out, but I couldn't find anything about it in the official docs. Based on page 23 of Databricks' slides, I guess Databricks uses its own cluster manager.
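
One quick check that might help here is to look at the master URL the running session reports. The sketch below maps the URL prefixes documented for Spark (`k8s://`, `spark://`, `mesos://`, `yarn`, `local`) to a label. `clusterManagerOf` is a hypothetical helper, and what a Databricks cluster actually reports is not confirmed in this thread, so the result is a hint rather than proof of how the platform schedules its nodes.

```scala
// Hypothetical helper: maps a Spark master URL to a coarse cluster-manager label.
def clusterManagerOf(master: String): String = master match {
  case m if m.startsWith("k8s://")   => "Kubernetes"
  case m if m.startsWith("spark://") => "Spark standalone"
  case m if m.startsWith("mesos://") => "Mesos"
  case "yarn"                        => "YARN"
  case m if m.startsWith("local")    => "local mode"
  case other                         => s"unrecognized ($other)"
}

// In a notebook with a live session (assuming `spark` is the SparkSession):
//   println(clusterManagerOf(spark.sparkContext.master))

// Standalone illustration with a made-up URL:
println(clusterManagerOf("k8s://https://example.invalid:6443"))
```

A URL that starts with `spark://` would only show that the session talks to a standalone-style master; it would not rule out a vendor-specific layer managing the machines underneath.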