h2oai / sparkling-water

Sparkling Water provides H2O functionality inside Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0

Azure Databricks: Sparkling Water UI #2870

Open eldhosepaul7 opened 1 year ago

eldhosepaul7 commented 1 year ago

On Azure Databricks, the Sparkling Water Flow UI returns an error, HTTP ERROR 500 java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;, when it is launched from the Spark UI tab.

Running the command below creates an H2OContext, and the output instructs us to access the web UI as follows: Open H2O Flow in browser: Go to Spark UI > Sparkling Water tab > click Flow UI link

from pysparkling import *
hc = H2OContext.getOrCreate()

Please also provide us with the full and minimal reproducible code.

How to Reproduce:

The behavior is the same on DBR 11.3 LTS and 12.2 LTS (Spark 3.3.0) with pysparkling version h2o-pysparkling-3.3.

The only scenario where it works is DBR 9.1 LTS (Spark 3.1.2) with pysparkling version h2o-pysparkling-3.1, so there may be some breaking change introduced in Spark 3.2 and 3.3.

Note: we can access Flow directly from the link below, without going through Spark UI -> Sparkling Water tab: https://<adb-link>/driver-proxy/o/<orgId>/<clusterId>/<9009>/flow/index.html
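For reference, a minimal sketch of how that direct link can be assembled. The host name, orgId, clusterId, and port are placeholders taken from the URL pattern above, not values confirmed anywhere in the Databricks docs — substitute your own workspace values.

```python
def flow_proxy_url(workspace_host, org_id, cluster_id, port=9009):
    """Sketch only: build the driver-proxy link to H2O Flow from the
    placeholders in the pattern above. All arguments are assumptions."""
    return (f"https://{workspace_host}/driver-proxy"
            f"/o/{org_id}/{cluster_id}/{port}/flow/index.html")

# Example with made-up identifiers:
print(flow_proxy_url("adb-1234567890123456.7.azuredatabricks.net",
                     "1234567890123456", "0123-456789-abcde1"))
```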

Complete Error Message:

HTTP ERROR 500 java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;

URI: /sparkling-water/
MESSAGE: java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;
SERVLET: org.apache.spark.ui.JettyUtils$$anon$1-3148410
CAUSED BY: java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;

Caused by:
java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;
    at org.apache.spark.h2o.ui.SparklingWaterInfoPage.render(SparklingWaterInfoPage.scala:62)
    at org.apache.spark.ui.WebUI.$anonfun$attachPage$1(WebUI.scala:106)
    at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:81)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
    at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.lang.Thread.run(Thread.java:750)
ChuliangXiao commented 1 year ago

Having the same issue with 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)

H2O_cluster_version:        3.38.0.4
Python_version:             3.8.10 final
margheritaleonelli commented 7 months ago

I am facing the same issue with Databricks cluster 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12), using H2O cluster version 3.44.0.3. Is there any update on this problem?

eldhosepaul7, could you please tell me how the direct Flow link is built? https://<adb-link>/driver-proxy/o/<orgId>/<clusterId>/<9009>/flow/index.html

thanks!

krasinski commented 7 months ago

It seems Databricks ships a custom Spark build (not a surprise), and the UIUtils.listingTable method we see in the error has a different signature there.
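One way to look at the mismatch concretely is to decode the JVM method descriptor embedded in the NoSuchMethodError. The helper below is a plain-Python sketch (not part of Sparkling Water or Spark) that splits a descriptor's parameter list into individual types; comparing its output for the descriptor in the error against whatever signatures the running Spark build actually exposes (e.g. via Java reflection on org.apache.spark.ui.UIUtils$ — left as a comment, since it needs a live JVM) would show which parameters the custom build changed.

```python
def jvm_descriptor_params(descriptor):
    """Split the parameter portion of a JVM method descriptor,
    e.g. "(IJ[Ljava/lang/String;)V", into a list of type strings."""
    params = descriptor[descriptor.index("(") + 1 : descriptor.index(")")]
    out, i = [], 0
    while i < len(params):
        start = i
        while params[i] == "[":        # array dimensions prefix the type
            i += 1
        if params[i] == "L":           # object type: L<class-name>;
            end = params.index(";", i)
            out.append(params[start:end + 1])
            i = end + 1
        else:                          # single-letter primitive (Z, I, J, ...)
            out.append(params[start:i + 1])
            i += 1
    return out

# The descriptor Sparkling Water was compiled against, from the error above:
desc = ("(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;"
        "ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)"
        "Lscala/collection/Seq;")
print(len(jvm_descriptor_params(desc)))  # 9 parameters expected at the call site

# On a live cluster one could then dump the actual overloads, roughly:
#   for m in spark._jvm.java.lang.Class.forName(
#           "org.apache.spark.ui.UIUtils$").getDeclaredMethods():
#       if m.getName() == "listingTable": print(m)
```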

thomasjohnflaherty commented 5 months ago

I'm running Databricks on GCP and am able to construct the URL with the following Python script:

import pandas as pd

# Databricks exposes the cluster's identity through these Spark conf tags
prefix = "spark.databricks.clusterUsageTags."
org = spark.conf.get(prefix + "orgId")
clst = spark.conf.get(prefix + "clusterId")

# Assemble the driver-proxy link to H2O Flow (H2O's default port 54321)
flow = "https://" + org + "." + org[-1] + ".gcp.databricks.com/driver-proxy/o/" + org + "/" + clst + "/54321/flow/index.html"

df = pd.DataFrame([{"Cluster": spark.conf.get(prefix + "clusterName"), "URL": flow}])

def make_clickable(val):
    # Render the URL as a clickable link in the notebook output
    return '<a target="_blank" href="{}">{}</a>'.format(val, val)

df.style.format({'URL': make_clickable})