h2oai / sparkling-water

Sparkling Water provides H2O functionality inside Spark cluster
https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html
Apache License 2.0

Azure Databricks: Sparkling Water UI #2870

Open eldhosepaul7 opened 1 year ago

eldhosepaul7 commented 1 year ago

On Azure Databricks, the Sparkling Water Flow UI returns an error, HTTP ERROR 500 java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;, when it is launched from the Spark UI tab.

Running the command below creates an H2OContext, and the output instructs us to access the web UI as follows: Open H2O Flow in browser: Go to Spark UI > Sparkling Water tab > click Flow UI link

from pysparkling import *
hc = H2OContext.getOrCreate()

Please also provide us with the full and minimal reproducible code.

How to Reproduce:

The behavior is the same on DBR 11.3 LTS and 12.2 LTS (Spark 3.3.0) with pysparkling version h2o-pysparkling-3.3.

The only scenario where it works is DBR 9.1 LTS (Spark 3.1.2) with pysparkling version h2o-pysparkling-3.1, so there may be some breaking change introduced in Spark 3.2 and 3.3.

Note: we can access Flow directly from the link below, without going through Spark UI -> Sparkling Water tab: https://<adb-link>/driver-proxy/o/<orgId>/<clusterId>/<9009>/flow/index.html
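For reference, a minimal sketch of how that direct link can be assembled. The host name, orgId, clusterId, and port are placeholders taken from the URL pattern above, not values confirmed anywhere in the Databricks docs — substitute your own workspace values.

```python
def flow_proxy_url(workspace_host, org_id, cluster_id, port=9009):
    """Sketch only: build the driver-proxy link to H2O Flow from the
    placeholders in the pattern above. All arguments are assumptions."""
    return (f"https://{workspace_host}/driver-proxy"
            f"/o/{org_id}/{cluster_id}/{port}/flow/index.html")

# Example with made-up identifiers:
print(flow_proxy_url("adb-1234567890123456.7.azuredatabricks.net",
                     "1234567890123456", "0123-456789-abcde1"))
```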

Complete Error Message:

HTTP ERROR 500 java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;

URI: /sparkling-water/
MESSAGE: java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;
SERVLET: org.apache.spark.ui.JettyUtils$$anon$1-3148410
CAUSED BY: java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;

Caused by:
java.lang.NoSuchMethodError: org.apache.spark.ui.UIUtils$.listingTable(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)Lscala/collection/Seq;
    at org.apache.spark.h2o.ui.SparklingWaterInfoPage.render(SparklingWaterInfoPage.scala:62)
    at org.apache.spark.ui.WebUI.$anonfun$attachPage$1(WebUI.scala:106)
    at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:81)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1626)
    at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.lang.Thread.run(Thread.java:750)
ChuliangXiao commented 1 year ago

Having the same issue with 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12)

H2O_cluster_version:        3.38.0.4
Python_version:             3.8.10 final
margheritaleonelli commented 7 months ago

I am facing the same issue with Databricks cluster 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12), using H2O cluster version 3.44.0.3. Is there any update on this problem?

eldhosepaul7, could you please tell me how the direct Flow link is built? https://<adb-link>/driver-proxy/o/<orgId>/<clusterId>/<9009>/flow/index.html

thanks!

krasinski commented 7 months ago

It seems Databricks ships a custom Spark build (not a surprise), and the UIUtils.listingTable method we see in the error has a different signature there.
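One way to look at the mismatch concretely is to decode the JVM method descriptor embedded in the NoSuchMethodError. The helper below is a plain-Python sketch (not part of Sparkling Water or Spark) that splits a descriptor's parameter list into individual types; comparing its output for the descriptor in the error against whatever signatures the running Spark build actually exposes (e.g. via Java reflection on org.apache.spark.ui.UIUtils$ — left as a comment, since it needs a live JVM) would show which parameters the custom build changed.

```python
def jvm_descriptor_params(descriptor):
    """Split the parameter portion of a JVM method descriptor,
    e.g. "(IJ[Ljava/lang/String;)V", into a list of type strings."""
    params = descriptor[descriptor.index("(") + 1 : descriptor.index(")")]
    out, i = [], 0
    while i < len(params):
        start = i
        while params[i] == "[":        # array dimensions prefix the type
            i += 1
        if params[i] == "L":           # object type: L<class-name>;
            end = params.index(";", i)
            out.append(params[start:end + 1])
            i = end + 1
        else:                          # single-letter primitive (Z, I, J, ...)
            out.append(params[start:i + 1])
            i += 1
    return out

# The descriptor Sparkling Water was compiled against, from the error above:
desc = ("(Lscala/collection/Seq;Lscala/Function1;Lscala/collection/Iterable;"
        "ZLscala/Option;Lscala/collection/Seq;ZZLscala/collection/Seq;)"
        "Lscala/collection/Seq;")
print(len(jvm_descriptor_params(desc)))  # 9 parameters expected at the call site

# On a live cluster one could then dump the actual overloads, roughly:
#   for m in spark._jvm.java.lang.Class.forName(
#           "org.apache.spark.ui.UIUtils$").getDeclaredMethods():
#       if m.getName() == "listingTable": print(m)
```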

thomasjohnflaherty commented 5 months ago

I'm running Databricks on GCP and am able to construct the URL with the following Python script:

import pandas as pd

# Databricks exposes the cluster's identity through these Spark conf tags
prefix = "spark.databricks.clusterUsageTags."
org = spark.conf.get(prefix + "orgId")
clst = spark.conf.get(prefix + "clusterId")

# Assemble the driver-proxy link to H2O Flow (H2O's default port 54321)
flow = "https://" + org + "." + org[-1] + ".gcp.databricks.com/driver-proxy/o/" + org + "/" + clst + "/54321/flow/index.html"

df = pd.DataFrame([{"Cluster": spark.conf.get(prefix + "clusterName"), "URL": flow}])

def make_clickable(val):
    # Render the URL as a clickable link in the notebook output
    return '<a target="_blank" href="{}">{}</a>'.format(val, val)

df.style.format({'URL': make_clickable})