juju-solutions / layer-apache-bigtop-base

Apache License 2.0

SPARK stuck in maintenance if benchmarks can't be installed #59

Closed (erik78se closed this issue 7 years ago)

erik78se commented 7 years ago

What I'm trying to do: I was trying to showcase a clean installation of hadoop-spark, with defaults for everything. We are behind a proxy, but that's taken care of at the model level as follows (server names replaced below):

juju add-model scania-hadoop-spark --config http-proxy=http://ourinternalproxyserver:8080 --config https-proxy=http://ourinternalproxyserver:8080 --config no-proxy=127.0.0.1
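
As I understand it, the model-level proxy settings above end up exported as the standard proxy environment variables (http_proxy, https_proxy, no_proxy) for the units, which Python's urllib honours. A minimal sketch of that mechanism, using the hypothetical proxy values from the command above:

```python
import os
import urllib.request

# Hypothetical values mirroring the model config shown above.
os.environ["http_proxy"] = "http://ourinternalproxyserver:8080"
os.environ["https_proxy"] = "http://ourinternalproxyserver:8080"
os.environ["no_proxy"] = "127.0.0.1"

# urllib.request.getproxies() reads these variables, so any download done
# through urllib on the unit would be routed via the proxy, except for
# hosts matching no_proxy.
proxies = urllib.request.getproxies()
print(proxies["https"])
```

Anything that downloads without consulting these variables (or that the proxy refuses) will still fail, which is what happens with the Spark-Bench fetch described below.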

This is how I installed the bundle:

juju deploy cs:bundle/hadoop-spark-37

Everything installs OK up until it tries to start/install Spark. Then it gets stuck.

SPARK ends up in a "maintenance" status

"juju status" gives...

spark/0* maintenance idle 4 10.54.83.143 configuring spark in yarn-client mode

I would expect it to fail into an "error" state here with some useful output telling me what went wrong.

With help from "kjackal" in #juju on freenode IRC, I have concluded that this happens when the Spark benchmarks cannot be downloaded from the AWS S3 buckets (blocked by the proxy): the installation never reaches an error state, but instead gets stuck in "maintenance".
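
The failure mode can be sketched in a few lines of Python: if the benchmark download fails, the charm should surface an explicit error/blocked status instead of staying in "maintenance" forever. Everything below (function names, the injected fetch callable, the messages) is hypothetical illustration, not the charm's actual code:

```python
import urllib.error


def install_spark_bench(url, fetch):
    """Attempt the optional Spark-Bench download; return (status, message).

    `fetch` is injected so the sketch stays self-contained; in a real charm
    it would be an HTTP download that goes through the model's proxy.
    """
    try:
        fetch(url)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        # Fail fast with a visible status instead of hanging in "maintenance".
        return ("blocked", "cannot download spark-bench: %s" % exc)
    return ("active", "spark-bench installed")


def blocked_fetch(url):
    # Simulates the proxied environment where the S3 bucket is unreachable.
    raise urllib.error.URLError("connection to S3 refused by proxy")


status, msg = install_spark_bench(
    "https://s3.amazonaws.com/spark-bench.tgz", blocked_fetch)
print(status, "-", msg)
```

With a failing fetch this reports "blocked" immediately, which is the kind of actionable output the stuck "maintenance" state never produced.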

erik78se commented 7 years ago

NOTE:

I managed to get past the problem by setting the configuration:

juju config spark spark_bench_enabled=false

Then the installation completes and Spark enters an "active" state. Even though this is only a workaround, the stuck "maintenance" status gives very little information about what has gone wrong. At that point Spark is not usable either, which effectively leaves the installation broken.
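
The workaround amounts to gating the optional benchmark download behind a config flag. A hypothetical sketch of such a guarded install flow (the function and step names are mine, not the charm's):

```python
def plan_spark_install(config):
    """Return the install steps, skipping optional ones that are disabled."""
    steps = ["install spark core", "configure yarn-client mode"]
    # Only attempt the optional Spark-Bench download when explicitly
    # enabled, mirroring: juju config spark spark_bench_enabled=false
    if config.get("spark_bench_enabled", False):
        steps.append("download spark-bench from S3")
    steps.append("set status: active")
    return steps


print(plan_spark_install({"spark_bench_enabled": False}))
```

With the flag off, the S3 download step never runs, so the proxy restriction no longer matters and the deployment can reach "active".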

ktsakalozos commented 7 years ago

Here are some logs from @erik78se

Deployment matrix of Spark: https://pastebin.com/bqWKGvZE
Juju status: https://pastebin.com/DZey05k0
Stacktrace of the silenced error: https://pastebin.com/4RKMmBqW

kwmonroe commented 7 years ago

@erik78se Spark-Bench is now optional and no longer installed by default. This went in with:

https://issues.apache.org/jira/browse/BIGTOP-2834

spark-51 and above contain this fix:

https://jujucharms.com/spark/

Please let me know if you see any other issues related to spark deployment in a restricted network. Thanks!