Closed by danking 4 years ago
Is there a possibility of adding an exception for users who need to pass a SparkContext to hl.init?
In our use case, our notebooks use the sparkmagic kernel to communicate with Livy running on multiple clusters. Sparkmagic automatically creates a SparkContext when connecting to Livy on the cluster master node (AWS EMR). In releases prior to 0.2.20 we were passing that SparkContext to hl.init.
In our case, the JAR is already in our class path.
It's already there!
hl.init(sc=sc)
ah, I understand
Essentially, you need to add the following configuration in Spark:
HAIL_JAR_LOCATION=/path/to/python/site-packages/hail/hail-all-spark.jar
spark.jars=${HAIL_JAR_LOCATION}
spark.driver.extraClassPath=${HAIL_JAR_LOCATION}
spark.executor.extraClassPath=./hail-all-spark.jar
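For remote notebooks going through sparkmagic and Livy, settings like these can be supplied per session with sparkmagic's %%configure magic; a sketch (the JAR path is an assumption about where pip installed Hail on the cluster):

```
%%configure -f
{"conf": {
    "spark.jars": "/path/to/python/site-packages/hail/hail-all-spark.jar",
    "spark.driver.extraClassPath": "/path/to/python/site-packages/hail/hail-all-spark.jar",
    "spark.executor.extraClassPath": "./hail-all-spark.jar"
}}
```

The executor path is relative because spark.jars ships the JAR into each executor's working directory.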
Thanks for the quick help! I've just built a new image from master. I'll test and report back.
I've validated that our setup meets those requirements, and we're hitting a FatalError introduced by a commit a few weeks ago.
If I revert this file we're able to pass in the existing SparkContext with the expected hl.init(sc=sc).
As general feedback, it may be better to warn here than to force an exit. I may be wrong, but I don't see a way around this for people using remote notebooks to talk to Spark via Livy.
oh, whoops, we can definitely make that a warning.
Tested locally via Livy and my notebook is happy. I really appreciate your help here!
done
When Hail is pip-installed, you cannot use it with an already-constructed SparkContext unless that SparkContext's class path includes the Hail JAR file. It is somewhat annoying to find the Hail JAR location and add it to the class path.
The primary reason users want to pass an already constructed SparkContext is to specify some configuration parameters.
hl.init should take a conf argument as either a SparkConf or a dict.