behrica opened this issue 3 years ago
I just saw that this is no longer a Delay.
So the SparkSession gets initialized as soon as we require the namespace, and I suppose it cannot be changed afterwards.
So it cannot be re-configured.
I see... I think we can change it back to a delay. Would you like to make a PR for that?
Maybe there is a better way.
Maybe the default "configuration map" for the session https://github.com/behrica/geni/blob/482c4b934f037d32b849916211b509c94d89800e/src/clojure/zero_one/geni/defaults.clj#L5
could become an atom, which can be changed if needed before requiring the default namespace 'zero-one.geni.core'.
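The atom-plus-delay idea could be sketched roughly as follows. This is a hedged sketch only: the var names `default-configuration` and `spark` and the exact shape of the config map are assumptions, not geni's actual code; only `g/create-spark-session` is taken from the discussion.

```clojure
;; Hypothetical sketch, assuming geni's defaults namespace and the
;; config-map shape accepted by g/create-spark-session.
(ns zero-one.geni.defaults
  (:require [zero-one.geni.core :as g]))

;; The default config becomes an atom that callers can swap! before
;; the session is ever realised.
(def default-configuration
  (atom {:app-name "Geni App"
         :configs  {:spark.sql.adaptive.enabled "true"}}))

;; Wrapping the session in a delay restores lazy initialisation:
;; the SparkSession is only created on the first deref.
(def spark
  (delay (g/create-spark-session @default-configuration)))
```

A consumer would then adjust the atom before any Spark call, e.g. `(swap! default-configuration assoc :app-name "my-app")`, and only afterwards force the session with `@spark`.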
I think that the current feature to potentially change the session itself is not super useful, because Spark does not really support this cleanly, correct? If I have read it right, the Spark session is meant to be instantiated once in the lifetime of a JVM. I can try this out to see if it works.
I think it could work this way.
The issue would be to keep the "fully automatic" session configuration of the geni CLI. In my opinion, the current way of the geni CLI session initialization
is brittle, as it will not always work and depends on the order of requiring namespaces / using functions.
I think we have three options for this:
Not have it fully automatic, but a method which needs to be called, (init-default-spark) or similar -> this could then allow changing config settings
Allow changing the Spark session configuration from outside the REPL by either:
I still like the overall idea of the geni CLI as a quick, user-friendly entry point, but it needs to allow arbitrary session configs. (Or we do not allow any custom session config for the geni CLI and see it as a "demo".) The other Spark shells can be fully configured from the command line (and do not allow changing the session from inside either).
Here is a link to our previous discussions for reference
I think that the current feature to potentially change the session itself is not super useful, because Spark does not really support this cleanly, correct? If I have read it right, the Spark session is meant to be instantiated once in the lifetime of a JVM.
You are correct. Typically, a user's Spark session settings would be set during the call to spark-submit. The default session settings in geni will only be applied if no call to spark-submit is made (i.e. running locally).
Most Spark usage (across all languages) happens by launching a Spark "application" (for example, a Geni REPL) on an existing Spark cluster. It is not expected that the Spark application creates its own cluster, and thus the session config is supplied when the .jar and main class are specified.
I'm not too familiar with Kubernetes, so I am having trouble following the guide. It looks like the Geni CLI is being started outside of spark-submit. I think the more traditional pattern would be to call spark-submit in the container for the cluster's driver and pass an uberjar of Geni and --class zero-one.geni.main along with any other Spark session config you want.
I have had success with starting Geni REPLs on flintrock clusters using spark-submit.
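That spark-submit pattern might look something like this. Only `--class zero-one.geni.main` comes from the discussion above; the master URL, the config values, and the jar name are placeholders, not a verified invocation.

```shell
# Placeholder values throughout; adapt the master URL and confs to
# your cluster. For Spark on Kubernetes, --master uses the k8s://
# scheme pointing at the API server.
spark-submit \
  --master "k8s://https://<k8s-apiserver-host>:<port>" \
  --class zero-one.geni.main \
  --conf spark.executor.instances=2 \
  --conf spark.executor.memory=4g \
  geni-uberjar.jar
```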
Following the minikube guide: https://github.com/zero-one-group/geni/blame/develop/docs/kubernetes_basic.md
the verification step at line 118 fails.
It seems that I cannot change the Spark session by calling
g/create-spark-session
I am pretty sure that it worked at one point.
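One likely explanation, consistent with the "once per JVM" point above (hedged; not verified against the guide): Spark's own `SparkSession.builder ... getOrCreate` returns the already-active session in the JVM, so a second call that goes through it can silently hand back the old session instead of a freshly configured one. A minimal interop sketch:

```clojure
;; Sketch of why re-creating a session appears to be a no-op.
(import '(org.apache.spark.sql SparkSession))

;; First session in this JVM.
(def s1 (-> (SparkSession/builder)
            (.master "local[1]")
            (.appName "first")
            (.getOrCreate)))

;; A second builder with different settings does NOT build a new
;; session; getOrCreate returns the active one (only mutable runtime
;; confs from the builder are applied to it).
(def s2 (-> (SparkSession/builder)
            (.appName "second")
            (.getOrCreate)))

(identical? s1 s2)  ;; => true
```

If `g/create-spark-session` is built on `getOrCreate` (an assumption), that would explain why the minikube verification step sees the original configuration.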