Open javierluraschi opened 6 years ago
Note: Reported through support that sparklyr fails to connect when TLS is enforced.
sparklyr: Session (####) is starting under #### port ####
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
    at sparklyr.Utils$.portIsAvailable(utils.scala:435)
    at sparklyr.Backend.init(backend.scala:158)
Related: https://mapr.com/docs/52/Spark/ConfigureSparkOnYarn_Encryption.html
Draft to support high-performance socket connections in Livy (https://github.com/rstudio/sparklyr/issues/1579) and in other job-spawning technologies.
Overview: Livy has been really popular in the sparklyr community, mostly out of necessity, since it enables connectivity for tightly managed clusters or for clusters that have run out of capacity for additional boundary machines where RStudio could be installed. While Livy has proved to be a good solution for connectivity, its performance is less than ideal, since its primary purpose is to provide a remote REPL.

Solution: One solution would be to ask system administrators to make a few additional ports available so that sparklyr can upgrade the HTTP connection into the high-performance binary connection that sparklyr already supports. However, some additional work is required to provide proper, secure access to the potentially internet-visible ports.

Work Items:
sparklyr backend in Livy: this is already supported when using spark_apply() with Livy, but it is not currently initialized in the driver node.

Notes: This port upgrade would be supported in Amazon AWS, since a security group can be defined; however, other platforms like Azure HDInsight do not seem to support port configuration.
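
As a rough illustration of the port upgrade proposed under Solution, a client could probe the additional ports and fall back to the existing HTTP (Livy) path when none is reachable. This is only a sketch under stated assumptions: `port_is_reachable` and `choose_transport` are hypothetical names, not part of sparklyr or Livy, and the sketch covers only the reachability check, not the TLS or authentication work the proposal still requires.

```python
import socket
from typing import List, Optional, Tuple


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_transport(
    host: str, candidate_ports: List[int], timeout: float = 2.0
) -> Tuple[str, Optional[int]]:
    """Pick a direct socket transport if any candidate port is open.

    Hypothetical helper: returns ("socket", port) for the first reachable
    port, otherwise ("http", None) to signal staying on the HTTP path.
    """
    for port in candidate_ports:
        if port_is_reachable(host, port, timeout=timeout):
            return ("socket", port)
    return ("http", None)
```

On a platform where the ports cannot be opened (the Azure HDInsight case in the notes), `choose_transport` would return `("http", None)` and the connection would stay on the slower HTTP path.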