sparklyr / sparklyr

R interface for Apache Spark
https://spark.rstudio.com/
Apache License 2.0

Secure Sockets #1595

Open javierluraschi opened 6 years ago

javierluraschi commented 6 years ago

Draft to support high-performance socket connections in Livy (https://github.com/rstudio/sparklyr/issues/1579) and also in other job-spawning technologies.

Overview: Livy has been really popular in the sparklyr community, mostly out of necessity, since it enables connectivity for tightly managed clusters, or for clusters that ran out of capacity to add boundary machines where RStudio could be installed, etc. While Livy has proved to be a good solution for connectivity, its performance is less than ideal since its primary purpose is to provide a remote REPL.

Solution: One solution would be to ask system administrators to open a few additional ports so that sparklyr can upgrade the HTTP connection to the high-performance binary connection it already supports. However, some additional work is required to provide properly secured access to these potentially internet-visible ports.
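
To make the "secure access" part concrete, here is a minimal sketch of one possible approach: a shared-secret challenge-response check that the backend could run on a newly opened port before accepting any binary traffic. This is not sparklyr's protocol. The function names (`server_handshake`, `client_handshake`, `demo`) are hypothetical, and it assumes the secret is delivered to both sides over the existing Livy HTTP session. It only authenticates the peer; it does not encrypt the stream, so TLS would still be needed for confidentiality.

```python
import hashlib
import hmac
import secrets
import socket
import threading


def sign(secret: bytes, challenge: bytes) -> bytes:
    """HMAC-SHA256 of the challenge under the shared secret (32 bytes)."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed during handshake")
        buf += chunk
    return buf


def server_handshake(sock: socket.socket, secret: bytes) -> bool:
    """Hypothetical backend side: send a random challenge, verify the reply."""
    challenge = secrets.token_bytes(32)
    sock.sendall(challenge)
    response = recv_exact(sock, 32)
    # compare_digest avoids leaking information through comparison timing
    ok = hmac.compare_digest(response, sign(secret, challenge))
    sock.sendall(b"\x01" if ok else b"\x00")
    return ok


def client_handshake(sock: socket.socket, secret: bytes) -> bool:
    """Hypothetical client side: sign the challenge, read the verdict."""
    challenge = recv_exact(sock, 32)
    sock.sendall(sign(secret, challenge))
    return recv_exact(sock, 1) == b"\x01"


def demo(server_secret: bytes, client_secret: bytes):
    """Run both sides over an in-process socket pair; return (server_ok, client_ok)."""
    server_sock, client_sock = socket.socketpair()
    server_sock.settimeout(5)
    client_sock.settimeout(5)
    result = {}

    def run_client():
        result["client"] = client_handshake(client_sock, client_secret)

    t = threading.Thread(target=run_client)
    t.start()
    result["server"] = server_handshake(server_sock, server_secret)
    t.join()
    server_sock.close()
    client_sock.close()
    return result["server"], result["client"]
```

With matching secrets both sides accept the connection; with a mismatched secret the server rejects it and the client sees the rejection. A fresh random challenge per connection means a captured response cannot be replayed later.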

Work Items:

Notes: This port upgrade would be supported in Amazon AWS since a security group can be defined; however, other platforms like Azure HDInsight do not seem to support port configuration.

javierluraschi commented 5 years ago

Note: Reported through support: when TLS is enforced on the cluster, sparklyr fails to connect.

sparklyr: Session (####) is starting under #### port ####
Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
    at sparklyr.Utils$.portIsAvailable(utils.scala:435)
    at sparklyr.Backend.init(backend.scala:158)

Related: https://mapr.com/docs/52/Spark/ConfigureSparkOnYarn_Encryption.html