testdrivenio / spark-kubernetes

spark on kubernetes

[Question] Connecting to spark in Client Mode on k8s #3

Closed jaskiratr closed 4 years ago

jaskiratr commented 5 years ago

Hello, and thank you for the excellent work. Is there a way to use this architecture to execute PySpark scripts in client mode, so that one can import pyspark in a Jupyter notebook and connect to the Spark cluster running on Kubernetes?

pyspark-notebook λ docker run --rm -p 10000:8888 -e JUPYTER_ENABLE_LAB=yes -v "$(pwd):/home/jovyan/work" jupyter/pyspark-notebook

Something like this?

# my-notebook.ipynb
import os
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# Build Spark session
sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://localhost:6445")
sparkConf.set("spark.kubernetes.container.image", "spark-hadoop:2.2.1")
sparkConf.set("spark.kubernetes.namespace", "default")
sparkConf.set('spark.submit.deployMode', 'client')  # only client mode is possible from a notebook
sparkConf.set('spark.executor.instances', '2')      # set the number of executor pods
sparkConf.setAppName('pyspark-shell')
os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'

spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext

# Test
filePath = '../Test1.csv'
df = spark.read.csv(filePath, header=True, inferSchema=True)
df.show()
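One caveat that may matter here (an assumption on my part, not something verified against this repo): in client mode the executor pods open connections back to the driver, so the driver must advertise an address they can reach. When the notebook itself runs as a pod, the usual approach is a headless service pointing at the driver pod, with `spark.driver.host` and a fixed `spark.driver.port` set to match. A minimal sketch, where the service name `driver-service` and the port are hypothetical placeholders:

```python
# Sketch only: the service name, namespace, port, and image tag below are
# illustrative assumptions, not values taken from this repository.
from pyspark import SparkConf

sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://localhost:6445")
sparkConf.set("spark.kubernetes.container.image", "spark-hadoop:2.2.1")
sparkConf.set("spark.kubernetes.namespace", "default")
sparkConf.set("spark.submit.deployMode", "client")
# Executors dial back to the driver; advertise an address resolvable in-cluster
# (here, a hypothetical headless service fronting the notebook pod):
sparkConf.set("spark.driver.host", "driver-service.default.svc.cluster.local")
sparkConf.set("spark.driver.port", "29413")  # fixed port exposed by that service
```

If the notebook runs outside the cluster instead (as with the `docker run` command above), the executors generally cannot reach the driver at all, which is why a purely local notebook container tends to fail in this setup.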