Closed: jaskiratr closed this issue 4 years ago
Hello, thank you for the excellent work. Is there a way to use this architecture to execute PySpark scripts in client mode, so that one can `import pyspark` in a Jupyter Notebook and connect to the Spark cluster running on Kubernetes? I am starting the notebook from the `jupyter/pyspark-notebook` image:

```
λ docker run --rm -p 10000:8888 -e JUPYTER_ENABLE_LAB=yes -v "$(PWD):/home/jovyan/work" jupyter/pyspark-notebook
```
Something like this?
```python
# my-notebook.ipynb
import os
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# Build Spark session
sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://localhost:6445")
sparkConf.set("spark.kubernetes.container.image", "spark-hadoop:2.2.1")
sparkConf.set("spark.kubernetes.namespace", "default")
sparkConf.set('spark.submit.deployMode', 'client')  # Only client mode is possible
sparkConf.set('spark.executor.instances', '2')      # Set the number of executor pods
sparkConf.setAppName('pyspark-shell')

os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'

spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
sc = spark.sparkContext

# Test
filePath = os.path.join('../Test1.csv')
df = spark.read.format('csv').options(
    header='true', inferSchema=True).load(filePath)
df.show()
```
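One thing I am unsure about: in client mode the executor pods have to open connections back to the driver (the notebook), so presumably something along the following lines is also needed. This is only a sketch; the headless-service name, bind address, and port numbers below are placeholders I made up, not values from this repo.

```python
import os
from pyspark import SparkConf
from pyspark.sql import SparkSession

sparkConf = SparkConf()
sparkConf.setMaster("k8s://https://localhost:6445")
sparkConf.set("spark.kubernetes.container.image", "spark-hadoop:2.2.1")
sparkConf.set("spark.submit.deployMode", "client")

# Client-mode networking: executors connect back to the driver, so the driver
# needs an address the pods can route to and stable ports that can be exposed.
sparkConf.set("spark.driver.host", "spark-driver-headless.default.svc.cluster.local")  # placeholder service name
sparkConf.set("spark.driver.bindAddress", "0.0.0.0")   # bind inside the notebook container
sparkConf.set("spark.driver.port", "29413")            # fixed driver RPC port (placeholder)
sparkConf.set("spark.blockManager.port", "29414")      # fixed block manager port (placeholder)

spark = SparkSession.builder.config(conf=sparkConf).getOrCreate()
```

If the notebook runs outside the cluster (as with the docker run above), I assume those driver ports would also have to be forwarded or the notebook placed on a network the executor pods can reach.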