Hi All, I would like to ask a question regarding the Python support of a SparkApplication: I am trying to run a SparkApplication where I mount the PySpark Python file from a ConfigMap into the driver and executor pods. The mount works fine, but when the spark-submit happens the PySpark file fails with the following error message:
File "/opt/spark/examples/src/main/python/pyspark.py", line 4, in <module>
from pyspark.sql import SparkSession
File "/opt/spark/examples/src/main/python/..2022_09_23_07_11_05.3779784609/pyspark.py", line 4, in <module>
from pyspark.sql import SparkSession
ModuleNotFoundError: No module named 'pyspark.sql'; 'pyspark' is not a package
I believe this shouldn't happen because PySpark is an internal part of the base image. Also, if I run the Python example (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/examples/spark-py-pi.yaml), which uses the same import (from pyspark.sql import SparkSession), it gets executed flawlessly. Could you please let me know what I am doing wrong?
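For context on how such an error can arise even when PySpark is installed: Python resolves imports by scanning sys.path in order, and a plain .py file found early on the path shadows an installed package of the same name. The sketch below reproduces the mechanism with a made-up package name, demo_pkg (nothing Spark-specific is assumed; the directories and file names are illustrative only):

```python
import importlib
import os
import sys
import tempfile

# Build a real package "demo_pkg" with a "sql" submodule in one directory...
pkg_root = tempfile.mkdtemp()
os.makedirs(os.path.join(pkg_root, "demo_pkg"))
open(os.path.join(pkg_root, "demo_pkg", "__init__.py"), "w").close()
open(os.path.join(pkg_root, "demo_pkg", "sql.py"), "w").close()

# ...and, in a separate directory, a plain file demo_pkg.py that shadows it,
# analogous to a script named pyspark.py shadowing the installed pyspark package.
shadow_root = tempfile.mkdtemp()
with open(os.path.join(shadow_root, "demo_pkg.py"), "w") as f:
    f.write("value = 1\n")

# The directory holding the running script is searched first, so the
# shadowing file wins over the real package.
sys.path.insert(0, pkg_root)
sys.path.insert(0, shadow_root)

try:
    importlib.import_module("demo_pkg.sql")
    err = None
except ModuleNotFoundError as e:
    err = e

# Prints: No module named 'demo_pkg.sql'; 'demo_pkg' is not a package
print(err)
```

The wording of the raised error matches the one in the traceback above, which suggests the interpreter is finding a module file rather than the package when it resolves the `pyspark` name.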
My pyspark.py file:
import sys
from random import random
from operator import add
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName('Read CSV File into DataFrame').getOrCreate()
    df = spark.read.csv('/content/sample.csv', sep=',', inferSchema=True, header=True)
    df.head()
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.