kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

Pyspark module not found: what am I doing wrong? #1614

Open benczekristof opened 1 year ago

benczekristof commented 1 year ago

Hi All, I would like to ask a question regarding the Python support of a SparkApplication: I am trying to run a SparkApplication where I mount the PySpark Python file from a ConfigMap into the driver and executor pods. The mount works, but when the spark-submit happens, the PySpark file fails with the following error message:

File "/opt/spark/examples/src/main/python/pyspark.py", line 4, in <module>
    from pyspark.sql import SparkSession
  File "/opt/spark/examples/src/main/python/..2022_09_23_07_11_05.3779784609/pyspark.py", line 4, in <module>
    from pyspark.sql import SparkSession

ModuleNotFoundError: No module named 'pyspark.sql'; 'pyspark' is not a package

I believe this shouldn't happen, because pyspark is part of the base image. Also, the Python example (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/examples/spark-py-pi.yaml) uses the same import (from pyspark.sql import SparkSession) and executes flawlessly. Could you please let me know what I am doing wrong? My SparkApplication manifest:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-configmap
  namespace: spark
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "gcr.io/spark-operator/spark-py:v3.1.1"
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pyspark.py
  sparkVersion: "3.1.1"
  volumes:
    - name: spark-file
      configMap:
        name: pyspark-read-stuff
        items:
          - key: pyspark.py
            path: pyspark.py
    - name: csv-file
      configMap:
        name: samplecsv
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark-apps
    volumeMounts:
      - name: spark-file
        mountPath: /opt/spark/examples/src/main/python
      - name: csv-file
        mountPath: /content
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark-apps
    volumeMounts:
      - name: spark-file
        mountPath: /opt/spark/examples/src/main/python/pyspark.py
      - name: csv-file
        mountPath: /content
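
One thing stands out in the manifest above: the driver mounts the ConfigMap at the directory /opt/spark/examples/src/main/python, while the executor mounts it at the file path /opt/spark/examples/src/main/python/pyspark.py. Either way, the script lands on the interpreter's path under the name pyspark.py, where it shadows the installed pyspark package. Below is a minimal sketch of a non-colliding layout; the file name read_csv.py and the mount path /opt/spark/work-dir are illustrative assumptions, not operator requirements.

# Hypothetical spec fragment: give the mounted file a name that does not
# collide with the pyspark package, and mount it outside the examples tree.
mainApplicationFile: local:///opt/spark/work-dir/read_csv.py
volumes:
  - name: spark-file
    configMap:
      name: pyspark-read-stuff
      items:
        - key: pyspark.py
          path: read_csv.py            # any name other than pyspark.py avoids the shadowing
driver:
  volumeMounts:
    - name: spark-file
      mountPath: /opt/spark/work-dir   # mount the directory, same path on driver and executor
executor:
  volumeMounts:
    - name: spark-file
      mountPath: /opt/spark/work-dir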

My pyspark.py file:

import sys
from random import random
from operator import add
from pyspark.sql import SparkSession

if __name__ == "__main__":
    spark = SparkSession.builder.appName('Read CSV File into DataFrame').getOrCreate()
    df = spark.read.csv('/content/sample.csv', sep=',', inferSchema=True, header=True)
    df.head()
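
The file name itself is the likely culprit: the script is called pyspark.py, and CPython puts the script's own directory at the front of sys.path, so from pyspark.sql import SparkSession resolves pyspark to this very file (a plain module, not a package) instead of the installed library. That matches the traceback above, where the file appears to import itself. A quick way to confirm, sketched here as a hypothetical snippet pasted at the top of the mounted script:

# Hypothetical diagnostic: report which file the name "pyspark" resolves to
# inside the driver pod, before the failing import runs.
import importlib.util
import sys

spec = importlib.util.find_spec("pyspark")
print("pyspark resolves to:", spec.origin)   # expect the mounted pyspark.py, not site-packages
print("first sys.path entry:", sys.path[0])  # the script's directory wins the module lookup

If the printed origin is the mounted file, renaming the ConfigMap item (as in the sketch after the manifest above) should make the import resolve to the real package again.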
github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.