kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

set correct SPARK_USER ENV when submit sparkapplications #931

Open camper42 opened 4 years ago

camper42 commented 4 years ago

Starting with Spark 3.0 (https://github.com/apache/spark/commit/4b3fe3a9ccc8a4a8eb0d037d19cb07a8a288e37a), the driver & executor pods are configured with a SPARK_USER env var:

> Finally, the new code always sets SPARK_USER in the driver and executor pods. This is in line with how other resource managers behave: the submitting user reflects which user will access Hadoop services in the app. (With kerberos, that's overridden by the logged in user.) That user is unrelated to the OS user the app is running as inside the containers.

In org.apache.spark.util.Utils#getCurrentUserName, the SPARK_USER env var has higher priority than the uid set by runAsUser,

and Spark uses this user to interact with Hadoop; see org.apache.spark.deploy.SparkHadoopUtil#createSparkUser.


Our problem:

  1. all images are built with spark_uid=5089
  2. a streaming application runs with runAsUser=3631
  3. the task failed with org.apache.hadoop.security.AccessControlException: Permission denied: user=5089, access=WRITE, inode="/user/camper42/streaming_checkpoint/query2":camper42:supergroup:drwxr-xr-x (user camper42's uid is 3631)
liyinan926 commented 4 years ago

Not sure what you are asking for here. Can you elaborate what feature/fix/change you are thinking of?

camper42 commented 4 years ago

spark-operator:

spark-application (uses securityContext.runAsUser: 3631 & mounts /etc/passwd):
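The relevant part of the SparkApplication spec presumably looks something like this (a hedged reconstruction; only runAsUser: 3631 and the uid-5089 image come from the report, the other field values are illustrative):

```yaml
# Illustrative SparkApplication fragment; runAsUser: 3631 is from the
# report above, the rest is assumed for the sake of the example.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
spec:
  driver:
    securityContext:
      runAsUser: 3631   # camper42's uid; the image itself is built with spark_uid=5089
  executor:
    securityContext:
      runAsUser: 3631
```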


Problem:

  1. spark-operator submits the app (the spark-submit process runs as uid 5089)
  2. the driver pod starts with uid 3631, username camper42, but has the environment variable SPARK_USER=5089, due to apache/spark@4b3fe3a
  3. the executor pod starts with uid 3631, username camper42, but has the environment variable SPARK_USER=5089, due to apache/spark@4b3fe3a
  4. the executor writes checkpoints to HDFS as user 5089, not camper42 as expected (SPARK_USER has higher priority than the container uid)
  5. permission denied, task failed.

The current fix we use: add the following to the driver pod spec

    env:
    - name: SPARK_USER
      value: camper42

and the driver pod will then have two env vars named SPARK_USER, like:

    env:
    - name: SPARK_USER
      value: "5089"
    - name: SPARK_APPLICATION_ID
      value: spark-44baec237ea2422da98eb877bcaeb82d
    - name: SPARK_DRIVER_BIND_ADDRESS
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: PYSPARK_MAJOR_PYTHON_VERSION
      value: "3"
    - name: SPARK_LOCAL_DIRS
      value: /var/data/spark-ce8f601d-07b3-42db-a0d9-9f9fc04b3b03
    - name: SPARK_CONF_DIR
      value: /opt/spark/conf
    - name: HADOOP_CONF_DIR
      value: /etc/hadoop/conf
    - name: SPARK_USER
      value: camper42

> Not sure what you are asking for here. Can you elaborate what feature/fix/change you are thinking of?

Maybe add .spec.sparkUser to the CRD? Then spark-operator would run the submission like env SPARK_USER=<.spec.sparkUser> spark-submit, to set the correct SPARK_USER environment variable in the application.

I'm not sure whether this should be considered a bug or a missing feature.

camper42 commented 4 years ago

https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/pkg/controller/sparkapplication/submission.go#L62-L66

what I want here:

    var command = filepath.Join(sparkHome, "/bin/spark-submit")

    cmd := execCommand(command, submission.args...)
    // inherit the operator's environment, then override SPARK_USER;
    // SOME_USER could come from .spec.sparkUser
    cmd.Env = append(os.Environ(), "SPARK_USER=SOME_USER")
    glog.V(2).Infof("spark-submit arguments: %v", cmd.Args)
    output, err := cmd.Output()

As I mentioned, SPARK_USER has higher priority than the uid, and it is the identity used for I/O with HDFS.

Starting with Spark 3.0, spark-submit sets the driver pod's SPARK_USER env var. If SPARK_USER in the operator and in the Spark application point to different users, the application will run with the wrong SPARK_USER, and I/O with HDFS may hit permission issues.

@liyinan926 Did I make myself clear? If you think this is indeed a problem, I'm happy to create a PR.

FloraZhang commented 3 years ago

I'm also having a similar problem with mounted ConfigMap directory ownership. SPARK_USER should be consistent with what's specified in .securityContext in the SparkApplication.

erictanghu commented 3 years ago

got same problem

mggger commented 3 years ago

got same problem