Open camper42 opened 4 years ago
spark-operator:
spark-application (uses securityContext.runAsUser: 3631 & mounts /etc/passwd):

Problem:
SPARK_USER=5089, due to apache/spark@4b3fe3a, so the application runs as 5089, not camper42 as expected (the SPARK_USER env var has higher priority than the container uid).

Our current fix: add the below to the driver pod spec:
env:
- name: SPARK_USER
  value: camper42
and then the driver pod will have two env entries named SPARK_USER, like:
env:
- name: SPARK_USER
  value: "5089"
- name: SPARK_APPLICATION_ID
  value: spark-44baec237ea2422da98eb877bcaeb82d
- name: SPARK_DRIVER_BIND_ADDRESS
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: status.podIP
- name: PYSPARK_MAJOR_PYTHON_VERSION
  value: "3"
- name: SPARK_LOCAL_DIRS
  value: /var/data/spark-ce8f601d-07b3-42db-a0d9-9f9fc04b3b03
- name: SPARK_CONF_DIR
  value: /opt/spark/conf
- name: HADOOP_CONF_DIR
  value: /etc/hadoop/conf
- name: SPARK_USER
  value: camper42
Not sure what you are asking for here. Can you elaborate what feature/fix/change you are thinking of?
maybe add .spec.sparkUser to the CRD?
and have spark-operator run the submission like: env SPARK_USER=<.spec.sparkUser> spark-submit
to set the correct SPARK_USER environment variable in the application
I'm not sure whether this will be considered a bug or a missing feature.
What I want here:
var command = filepath.Join(sparkHome, "/bin/spark-submit")
cmd := execCommand(command, submission.args...)
cmd.Env = append(cmd.Env, "SPARK_USER=SOME_USER") // SOME_USER maybe from .spec.sparkUser
glog.V(2).Infof("spark-submit arguments: %v", cmd.Args)
output, err := cmd.Output()
As I mentioned, SPARK_USER has higher priority than the UID and is used for IO with HDFS.
Starting with Spark 3.0, spark-submit sets the SPARK_USER env on the driver pod.
If SPARK_USER in the operator and in the Spark application point to different users, the application will execute as the wrong SPARK_USER, and IO with HDFS may hit permission issues.
@liyinan926 Did I make myself clear? And if you think this is indeed a problem, I'm happy to create a PR.
I'm also having a similar problem with mounted configmap directory ownership. SPARK_USER should be consistent with what's specified in .securityContext in the SparkApplication.
got same problem
got same problem
https://github.com/apache/spark/commit/4b3fe3a9ccc8a4a8eb0d037d19cb07a8a288e37a
Starting with Spark 3.0, the driver & executor are configured with a SPARK_USER env.
In org.apache.spark.util.Utils#getCurrentUserName, the SPARK_USER env has higher priority than runAsUser, and Spark uses it to interact with Hadoop via org.apache.spark.deploy.SparkHadoopUtil#createSparkUser.
our problem:
spark_uid=5089
runAsUser=3631
org.apache.hadoop.security.AccessControlException: Permission denied: user=5089, access=WRITE, inode="/user/camper42/streaming_checkpoint/query2":camper42:supergroup:drwxr-xr-x
(user camper42's uid is 3631)