Closed: camper42 closed this 3 days ago
spark-operator:
spark-application (uses securityContext.runAsUser: 3631 and mounts /etc/passwd):

Problem: due to apache/spark@4b3fe3a, SPARK_USER=5089, not camper42 as expected (SPARK_USER has higher priority than the container uid).

The current fix we use: add the following to the driver pod spec:
```yaml
env:
- name: SPARK_USER
  value: camper42
```
and the driver pod will then have two envs named SPARK_USER (Kubernetes appears to apply the last occurrence, which seems to be why the override works), like:
```yaml
env:
- name: SPARK_USER
  value: "5089"
- name: SPARK_APPLICATION_ID
  value: spark-44baec237ea2422da98eb877bcaeb82d
- name: SPARK_DRIVER_BIND_ADDRESS
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: status.podIP
- name: PYSPARK_MAJOR_PYTHON_VERSION
  value: "3"
- name: SPARK_LOCAL_DIRS
  value: /var/data/spark-ce8f601d-07b3-42db-a0d9-9f9fc04b3b03
- name: SPARK_CONF_DIR
  value: /opt/spark/conf
- name: HADOOP_CONF_DIR
  value: /etc/hadoop/conf
- name: SPARK_USER
  value: camper42
```
Not sure what you are asking for here. Can you elaborate what feature/fix/change you are thinking of?
Maybe add `.spec.sparkUser` to the CRD? Then spark-operator could run the submission like `env SPARK_USER=<.spec.sparkUser> spark-submit` to set the correct SPARK_USER environment variable in the application.
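A minimal sketch of what such a field could look like in the operator's Go API types (the field name, placement, and v1beta2 package are my assumptions for illustration, not actual spark-operator code):

```go
package v1beta2

// Hypothetical sketch of the proposed field; the real
// SparkApplicationSpec's existing fields are omitted here.
type SparkApplicationSpec struct {
	// SparkUser, if set, would be exported as SPARK_USER when the
	// operator invokes spark-submit, overriding the UID-derived default.
	SparkUser *string `json:"sparkUser,omitempty"`
}
```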
I'm not sure whether this will be considered a bug or a missing feature.
What I want here:
```go
var command = filepath.Join(sparkHome, "/bin/spark-submit")
cmd := execCommand(command, submission.args...)
// Inherit the operator's environment and override SPARK_USER;
// SOME_USER could come from .spec.sparkUser. (Appending to a nil
// cmd.Env would otherwise drop the rest of the environment.)
cmd.Env = append(os.Environ(), "SPARK_USER=SOME_USER")
glog.V(2).Infof("spark-submit arguments: %v", cmd.Args)
output, err := cmd.Output()
```
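For illustration, a self-contained sketch of the same override, assuming a Unix `env` binary on PATH; the user value is a placeholder:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Run a child process with the parent's environment plus an
	// overriding SPARK_USER entry; per the os/exec docs, only the
	// last value for a duplicate key is used.
	cmd := exec.Command("env")
	cmd.Env = append(os.Environ(), "SPARK_USER=camper42")
	out, err := cmd.Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(string(out))
}
```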
As I mentioned, SPARK_USER has higher priority than the UID, and it is what Spark uses for IO with HDFS.
Starting with Spark 3.0, spark-submit sets the SPARK_USER env on the driver pod.
If SPARK_USER in the operator and in the Spark application point to different users, the application will run with the wrong SPARK_USER, and IO with HDFS may hit permission issues.
@liyinan926 Did I make myself clear? If you think this is indeed a problem, I'm happy to create a PR.
I'm also having a similar problem with mounted ConfigMap directory ownership. SPARK_USER should be consistent with what's specified in the .securityContext of the SparkApplication.
Got the same problem.
Got the same problem.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
https://github.com/apache/spark/commit/4b3fe3a9ccc8a4a8eb0d037d19cb07a8a288e37a

Starting with Spark 3.0, the driver & executor are configured with a SPARK_USER env. In org.apache.spark.util.Utils#getCurrentUserName, the SPARK_USER env has higher priority than runAsUser, and Spark uses it to interact with Hadoop (org.apache.spark.deploy.SparkHadoopUtil#createSparkUser).
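To restate that precedence, here is a rough Go paraphrase (not Spark's actual code, which is Scala and goes through Hadoop's UserGroupInformation; the numeric-UID fallback mirrors how a UID without an /etc/passwd entry surfaces as a user like 5089):

```go
package main

import (
	"fmt"
	"os"
	"os/user"
	"strconv"
)

// getCurrentUserName paraphrases the precedence in
// org.apache.spark.util.Utils#getCurrentUserName: an explicit
// SPARK_USER env var wins over the identity derived from the UID.
func getCurrentUserName() string {
	if u := os.Getenv("SPARK_USER"); u != "" {
		return u
	}
	cu, err := user.Current()
	if err != nil {
		// No passwd entry for the UID: fall back to the numeric UID
		// as a string, which is how "user=5089" shows up in HDFS.
		return strconv.Itoa(os.Getuid())
	}
	return cu.Username
}

func main() {
	fmt.Println(getCurrentUserName())
}
```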
Our problem: spark_uid=5089, runAsUser=3631:

```
org.apache.hadoop.security.AccessControlException: Permission denied: user=5089, access=WRITE, inode="/user/camper42/streaming_checkpoint/query2":camper42:supergroup:drwxr-xr-x
```

(user camper42's uid is 3631)