nonpool opened this issue 3 years ago
There is an existing online solution: https://github.com/datamechanics/delight
Thanks for your suggestion. This is a great Spark history visualization solution: the visual style is very modern and the configuration is simple and easy to use. But its limitations are also obvious. The visualization page is only available as a hosted online service and cannot be deployed by ourselves, so it is not suitable for our scenario.
You could write your own deployment and run the history server using `./sbin/start-history-server.sh`. I have also located an alternative hosting of the charts at https://artifacthub.io/packages/helm/spot/spark-history-server which might help.
I am writing Spark events to S3, so I built a new Docker image that adds a couple of jars and just changes the entrypoint to run the Spark history server:
```dockerfile
FROM gcr.io/spark-operator/spark:v3.1.1-hadoop3
USER root
ADD https://xxx.com/artifactory/apixio-spark/org/apache/hadoop/hadoop-aws/2.7.4/hadoop-aws-2.7.4.jar $SPARK_HOME/jars/
ADD https://xxx.com/artifactory/apixio-spark/com/amazonaws/aws-java-sdk-bundle/1.7.4.2/aws-java-sdk-1.7.4.2.jar $SPARK_HOME/jars/
ENTRYPOINT bash /opt/spark/sbin/spark-daemon.sh start org.apache.spark.deploy.history.HistoryServer 1
```
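As an aside: `spark-daemon.sh start` normally forks into the background, which can make the container exit immediately since Kubernetes expects PID 1 to stay in the foreground. A possible alternative entrypoint (a sketch, assuming the standard Spark layout under `/opt/spark`) is the bundled launcher script with `SPARK_NO_DAEMONIZE` set:

```dockerfile
# Sketch: run the history server in the foreground so the
# container keeps running (SPARK_NO_DAEMONIZE is honored by
# Spark's sbin launcher scripts).
ENV SPARK_NO_DAEMONIZE=true
ENTRYPOINT ["/opt/spark/sbin/start-history-server.sh"]
```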
Then just a deployment yaml:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: spark-hs-custom
    version: 3.1.1
  name: spark-hs-custom
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-hs-custom
      version: 3.1.1
  template:
    metadata:
      labels:
        app: spark-hs-custom
        version: 3.1.1
    spec:
      containers:
        - env:
            - name: SPARK_NO_DAEMONIZE
              value: "false"
            - name: SPARK_HISTORY_OPTS
              value: -Dspark.history.fs.logDirectory=s3a://my-bucket-name/eventLogFolder
          image: xxx.dkr.ecr.us-west-2.amazonaws.com/xxx-spark-hs:v0.0.4
          imagePullPolicy: IfNotPresent
          name: spark-hs-custom
          ports:
            - containerPort: 18080
              name: http
              protocol: TCP
          resources:
            requests:
              cpu: "2"
              memory: 10Gi
            limits:
              cpu: "2"
              memory: 10Gi
```
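The Deployment above only opens port 18080 on the pod; to actually reach the UI from inside the cluster you would also want a Service selecting the same labels. A minimal sketch (name and selector chosen to match the Deployment above):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-hs-custom
spec:
  selector:
    app: spark-hs-custom
  ports:
    - name: http
      port: 18080
      targetPort: 18080
```

From there an Ingress or `kubectl port-forward svc/spark-hs-custom 18080` gets you to the web UI.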
@jdonnelly-apixio: I tried your solution (with a USER other than root) and I get this error in the logs of the Spark history server:

```text
Exception in thread "main" java.io.IOException: failure to login
	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:841)
```

I think a user must be created in the Docker image.
@stephbat Yea, I think I hit that issue as well when I was running as someone other than root. Google's base image uses user 185, but I wasn't able to get it to work with that one. Can you post what you did if you figure out a solution?

Yea, confirmed: I get that exception when I try `USER 185`:
```text
21/07/13 22:00:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.security.KerberosAuthException: failure to login: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
	at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1847)
	at org.apache.hadoop.security.UserGroupInformation.createLoginUser(UserGroupInformation.java:710)
	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:660)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:571)
	at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2476)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2476)
	at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
	at org.apache.spark.deploy.history.HistoryServer$.createSecurityManager(HistoryServer.scala:333)
	at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:294)
	at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name
	at jdk.security.auth/com.sun.security.auth.UnixPrincipal.<init>(Unknown Source)
	at jdk.security.auth/com.sun.security.auth.module.UnixLoginModule.login(Unknown Source)
	at java.base/javax.security.auth.login.LoginContext.invoke(Unknown Source)
	at java.base/javax.security.auth.login.LoginContext$4.run(Unknown Source)
	at java.base/javax.security.auth.login.LoginContext$4.run(Unknown Source)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.login.LoginContext.invokePriv(Unknown Source)
	at java.base/javax.security.auth.login.LoginContext.login(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation$HadoopLoginContext.login(UserGroupInformation.java:1926)
	at org.apache.hadoop.security.UserGroupInformation.doSubjectLogin(UserGroupInformation.java:1837)
	... 10 more
```
This works for me:
```dockerfile
FROM gcr.io/spark-operator/spark:v3.1.1-hadoop3
USER root
ADD https://xxx.com/artifactory/apixio-spark/org/apache/hadoop/hadoop-aws/2.7.4/hadoop-aws-2.7.4.jar $SPARK_HOME/jars/
ADD https://xxx.com/artifactory/apixio-spark/com/amazonaws/aws-java-sdk-bundle/1.7.4.2/aws-java-sdk-1.7.4.2.jar $SPARK_HOME/jars/
RUN groupadd -g 185 spark && \
    useradd -u 185 -g 185 spark
USER 185
ENTRYPOINT bash /opt/spark/sbin/spark-daemon.sh start org.apache.spark.deploy.history.HistoryServer 1
```
@jdonnelly-apixio @indranilr thanks for your solutions. In fact, I also use a Deployment to run the Spark history server, just like yours. But what I want to say is that deploying the Spark history server ourselves is completely different from having it included in this Helm repo. I think we should simply be able to change `values.yaml` to make this work.
@nonpool Yep, kind of agreed. It would be useful if the spark-operator supported deploying common companion services like the Spark history server, a Hive metastore, a Prometheus server, etc.

Can a Helm chart install other Helm charts? If not, I'm not sure it would make sense to duplicate Helm install functionality for something like a Prometheus server inside the spark-operator Helm chart; the official prometheus-community Helm chart should probably be used instead. I'm not sure what the best practices are from a Helm standpoint. Maybe some additional documentation that shows, or links to, how to install some common useful additional services would be a good start.
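For what it's worth, Helm does let a chart pull in other charts by declaring them as dependencies in `Chart.yaml`. A hypothetical sketch (the bundle chart name is illustrative; the prometheus-community repository is real):

```yaml
apiVersion: v2
name: spark-operator-bundle   # hypothetical wrapper chart
version: 0.1.0
dependencies:
  - name: prometheus
    version: "15.x.x"
    repository: https://prometheus-community.github.io/helm-charts
    condition: prometheus.enabled   # toggled from values.yaml
```

Running `helm dependency update` then fetches the subchart, and setting `prometheus.enabled: true` in `values.yaml` enables it at install time.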
Hi @stephbat @indranilr @nonpool @jdonnelly-apixio @haolixu, I am still confused about how to install the Spark History Server on Kubernetes after a successful installation of the Spark Operator. Can anyone help?
@chetkhatri Here is my personal experience:

1. Build a custom image:

```dockerfile
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v3.1.1-hadoop3
FROM ${SPARK_IMAGE}
USER root
RUN rm $SPARK_HOME/jars/guava-*.jar
ADD https://repo1.maven.org/maven2/com/google/guava/guava/27.0-jre/guava-27.0-jre.jar $SPARK_HOME/jars
RUN chmod 644 $SPARK_HOME/jars/guava-27.0-jre.jar
ADD https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop3.jar $SPARK_HOME/jars
RUN chmod 644 $SPARK_HOME/jars/gcs-connector-latest-hadoop3.jar
USER ${spark_uid}
ENTRYPOINT ${SPARK_HOME}/sbin/start-history-server.sh
```

I've added some stuff related to GCP; feel free to add your own for AWS, Azure, whatever.
2. Create a new chart with `helm create`
3. Change the following in `templates/deployment.yaml`:
   - `spec.template.spec.containers[0].ports.containerPort` -> 18080
   - `spec.template.spec.containers[0].env` ->

     env:

4. Add a `config` section to `values.yaml`:

   config:
     sparkConfDirectory: "/opt/spark/conf"
     historyLogDirectory: ""
5. Fill `values.yaml` sections according to your needs (services, ingresses etc...)
6. Install your chart
7. Done
I hope I didn't forget anything
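To make steps 4 and 5 concrete, a filled-in `values.yaml` might look like the following (the bucket path is a placeholder, and the `service` section is just the stub `helm create` scaffolds for you):

```yaml
config:
  sparkConfDirectory: "/opt/spark/conf"
  historyLogDirectory: "gs://my-bucket/spark-events"

service:
  type: ClusterIP
  port: 18080
```

After that, `helm install spark-history ./my-chart -f values.yaml` installs the chart.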
This works for me:

```dockerfile
FROM gcr.io/spark-operator/spark:v3.1.1-hadoop3
USER root
ADD https://xxx.com/artifactory/apixio-spark/org/apache/hadoop/hadoop-aws/2.7.4/hadoop-aws-2.7.4.jar $SPARK_HOME/jars/
ADD https://xxx.com/artifactory/apixio-spark/com/amazonaws/aws-java-sdk-bundle/1.7.4.2/aws-java-sdk-1.7.4.2.jar $SPARK_HOME/jars/
RUN groupadd -g 185 spark && \
    useradd -u 185 -g 185 spark
USER 185
ENTRYPOINT bash /opt/spark/sbin/spark-daemon.sh start org.apache.spark.deploy.history.HistoryServer 1
```
Awesome, this works, but I have no idea how it works.
Hi everyone, I have one question: are you also able to collect the driver/executor stdout/stderr logs that you can see through `kubectl logs`?
Has anyone tried to do this more recently? I'm trying with the apache/spark:3.5.0 container image and have lost a day to dependency hell. A Spark History helm chart in this repo or documentation on how to get this up and running would be very welcome indeed.
What is your `securityContext` set to on your deployment for the history server? In my case, I had the same issue as @jdonnelly-apixio here, but the real issue was that my securityContext was set to run as a different user.

The fix, in my case, was simply:

```yaml
securityContext:
  runAsUser: 185
```

Previously it was set to run as user 1000. This fixed the issue for me.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Bump to keep alive as this would be useful.
Background: issue [#164].

Reason: the Spark history server has no stable, available chart since the helm/charts repo was archived. For spark-on-k8s-operator users there is a very high probability that a Spark history server is needed, because the web UI of the Spark driver becomes inaccessible after the Spark executors complete. Of course, if you have a better Spark history visualization solution, you can also provide it; I really did not find this information in the documentation.

What do you think?