Closed — ryancampbell closed this issue 6 years ago
Edit spark-on-k8s-operator/Dockerfile second FROM to be: FROM gcr.io/uncoil-io/spark:v2.3.1
Should you be using FROM gcr.io/uncoil-io/spark:v2.3.1-gcs instead of FROM gcr.io/uncoil-io/spark:v2.3.1 in your spark-on-k8s-operator/Dockerfile?
What do you use spark:v2.3.1-gcs for? Is there any application dependency that needs to be downloaded from GCS?
@liyinan926 I'll try that again, although I believe I tried both.
I used "gcr.io/uncoil-io/spark:v2.3.1" instead of "gcr.io/uncoil-io/spark:v2.3.1-gcs" because the original Dockerfile used "gcr.io/ynli-k8s/spark:v2.3.0" and not "gcr.io/ynli-k8s/spark:v2.3.0-gcs"
Can you post your SparkApplication spec here?
apiVersion: "sparkoperator.k8s.io/v1alpha1"
kind: ScheduledSparkApplication
metadata:
  name: uncoil-runner
spec:
  schedule: "@every 5m"
  concurrencyPolicy: Forbid
  template:
    type: Scala
    mode: cluster
    image: gcr.io/uncoil-io/spark:v2.3.1-gcs
    mainClass: uncoil.UncoilRunner
    mainApplicationFile: gs://uncoil-artifacts/uncoil-job/uncoil-job.jar
    deps:
      jars:
        - http://central.maven.org/maven2/org/apache/commons/commons-pool2/2.5.0/commons-pool2-2.5.0.jar
        - http://central.maven.org/maven2/redis/clients/jedis/2.9.0/jedis-2.9.0.jar
        - http://dl.bintray.com/spark-packages/maven/RedisLabs/spark-redis/0.3.2/spark-redis-0.3.2.jar
        - http://central.maven.org/maven2/org/apache/kafka/kafka-clients/0.10.0.1/kafka-clients-0.10.0.1.jar
        - http://central.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.11/2.3.1/spark-sql-kafka-0-10_2.11-2.3.1.jar
        - http://central.maven.org/maven2/org/apache/spark/spark-sql_2.11/2.3.1/spark-sql_2.11-2.3.1.jar
        - http://central.maven.org/maven2/com/typesafe/config/1.3.2/config-1.3.2.jar
        - http://central.maven.org/maven2/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar
    imagePullPolicy: Always
    hadoopConf:
      "fs.gs.project.id": "uncoil-io"
      "fs.gs.system.bucket": "uncoil-spark-production"
      "google.cloud.auth.service.account.enable": "true"
      "google.cloud.auth.service.account.json.keyfile": "/mnt/secrets/spark-sa.json"
    driver:
      cores: 1
      memory: 3g
      labels:
        version: 2.3.1
      serviceAccount: spark-sa
      secrets:
        - name: "spark-sa"
          path: "/mnt/secrets"
          secretType: GCPServiceAccount
      envVars:
        GCS_PROJECT_ID: uncoil-io
        SCALA_ENV: production
    executor:
      instances: 2
      cores: 2
      memory: 10g
      labels:
        versions: 2.3.1
      secrets:
        - name: "spark-sa"
          path: "/mnt/secrets"
          secretType: GCPServiceAccount
      envVars:
        GCS_PROJECT_ID: uncoil-io
        SCALA_ENV: production
Can you run kubectl logs -c spark-init <driver pod name>? This will give you the init container logs. The main application file gs://uncoil-artifacts/uncoil-job/uncoil-job.jar needs to be downloaded by the init container from GCS before the driver starts running.
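The useful signal in those logs is whether a gs:// fetch was attempted at all. As a sketch (the pod name above is whatever kubectl get pods shows for your driver; here a sample log line stands in for the real capture), the check is just a grep:

```shell
# In a live cluster you would capture the logs first, e.g.:
#   kubectl logs -c spark-init <driver pod name> > init.log
# A sample line stands in for the real log here, to show the check itself:
cat > init.log <<'EOF'
2018-07-19 18:16:33 INFO Utils:54 - Fetching gs://uncoil-artifacts/uncoil-job-production/uncoil-job-production.jar to /var/spark-data/spark-jars/fetchFileTemp8154449709090930249.tmp
EOF
# If this prints nothing, the init container never tried to download from GCS.
grep 'Fetching gs://' init.log
```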
++ id -u
+ myuid=0
++ id -g
+ mygid=0
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=init
+ '[' -z init ']'
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ grep SPARK_JAVA_OPT_
+ readarray -t SPARK_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-class" "org.apache.spark.deploy.k8s.SparkPodInitContainer" "$@")
+ exec /sbin/tini -s -- /opt/spark/bin/spark-class org.apache.spark.deploy.k8s.SparkPodInitContainer /etc/spark-init/spark-init.properties
2018-07-19 18:12:27 INFO SparkPodInitContainer:54 - Starting init-container to download Spark application dependencies.
2018-07-19 18:12:27 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-07-19 18:12:27 INFO SecurityManager:54 - Changing view acls to: root
2018-07-19 18:12:27 INFO SecurityManager:54 - Changing modify acls to: root
2018-07-19 18:12:27 INFO SecurityManager:54 - Changing view acls groups to:
2018-07-19 18:12:27 INFO SecurityManager:54 - Changing modify acls groups to:
2018-07-19 18:12:27 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2018-07-19 18:12:28 INFO SparkPodInitContainer:54 - Downloading remote jars: Some(http://central.maven.org/maven2/org/apache/commons/commons-pool2/2.5.0/commons-pool2-2.5.0.jar,http://central.maven.org/maven2/redis/clients/jedis/2.9.0/jedis-2.9.0.jar,http://dl.bintray.com/spark-packages/maven/RedisLabs/spark-redis/0.3.2/spark-redis-0.3.2.jar,http://central.maven.org/maven2/org/apache/kafka/kafka-clients/0.10.0.1/kafka-clients-0.10.0.1.jar,http://central.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.11/2.3.1/spark-sql-kafka-0-10_2.11-2.3.1.jar,http://central.maven.org/maven2/org/apache/spark/spark-sql_2.11/2.3.1/spark-sql_2.11-2.3.1.jar,http://central.maven.org/maven2/com/typesafe/config/1.3.2/config-1.3.2.jar,http://central.maven.org/maven2/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar,gs://uncoil-artifacts/uncoil-job/uncoil-job-ryan.jar,gs://uncoil-artifacts/uncoil-job/uncoil-job-ryan.jar)
2018-07-19 18:12:28 INFO SparkPodInitContainer:54 - Downloading remote files: None
2018-07-19 18:12:28 INFO Utils:54 - Fetching http://central.maven.org/maven2/redis/clients/jedis/2.9.0/jedis-2.9.0.jar to /var/spark-data/spark-jars/fetchFileTemp8132912689783452936.tmp
2018-07-19 18:12:28 INFO Utils:54 - Fetching http://central.maven.org/maven2/org/apache/kafka/kafka-clients/0.10.0.1/kafka-clients-0.10.0.1.jar to /var/spark-data/spark-jars/fetchFileTemp8108929517189477087.tmp
2018-07-19 18:12:28 INFO Utils:54 - Fetching http://central.maven.org/maven2/org/apache/commons/commons-pool2/2.5.0/commons-pool2-2.5.0.jar to /var/spark-data/spark-jars/fetchFileTemp29997743187335354.tmp
2018-07-19 18:12:28 INFO Utils:54 - Fetching http://central.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.11/2.3.1/spark-sql-kafka-0-10_2.11-2.3.1.jar to /var/spark-data/spark-jars/fetchFileTemp918990109718368724.tmp
2018-07-19 18:12:28 INFO Utils:54 - Fetching http://dl.bintray.com/spark-packages/maven/RedisLabs/spark-redis/0.3.2/spark-redis-0.3.2.jar to /var/spark-data/spark-jars/fetchFileTemp3021150405751723647.tmp
2018-07-19 18:12:28 INFO Utils:54 - Fetching http://central.maven.org/maven2/org/apache/spark/spark-sql_2.11/2.3.1/spark-sql_2.11-2.3.1.jar to /var/spark-data/spark-jars/fetchFileTemp8195280120682746690.tmp
2018-07-19 18:12:28 INFO Utils:54 - Fetching http://central.maven.org/maven2/com/typesafe/config/1.3.2/config-1.3.2.jar to /var/spark-data/spark-jars/fetchFileTemp120501099999836480.tmp
2018-07-19 18:12:28 INFO Utils:54 - Fetching http://central.maven.org/maven2/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar to /var/spark-data/spark-jars/fetchFileTemp6532435163420374918.tmp
2018-07-19 18:12:28 INFO GoogleHadoopFileSystemBase:637 - GHFS version: 1.9.2-hadoop2
2018-07-19 18:12:28 INFO SparkPodInitContainer:54 - Finished downloading application dependencies.
FYI, I was using a different jar path when running this, but that file does exist.
Interestingly, the logs didn't show any attempt to download gs://uncoil-artifacts/uncoil-job/uncoil-job.jar, although Downloading remote jars: Some(...) does include the jar.
Here is the working 2.3.0 setup in production, for comparison:
++ id -u
+ myuid=0
++ id -g
+ mygid=0
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=init
+ '[' -z init ']'
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ sed 's/[^=]*=\(.*\)/\1/g'
+ grep SPARK_JAVA_OPT_
+ env
+ readarray -t SPARK_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-class" "org.apache.spark.deploy.k8s.SparkPodInitContainer" "$@")
+ exec /sbin/tini -s -- /opt/spark/bin/spark-class org.apache.spark.deploy.k8s.SparkPodInitContainer /etc/spark-init/spark-init.properties
2018-07-19 18:16:31 INFO SparkPodInitContainer:54 - Starting init-container to download Spark application dependencies.
2018-07-19 18:16:31 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-07-19 18:16:32 INFO SecurityManager:54 - Changing view acls to: root
2018-07-19 18:16:32 INFO SecurityManager:54 - Changing modify acls to: root
2018-07-19 18:16:32 INFO SecurityManager:54 - Changing view acls groups to:
2018-07-19 18:16:32 INFO SecurityManager:54 - Changing modify acls groups to:
2018-07-19 18:16:32 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2018-07-19 18:16:32 INFO SparkPodInitContainer:54 - Downloading remote jars: Some(http://central.maven.org/maven2/org/apache/commons/commons-pool2/2.5.0/commons-pool2-2.5.0.jar,http://central.maven.org/maven2/redis/clients/jedis/2.9.0/jedis-2.9.0.jar,http://dl.bintray.com/spark-packages/maven/RedisLabs/spark-redis/0.3.2/spark-redis-0.3.2.jar,http://central.maven.org/maven2/org/apache/kafka/kafka-clients/0.10.0.1/kafka-clients-0.10.0.1.jar,http://central.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.11/2.3.0/spark-sql-kafka-0-10_2.11-2.3.0.jar,http://central.maven.org/maven2/org/apache/spark/spark-sql_2.11/2.3.0/spark-sql_2.11-2.3.0.jar,http://central.maven.org/maven2/com/typesafe/config/1.3.2/config-1.3.2.jar,http://central.maven.org/maven2/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar,gs://uncoil-artifacts/uncoil-job-production/uncoil-job-production.jar,gs://uncoil-artifacts/uncoil-job-production/uncoil-job-production.jar)
2018-07-19 18:16:32 INFO SparkPodInitContainer:54 - Downloading remote files: None
2018-07-19 18:16:32 INFO Utils:54 - Fetching http://central.maven.org/maven2/org/apache/kafka/kafka-clients/0.10.0.1/kafka-clients-0.10.0.1.jar to /var/spark-data/spark-jars/fetchFileTemp7252574674392297363.tmp
2018-07-19 18:16:32 INFO Utils:54 - Fetching http://central.maven.org/maven2/redis/clients/jedis/2.9.0/jedis-2.9.0.jar to /var/spark-data/spark-jars/fetchFileTemp7516920988049583109.tmp
2018-07-19 18:16:32 INFO Utils:54 - Fetching http://central.maven.org/maven2/org/apache/spark/spark-sql-kafka-0-10_2.11/2.3.0/spark-sql-kafka-0-10_2.11-2.3.0.jar to /var/spark-data/spark-jars/fetchFileTemp7809335549667476709.tmp
2018-07-19 18:16:32 INFO Utils:54 - Fetching http://central.maven.org/maven2/org/apache/commons/commons-pool2/2.5.0/commons-pool2-2.5.0.jar to /var/spark-data/spark-jars/fetchFileTemp6980777093692990708.tmp
2018-07-19 18:16:32 INFO Utils:54 - Fetching http://central.maven.org/maven2/org/apache/spark/spark-sql_2.11/2.3.0/spark-sql_2.11-2.3.0.jar to /var/spark-data/spark-jars/fetchFileTemp1972656710207288539.tmp
2018-07-19 18:16:32 INFO Utils:54 - Fetching http://central.maven.org/maven2/com/typesafe/config/1.3.2/config-1.3.2.jar to /var/spark-data/spark-jars/fetchFileTemp6053943254686834708.tmp
2018-07-19 18:16:32 INFO Utils:54 - Fetching http://central.maven.org/maven2/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar to /var/spark-data/spark-jars/fetchFileTemp2775901386905387678.tmp
2018-07-19 18:16:32 INFO Utils:54 - Fetching http://dl.bintray.com/spark-packages/maven/RedisLabs/spark-redis/0.3.2/spark-redis-0.3.2.jar to /var/spark-data/spark-jars/fetchFileTemp6328779391289019710.tmp
2018-07-19 18:16:32 INFO GoogleHadoopFileSystemBase:607 - GHFS version: 1.6.3-hadoop2
2018-07-19 18:16:33 WARN GoogleHadoopFileSystemBase:1876 - No working directory configured, using default: 'gs://uncoil-artifacts/'
2018-07-19 18:16:33 WARN GoogleHadoopFileSystemBase:1876 - No working directory configured, using default: 'gs://uncoil-artifacts/'
2018-07-19 18:16:33 INFO Utils:54 - Fetching gs://uncoil-artifacts/uncoil-job-production/uncoil-job-production.jar to /var/spark-data/spark-jars/fetchFileTemp8154449709090930249.tmp
2018-07-19 18:16:33 INFO Utils:54 - Fetching gs://uncoil-artifacts/uncoil-job-production/uncoil-job-production.jar to /var/spark-data/spark-jars/fetchFileTemp8629099277871937989.tmp
2018-07-19 18:16:33 WARN GoogleCloudStorageReadChannel:493 - Channel for 'gs://uncoil-artifacts/uncoil-job-production/uncoil-job-production.jar' is not open.
2018-07-19 18:16:33 INFO Utils:54 - /var/spark-data/spark-jars/fetchFileTemp8629099277871937989.tmp has been previously copied to /var/spark-data/spark-jars/uncoil-job-production.jar
2018-07-19 18:16:33 WARN GoogleCloudStorageReadChannel:493 - Channel for 'gs://uncoil-artifacts/uncoil-job-production/uncoil-job-production.jar' is not open.
2018-07-19 18:16:33 INFO SparkPodInitContainer:54 - Finished downloading application dependencies.
I do see this GHFS version bump
2018-07-19 18:12:28 INFO GoogleHadoopFileSystemBase:637 - GHFS version: 1.9.2-hadoop2
2018-07-19 18:16:32 INFO GoogleHadoopFileSystemBase:607 - GHFS version: 1.6.3-hadoop2
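A quick way to compare runs is to pull the version straight out of the init-container logs (the two sample lines below are the ones quoted above; this is just a grep sketch, not operator tooling):

```shell
# Extract the GHFS connector version from saved init-container log lines.
cat > ghfs.log <<'EOF'
2018-07-19 18:12:28 INFO GoogleHadoopFileSystemBase:637 - GHFS version: 1.9.2-hadoop2
2018-07-19 18:16:32 INFO GoogleHadoopFileSystemBase:607 - GHFS version: 1.6.3-hadoop2
EOF
# Prints one line per distinct connector version seen in the logs.
grep -o 'GHFS version: [0-9][0-9.]*-hadoop[0-9]' ghfs.log | sort -u
```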
Interesting. The image with GCS support uses https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar. This must have been updated from version 1.6.3 to 1.9.2.
I switched to https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-1.6.3-hadoop2.jar and it works! So there must be breaking changes in the latest connector. Closing this; I can open a new ticket.
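For reference, pinning the connector in the spark-gcs Dockerfile looks roughly like this (a sketch: the /opt/spark/jars destination is an assumption — check the actual Dockerfile in spark-on-k8s-gcp-examples for the real path):

```dockerfile
# Base image built from the Spark 2.3.1 source, as in the steps below.
FROM gcr.io/uncoil-io/spark/spark:v2.3.1
# Pin the GCS connector to a known-good version instead of
# gcs-connector-latest-hadoop2.jar, so upstream bumps can't silently break jobs.
ADD https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-1.6.3-hadoop2.jar /opt/spark/jars
```

After rebuilding and pushing, you can sanity-check which connector actually landed in the image with something like `docker run --rm --entrypoint ls gcr.io/uncoil-io/spark:v2.3.1-gcs /opt/spark/jars | grep gcs-connector`.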
Cool! Looking at the change list at https://github.com/GoogleCloudPlatform/bigdata-interop/blob/fe662298d6c0d892be0468c660d5ca76f8fc0fcc/gcs/CHANGES.txt and trying to figure out what could have broken.
Did the same and didn't see anything obvious, although I thought it was interesting that fs.gs.project.id is now optional.
Yeah, that becomes optional. I think fs.gs.system.bucket has been deprecated too. I'm gonna dig into this more.
Hello, chatted in Slack as well, but my team has been trying to switch to Spark 2.3.1 using this guide: https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#customizing-the-spark-operator
After completing the steps below, any driver pod immediately fails with "Error: Could not find or load main class".
Possibly we are missing a step? Assistance would be appreciated, as 2.3.1 fixes a bug in dynamic partition overwrite mode.
1. Download the Spark source code: https://archive.apache.org/dist/spark/spark-2.3.1/spark-2.3.1.tgz
2. Compile Spark with Kubernetes support: ./build/mvn -Pkubernetes -DskipTests clean package
3. Build and push the Spark 2.3.1 Docker image:
   $ ./bin/docker-image-tool.sh -r gcr.io/uncoil-io/spark -t v2.3.1 build
   $ ./bin/docker-image-tool.sh -r gcr.io/uncoil-io/spark -t v2.3.1 push
4. Clone https://github.com/GoogleCloudPlatform/spark-on-k8s-gcp-examples
5. Copy the conf folder to dockerfiles/spark-gcs
6. Edit FROM in dockerfiles/spark-gcs/Dockerfile to: FROM gcr.io/uncoil-io/spark/spark:v2.3.1
7. Run "gcloud auth configure-docker"
8. From spark-on-k8s-gcp-examples/dockerfiles/spark-gcs run:
   docker build . -t gcr.io/uncoil-io/spark:v2.3.1-gcs
   docker push gcr.io/uncoil-io/spark:v2.3.1-gcs
9. Clone https://github.com/GoogleCloudPlatform/spark-on-k8s-operator (check out master or the v1alpha-0.2-2.3.x tag, whichever works in the end)
10. Edit the second FROM in spark-on-k8s-operator/Dockerfile to be: FROM gcr.io/uncoil-io/spark:v2.3.1
11. In spark-on-k8s-operator run:
    docker build . -t gcr.io/uncoil-io/spark-operator:v1alpha1-0.2-2.3.1
    docker push gcr.io/uncoil-io/spark-operator:v1alpha1-0.2-2.3.1
12. Edit spark-on-k8s-operator/manifest/spark-operator.yaml and set image to gcr.io/uncoil-io/spark-operator:v1alpha1-0.2-2.3.1
13. Delete the sparkoperator namespace: kubectl delete namespace sparkoperator
14. Delete any Spark applications as well:
    kubectl delete sparkapplications --all
    kubectl delete scheduledsparkapplications --all
15. Wait a few minutes, then: kubectl apply -f spark-on-k8s-operator/manifest/
16. Check for when it's ready: kubectl get pods -w --namespace sparkoperator
17. Edit app.template.yaml and uncoil-runner.yaml: set image: gcr.io/uncoil-io/spark:v2.3.1-gcs, and replace 2.3.0 with 2.3.1 in the dependencies and the version label
18. Now see if it works!