volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0

E2E Spark Integration Test always failed #3623

Closed: googs1025 closed this issue 1 month ago

googs1025 commented 1 month ago

What happened:

The PySpark Docker image build fails during the E2E Spark integration test. Output log:

------
Dockerfile:27
--------------------
  26 |     RUN mkdir ${SPARK_HOME}/python
  27 | >>> RUN apt-get update && \
  28 | >>>     apt install -y python3 python3-pip && \
  29 | >>>     pip3 install --upgrade pip setuptools && \
  30 | >>>     # Removed the .cache to save space
  31 | >>>     rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*
  32 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get update &&     apt install -y python3 python3-pip &&     pip3 install --upgrade pip setuptools &&     rm -rf /root/.cache && rm -rf /var/cache/apt/* && rm -rf /var/lib/apt/lists/*" did not complete successfully: exit code: 1
Failed to build PySpark Docker image, please refer to Docker build output for details.
[error] java.lang.IllegalStateException: Process '/home/runner/work/volcano/volcano/spark/bin/docker-image-tool.sh -r docker.io/kubespark -t dev -p /home/runner/work/volcano/volcano/spark/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile -R /home/runner/work/volcano/volcano/spark/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile -f /home/runner/work/volcano/volcano/spark/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile build' exited with 1.
[error]     at KubernetesIntegrationTests$.$anonfun$settings$67(SparkBuild.scala:1050)
[error]     at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
[error]     at sbt.std.Transform$$anon$3.$anonfun$apply$2(Transform.scala:46)
[error]     at sbt.std.Transform$$anon$4.work(Transform.scala:68)
[error]     at sbt.Execute.$anonfun$submit$2(Execute.scala:282)
[error]     at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:23)
[error]     at sbt.Execute.work(Execute.scala:291)
[error]     at sbt.Execute.$anonfun$submit$1(Execute.scala:282)
[error]     at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:265)
[error]     at sbt.CompletionService$$anon$2.call(CompletionService.scala:64)
[error]     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error]     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[error]     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error]     at java.lang.Thread.run(Thread.java:750)
[error] (kubernetes-integration-tests / dockerImgs) java.lang.IllegalStateException: Process '/home/runner/work/volcano/volcano/spark/bin/docker-image-tool.sh -r docker.io/kubespark -t dev -p /home/runner/work/volcano/volcano/spark/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile -R /home/runner/work/volcano/volcano/spark/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile -f /home/runner/work/volcano/volcano/spark/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile build' exited with 1.
[error] Total time: 332 s (05:32), completed Jul 25, 2024 8:09:45 AM

Error: Process completed with exit code 1.

What you expected to happen: The CI job passes.

How to reproduce it (as minimally and precisely as possible): The failure can be seen in the CI runs of the open pull requests: https://github.com/volcano-sh/volcano/pulls

Anything else we need to know?: None

Environment:

googs1025 commented 1 month ago

/kind flake

Monokaix commented 1 month ago

Debugging now. Maybe we should sync from https://github.com/apache/spark/blob/147a98b7e1a374b859c229a35d418cd88d71bcb2/.github/workflows/build_and_test.yml#L953