kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0
2.79k stars 1.37k forks

[BUG] Not able to open additional port on executor #2038

Open LIN-Yu-Ting opened 5 months ago

LIN-Yu-Ting commented 5 months ago

Description

I am using Spark Operator to run a Spark job. In my application, I need to open an SSH port on each executor so that the executors can copy data among themselves.

Reproduction Code [Required]

Steps to reproduce the behavior:

I am using the following YAML to submit my Spark job.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: <jobId>
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "<image>"
  imagePullPolicy: Always
  ...
  driver:
    cores: 1
    memory: "2g"
    serviceAccount: default
    ports:
      - name: "SSH"
        protocol: "TCP"
        containerPort: 2022
  executor:
    cores: 1
    memory: "8g"
    serviceAccount: default
    ports:
      - name: "SSH"
        protocol: "TCP"
        containerPort: 2022

Expected behavior

I expect the executor container to list both ports: 7079/TCP, 2022/TCP

Actual behavior

Running kubectl describe on one of the generated executor pods shows that the additional port was not provisioned; only 7079/TCP is listed.

linyuting@lindembp aztk % kubectl describe pod fastp-x-0-8437c-run-eks-id-exec-5
Name:             fastp-x-0-8437c-run-eks-id-exec-5
Namespace:        default
Priority:         0
Service Account:  default
Node:             ip-172-31-44-56.us-west-2.compute.internal/172.31.44.56
Start Time:       Thu, 30 May 2024 10:39:08 +0800
Labels:           spark-app-name=pipedpiper
                  spark-app-selector=spark-fe603901ae9542299462d06a900f8cf1
                  spark-exec-id=5
                  spark-exec-resourceprofile-id=0
                  spark-role=executor
                  spark-version=3.3.0
                  sparkoperator.k8s.io/app-name=fastp-x-0-8437c-run-eks-id
                  sparkoperator.k8s.io/launched-by-spark-operator=true
                  sparkoperator.k8s.io/submission-id=08e959bc-da9e-4582-abe7-c1b5cc9fa976
Annotations:      <none>
Status:           Running
IP:               172.31.45.34
IPs:
  IP:           172.31.45.34
Controlled By:  Pod/fastp-x-0-8437c-run-eks-id
Containers:
  spark-kubernetes-executor:
    Container ID:  containerd://a9a25744a32057a95212fa2ef759939f0b1fbb1f7c590144a5be8f0412b600a1
    Image:         atgenomix.azurecr.io/atgenomix/runtime/germlineanalysis:4.0.3_20.04
    Image ID:      atgenomix.azurecr.io/atgenomix/runtime/germlineanalysis@sha256:293e58e226a65781102fc72dbea0fa9414f2c6bd6a965d59797e3b6f0616c038
    Port:          7079/TCP
    Host Port:     0/TCP
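
To check the declared container ports without reading the full describe output, the port lines can be filtered out. This is a minimal sketch, assuming a POSIX shell with grep and sed; `declared_ports` is a hypothetical helper name, not part of kubectl or Spark Operator.

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and prints
# only the container port list. It matches "Port:" or "Ports:" and skips the
# "Host Port:" / "Host Ports:" lines.
declared_ports() {
  grep -E '^[[:space:]]*Ports?:' | sed -E 's/^[[:space:]]*Ports?:[[:space:]]*//'
}

# Usage (requires access to the cluster):
#   kubectl describe pod fastp-x-0-8437c-run-eks-id-exec-5 | declared_ports
```

If the `ports` entry from the SparkApplication spec had been applied, 2022/TCP would appear in this list next to 7079/TCP.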

Terminal Output Screenshot(s)

Environment & Versions

Additional context

I am following the apiSpec mentioned in link

LIN-Yu-Ting commented 5 months ago

This seems to be related to PR #1520, which has had no response for a long time: https://github.com/kubeflow/spark-operator/pull/1520