kubeflow / spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache License 2.0

[BUG] Creating uiIngress fails, logging not showing details #2167

Open · tcassaert opened 5 days ago

tcassaert commented 5 days ago

Description

When creating a SparkApplication with uiIngress enabled, the operator fails to create the Ingress:

2024-09-13T08:55:51.736Z    INFO    sparkapplication/driveringress.go:215    Creating extensions.v1beta1/Ingress for SparkApplication web UI    {"a-eeac4d03af3a461583fb9c51f4018979": "namespace", "spark-jobs-dev": "ingressName"}
2024-09-13T08:55:51.738Z    ERROR    sparkapplication/controller.go:260    Failed to submit SparkApplication    {"name": "a-eeac4d03af3a461583fb9c51f4018979", "namespace": "spark-jobs-dev", "error": "failed to create web UI service"}
github.com/kubeflow/spark-operator/internal/controller/sparkapplication.(*Reconciler).reconcileNewSparkApplication.func1
    /workspace/internal/controller/sparkapplication/controller.go:260
k8s.io/client-go/util/retry.OnError.func1
    /go/pkg/mod/k8s.io/client-go@v0.29.3/util/retry/util.go:51
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection
    /go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/wait.go:145
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff
    /go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:461
k8s.io/client-go/util/retry.OnError
    /go/pkg/mod/k8s.io/client-go@v0.29.3/util/retry/util.go:50
k8s.io/client-go/util/retry.RetryOnConflict
    /go/pkg/mod/k8s.io/client-go@v0.29.3/util/retry/util.go:104
github.com/kubeflow/spark-operator/internal/controller/sparkapplication.(*Reconciler).reconcileNewSparkApplication
    /workspace/internal/controller/sparkapplication/controller.go:247
github.com/kubeflow/spark-operator/internal/controller/sparkapplication.(*Reconciler).Reconcile
    /workspace/internal/controller/sparkapplication/controller.go:179
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:227
2024-09-13T08:55:51.758Z    INFO    sparkapplication/event_handler.go:188    SparkApplication updated    {"name": "a-eeac4d03af3a461583fb9c51f4018979", "namespace": "spark-jobs-dev", "oldState": "", "newState": "SUBMISSION_FAILED"}

Reproduction Code [Required]

Steps to reproduce the behavior: deploy the operator with the following Helm chart values (a sketch of how urlFormat expands follows the values):

---
controller:
  batchScheduler:
    enable: true
  podMonitor:
    create: true
  uiIngress:
    enable: true
    urlFormat: '{{$appName}}.{{$appNamespace}}.batch.stag.warsaw.openeo.dataspace.copernicus.eu'
spark:
  jobNamespaces:
    - ""

Expected behavior

The operator creates the Ingress. If creation fails, the logs should show the reason for the failure.

Actual behavior

The operator fails to create the Ingress, and the logs don't show any relevant information about why it's failing.

Environment & Versions

Spark Operator App version: v2.0.0-rc.0
Helm Chart Version: v2.0.0-rc.0
Kubernetes Version: 1.25.7
Apache Spark version: 3.5.2

ChenYi015 commented 5 days ago

@tcassaert Thanks for reporting the bug. Would you like to contribute a fix? The error message should be wrapped and returned.

https://github.com/kubeflow/spark-operator/blob/7785107ec5c04d5bd55dc5192a3ac7f835cf1b47/internal/controller/sparkapplication/controller.go#L672-L676
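
Presumably the linked lines return a fixed message and drop the underlying error. A runnable toy sketch of the before/after behavior (createIngress and its error are hypothetical stand-ins, not the operator's code):

package main

import (
    "errors"
    "fmt"
)

// createIngress stands in for the operator's ingress creation call; both the
// name and the returned error are hypothetical.
func createIngress() error {
    return errors.New("some underlying failure")
}

func main() {
    // Before: the root cause is dropped, which is what this issue reports.
    if err := createIngress(); err != nil {
        fmt.Println(fmt.Errorf("failed to create web UI service"))
    }
    // After: wrapping keeps the root cause in the logged message.
    if err := createIngress(); err != nil {
        fmt.Println(fmt.Errorf("failed to create web UI service: %v", err))
    }
}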

tcassaert commented 5 days ago

@ChenYi015 Should this be done by keeping fmt.Errorf, or should I use logger.Error? I see both used throughout the codebase.

ChenYi015 commented 5 days ago

> @ChenYi015 Should this be done by keeping fmt.Errorf, or should I use logger.Error? I see both used throughout the codebase.

@tcassaert I think we can keep using fmt.Errorf:

if err != nil {
    // Wrap the underlying error so the root cause shows up in the log.
    return fmt.Errorf("failed to create web UI service: %v", err)
}
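
As a side note on the design choice: since Go 1.13, fmt.Errorf also supports the %w verb, which wraps the error so callers can still inspect the cause with errors.Is and errors.As, while printing the same chained message as %v. A minimal illustration (not from the spark-operator codebase; errNotFound is a hypothetical sentinel):

package main

import (
    "errors"
    "fmt"
)

// errNotFound is a hypothetical sentinel error, used only for illustration.
var errNotFound = errors.New("resource not found")

func main() {
    wrapped := fmt.Errorf("failed to create web UI service: %w", errNotFound)
    fmt.Println(wrapped)                         // same chained message as %v
    fmt.Println(errors.Is(wrapped, errNotFound)) // true: the cause stays inspectable
}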
tcassaert commented 2 days ago

I've deployed a version with the added error wrapping, and the logs now show the underlying cause:

2024-09-16T08:23:58.013Z        ERROR   sparkapplication/controller.go:260      Failed to submit SparkApplication    {"name": "a-e5cebe6a50f74a2f8bce42874ea56189", "namespace": "spark-jobs-dev", "error": "failed to create web UI service: failed to create ingress spark-jobs-dev/a-e5cebe6a50f74a2f8bce42874ea56189-ui-ingress: no matches for kind \"Ingress\" in version \"extensions/v1beta1\""}
github.com/kubeflow/spark-operator/internal/controller/sparkapplication.(*Reconciler).reconcileNewSparkApplication.func1
        /workspace/internal/controller/sparkapplication/controller.go:260
k8s.io/client-go/util/retry.OnError.func1
        /go/pkg/mod/k8s.io/client-go@v0.29.3/util/retry/util.go:51
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection
        /go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/wait.go:145
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff
        /go/pkg/mod/k8s.io/apimachinery@v0.29.3/pkg/util/wait/backoff.go:461
k8s.io/client-go/util/retry.OnError
        /go/pkg/mod/k8s.io/client-go@v0.29.3/util/retry/util.go:50
k8s.io/client-go/util/retry.RetryOnConflict
        /go/pkg/mod/k8s.io/client-go@v0.29.3/util/retry/util.go:104
github.com/kubeflow/spark-operator/internal/controller/sparkapplication.(*Reconciler).reconcileNewSparkApplication
        /workspace/internal/controller/sparkapplication/controller.go:247
github.com/kubeflow/spark-operator/internal/controller/sparkapplication.(*Reconciler).Reconcile
        /workspace/internal/controller/sparkapplication/controller.go:179
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.5/pkg/internal/controller/controller.go:227

So it seems that, in my case, the operator is trying to create the legacy extensions/v1beta1 Ingress:

https://github.com/kubeflow/spark-operator/blob/9f0c08a65e9e956e3ff9838df59b392cb4ee72ca/internal/controller/sparkapplication/web_ui.go#L49-L55

I'm not entirely sure why, though, as all the other Ingresses in my cluster are networking.k8s.io/v1.
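
For context, operators usually pick between the two Ingress APIs by querying API-server discovery at startup; if that check fails or never runs, the code can fall through to the legacy path even on clusters that only serve networking.k8s.io/v1, producing exactly the "no matches for kind" error above. A minimal sketch of such a check (illustrative only, not the operator's actual implementation):

package main

import (
    "fmt"

    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

// supportsNetworkingV1Ingress reports whether the cluster serves Ingress under
// networking.k8s.io/v1. The name and structure are illustrative.
func supportsNetworkingV1Ingress(clientset kubernetes.Interface) bool {
    resources, err := clientset.Discovery().ServerResourcesForGroupVersion("networking.k8s.io/v1")
    if err != nil {
        // A naive caller treating any discovery error as "unsupported" would
        // fall back to extensions/v1beta1, which was removed in Kubernetes
        // 1.22 and fails as seen in the log above.
        return false
    }
    for _, r := range resources.APIResources {
        if r.Kind == "Ingress" {
            return true
        }
    }
    return false
}

func main() {
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    fmt.Println("networking.k8s.io/v1 Ingress supported:",
        supportsNetworkingV1Ingress(kubernetes.NewForConfigOrDie(config)))
}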