jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/

Kernels that fail to start (Spark Operator) keep trying to connect indefinitely #1266

Closed: lresende closed this issue 1 year ago

lresende commented 1 year ago

Description

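When a Spark Operator (CRD) kernel fails to start on the Spark Operator side, the gateway keeps polling for a connection indefinitely. In the logs, Status and Pod IP stay 'None' on every iteration while the wait and nudge attempt counters keep climbing.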
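One way to avoid the unbounded wait is to stop polling as soon as either a deadline passes or the SparkApplication reports a terminal failure. The following is a minimal sketch, not Enterprise Gateway's actual code. It assumes a hypothetical `get_status` callable that returns (application state, pod IP). The `FAILED_STATES` set and `KernelLaunchError` are also illustrative assumptions; the real SparkApplication states should be checked against the operator's CRD.

```python
import time
from typing import Callable, Optional, Tuple

# Assumption: terminal failure states reported by the SparkApplication CRD.
FAILED_STATES = {"FAILED", "SUBMISSION_FAILED"}


class KernelLaunchError(RuntimeError):
    """Hypothetical error raised when the kernel driver can never be reached."""


def wait_for_driver(
    get_status: Callable[[], Tuple[Optional[str], Optional[str]]],
    timeout: float = 30.0,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the driver has a pod IP, the application fails, or the timeout expires."""
    deadline = clock() + timeout
    attempt = 0
    while True:
        attempt += 1
        status, pod_ip = get_status()
        # Fail fast: a failed application will never produce a reachable driver.
        if status in FAILED_STATES:
            raise KernelLaunchError(
                f"application failed with status {status!r} after {attempt} attempts"
            )
        if pod_ip is not None:
            return pod_ip
        # Bound the wait instead of polling forever while status stays None.
        if clock() >= deadline:
            raise KernelLaunchError(
                f"timed out after {attempt} attempts (last status {status!r})"
            )
        sleep(poll_interval)
```

The `clock` and `sleep` parameters are injectable so the loop can be exercised without real delays; in production the defaults apply.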

Expected behavior

I believe there are two issues here:

Logs

[D 2023-03-03 16:26:26.175 EnterpriseGatewayApp] 193: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
[D 2023-03-03 16:26:26.190 EnterpriseGatewayApp] Nudge: attempt 77 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:26.690 EnterpriseGatewayApp] 194: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
[D 2023-03-03 16:26:26.695 EnterpriseGatewayApp] Nudge: attempt 78 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:27.208 EnterpriseGatewayApp] 195: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
[D 2023-03-03 16:26:27.213 EnterpriseGatewayApp] Nudge: attempt 79 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:27.729 EnterpriseGatewayApp] 196: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
[W 2023-03-03 16:26:27.734 EnterpriseGatewayApp] Nudge: attempt 80 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:28.247 EnterpriseGatewayApp] 197: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
[D 2023-03-03 16:26:28.252 EnterpriseGatewayApp] Nudge: attempt 81 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:28.765 EnterpriseGatewayApp] 198: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
[D 2023-03-03 16:26:28.770 EnterpriseGatewayApp] Nudge: attempt 82 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:29.286 EnterpriseGatewayApp] 199: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
[D 2023-03-03 16:26:29.287 EnterpriseGatewayApp] Nudge: attempt 83 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:29.789 EnterpriseGatewayApp] Nudge: attempt 84 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:29.807 EnterpriseGatewayApp] 200: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
[D 2023-03-03 16:26:30.295 EnterpriseGatewayApp] Nudge: attempt 85 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:30.327 EnterpriseGatewayApp] 201: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
[D 2023-03-03 16:26:30.798 EnterpriseGatewayApp] Nudge: attempt 86 on kernel 68415e17-3ced-497f-86de-9c59285e2fec
[D 2023-03-03 16:26:30.846 EnterpriseGatewayApp] 202: Waiting to connect to k8s sparkapplication in namespace 'spark-applications'. Name: 'some-user-68415e17-3ced-497f-86de-9c59285e2fec-driver', Status: 'None', Pod IP: 'None', KernelID: '68415e17-3ced-497f-86de-9c59285e2fec'
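To spot this condition in existing gateway output, a small scanner can flag kernels whose "Waiting to connect" lines never show a status or pod IP. This is a minimal diagnostic sketch that assumes the exact line format shown in the logs above, which other Enterprise Gateway versions may not share. The `stuck_kernels` helper and its `min_attempts` threshold are hypothetical.

```python
import re
from typing import Dict, Iterable

# Pattern derived from the "Waiting to connect" debug lines in this issue.
WAIT_RE = re.compile(
    r"(?P<n>\d+): Waiting to connect to k8s sparkapplication .*?"
    r"Status: '(?P<status>[^']*)', Pod IP: '(?P<ip>[^']*)', KernelID: '(?P<kid>[^']+)'"
)


def stuck_kernels(lines: Iterable[str], min_attempts: int = 100) -> Dict[str, int]:
    """Return {kernel_id: last_attempt} for kernels with no status or pod IP after min_attempts polls."""
    last: Dict[str, int] = {}
    for line in lines:
        m = WAIT_RE.search(line)
        if not m:
            continue  # e.g. "Nudge: attempt N" lines are ignored
        kid = m["kid"]
        if m["status"] == "None" and m["ip"] == "None":
            last[kid] = int(m["n"])
        else:
            last.pop(kid, None)  # the kernel made progress; stop tracking it
    return {kid: n for kid, n in last.items() if n >= min_attempts}
```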