Closed (lresende closed this 1 year ago)
Hi @lresende. I suspect this is coming from the culling logic. Was there a set of lines preceding the first error log statement in the above output similar to these (which appear later in that output):
[D 2023-02-23 18:28:13.907 EnterpriseGatewayApp] kernel_id=8a3b508a-41bb-4ef4-a9ff-8470a7ed1659, kernel_name=spark_32_python_kubernetes, last_activity=2023-02-23 17:21:05.848081+00:00
[W 2023-02-23 18:28:13.907 EnterpriseGatewayApp] Culling 'starting' kernel 'spark_32_python_kubernetes' (8a3b508a-41bb-4ef4-a9ff-8470a7ed1659) with 0 connections due to 4028 seconds of inactivity.
[D 2023-02-23 18:28:13.908 EnterpriseGatewayApp] Clearing buffer for 8a3b508a-41bb-4ef4-a9ff-8470a7ed1659
[I 2023-02-23 18:28:13.908 EnterpriseGatewayApp] Kernel shutdown: 8a3b508a-41bb-4ef4-a9ff-8470a7ed1659
[D 2023-02-23 18:28:13.908 EnterpriseGatewayApp] ERROR: ECONNREFUSED, no process listening, cannot send signal.
[D 2023-02-23 18:28:13.909 EnterpriseGatewayApp] OSError(ENOTCONN) raised on socket shutdown, listener has likely already exited. Cannot send '{'shutdown': 1}'
I agree this should be fixed; it's just a matter of where. If this needs to be addressed in the culling logic (which, I would say, should be sensitive to HTTPError 404 and swallow that exception after logging it), then the issue would need to be transferred to jupyter-server.
We could also tighten up RemoteKernelManager.remove_kernel() so that super.remove_kernel() handles 404 exceptions.
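To make the second option concrete, here is a rough sketch of an override that tolerates a 404 from the parent's remove_kernel(). It is not the actual Enterprise Gateway or jupyter-server code: HTTPError below is a local stand-in for tornado.web.HTTPError, and BaseManager and TolerantManager are hypothetical names used only for illustration.

```python
import logging

log = logging.getLogger("remove_kernel_sketch")


class HTTPError(Exception):
    """Local stand-in for tornado.web.HTTPError (assumption, keeps the sketch self-contained)."""

    def __init__(self, status_code, message=""):
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


class BaseManager:
    """Hypothetical parent whose remove_kernel() raises a 404 for unknown kernel ids."""

    def __init__(self):
        self._kernels = {}

    def remove_kernel(self, kernel_id):
        if kernel_id not in self._kernels:
            raise HTTPError(404, f"Kernel does not exist: {kernel_id}")
        return self._kernels.pop(kernel_id)


class TolerantManager(BaseManager):
    """Hypothetical subclass that treats 'already gone' as a successful removal."""

    def remove_kernel(self, kernel_id):
        try:
            return super().remove_kernel(kernel_id)
        except HTTPError as err:
            if err.status_code != 404:
                raise  # only the 'not found' case is swallowed
            log.warning("Kernel %s is already gone; ignoring 404 on removal", kernel_id)
            return None
```

Only the 404 case is swallowed here, so any other HTTP error would still propagate to the caller.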
Were you going to look into this?
When we try to delete a kernel that does not exist, we should still remove the kernel to avoid infinite retries of the shutdown.
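As an illustration of why dropping the missing kernel stops the retries, here is a minimal sketch of a culling pass. Everything in it is assumed for the example: KernelNotFound stands in for the 404, and FakeManager and cull_pass are hypothetical names, not the real culler.

```python
import logging

log = logging.getLogger("cull_sketch")


class KernelNotFound(Exception):
    """Hypothetical stand-in for the HTTP 404 raised when a kernel no longer exists."""


class FakeManager:
    """Hypothetical manager that tracks kernel ids separately from live kernels."""

    def __init__(self, tracked, alive):
        self.tracked = set(tracked)  # ids the culler still believes exist
        self.alive = set(alive)      # ids that actually have a running kernel
        self.shutdown_attempts = 0

    def shutdown_kernel(self, kernel_id):
        self.shutdown_attempts += 1
        if kernel_id not in self.alive:
            raise KernelNotFound(kernel_id)
        self.alive.discard(kernel_id)
        self.tracked.discard(kernel_id)


def cull_pass(manager):
    """Try to shut down every tracked kernel; forget the ones that are already gone."""
    for kernel_id in list(manager.tracked):
        try:
            manager.shutdown_kernel(kernel_id)
        except KernelNotFound:
            log.warning("Kernel %s not found during culling; removing it", kernel_id)
            # Without this line the same id would be retried on every later pass.
            manager.tracked.discard(kernel_id)
```

With the discard in place, a second call to cull_pass() has nothing left to retry.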