Closed OrenZ1 closed 6 months ago
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template, as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
Hi @OrenZ1. Is there a reason why you don't deploy the Kernel Image Puller daemonset? This will read the kernel specs and pull images embedded in the specs. Otherwise, you should pre-pull your images manually.
Hi, thank you for your response! We don't want to pre-pull images because of the extremely large number of images involved (we also allow users to add additional kernel images whenever they like). We don't actually mind the long kernel launch time; it's natural given the constraint I mentioned and the size of the images. But we would like to avoid showing this error message if possible. On another note, we suspect the issue might be related to the JupyterHub integration with JupyterLab and the Enterprise Gateway: when we deploy a standalone Lab pod with the same CMD and the same configuration (including the Gateway settings), we see no such error.
Are you using Kubernetes? This might be related to ingress timeouts while waiting for the kernel to start.
I am using OpenShift (and Kubernetes), but I am not using ingresses; I only create services for the Enterprise Gateway and JupyterHub (and a route for JupyterHub itself).
What are you using for the gateway URL? The service URL?
Yes, the service is of type ClusterIP, and I use the URL composed of the service name, the namespace, and svc.cluster.local.
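For reference, a sketch of how such an in-cluster URL is assembled from the Service name, its namespace, and the cluster DNS suffix. The service name and namespace below are placeholders, not the actual names from this deployment; 8888 is Enterprise Gateway's default port.

```shell
# Placeholders -- substitute your actual Service name and namespace.
EG_SERVICE="enterprise-gateway"
EG_NAMESPACE="gateway-ns"

# Kubernetes cluster DNS resolves <service>.<namespace>.svc.cluster.local
# to the Service's ClusterIP from inside the cluster.
GATEWAY_URL="http://${EG_SERVICE}.${EG_NAMESPACE}.svc.cluster.local:8888"
echo "${GATEWAY_URL}"
```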
Fixed it! The problem was the Node.js version on the JupyterHub: it was Node.js v10, which has a 120-second default timeout.
Description
Hello! I am using Jupyter Enterprise Gateway 3.2.2 with JupyterHub 2.1.1 and JupyterLab 3.6.3, and I am having a weird problem when starting a kernel on a Kubernetes cluster. The images I use for the kernels are heavy, so it takes a long time to pull them onto the kernel pods. I have therefore set `--GatewayClient.request_timeout` on the JupyterLab side to 5 minutes, and I have also modified OpenShift's route configuration to use a larger timeout than the default 30 seconds.

When I try to launch a new kernel, after approximately 2 minutes I get the following error message in JupyterLab: "Error Starting Kernel: Invalid response: 503 Service Temporarily Unavailable". This automatically changes the selected kernel in the Lab to "No Kernel". There are no additional error messages or error/warning logs in JupyterLab, the Enterprise Gateway, or JupyterHub, even though all three are set to the DEBUG log level. After some additional time, when the kernel pod actually launches, I can choose it from the kernel tab under "Use Kernel from Other Session".

I am looking for a way to avoid this error message, and I want to understand where it is coming from.
Note: I have also tried to configure other timeout settings in JupyterLab, such as `--GatewayClient.connect_timeout`, `--GatewayClient.response_timeout`, and `--GatewayClient.gateway_retry_interval_max`. None of these changed the behavior described above.
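For concreteness, a sketch of launching JupyterLab with the gateway timeouts discussed above. The gateway URL is a placeholder, and the flag values (in seconds) simply mirror the 5-minute timeout described in this report; this is not a recommended configuration, just the shape of the invocation.

```shell
# Placeholder URL; substitute your Enterprise Gateway service address.
# request_timeout / connect_timeout are in seconds (300 = 5 minutes).
jupyter lab \
  --GatewayClient.url="http://enterprise-gateway.gateway-ns.svc.cluster.local:8888" \
  --GatewayClient.request_timeout=300.0 \
  --GatewayClient.connect_timeout=300.0
```

The same traits can equivalently be set in a config file (e.g. `c.GatewayClient.request_timeout = 300.0`) or, for some of them, via environment variables.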
Reproduce
Expected behavior
I would like this message to not appear if a larger timeout is set, and maybe even a loading indicator for kernels which are not yet ready.
Context
Troubleshoot Output
Command Line Output
Browser Output