odpi / egeria-charts

Helm chart repository
https://odpi.github.io/egeria-charts
Apache License 2.0

4.1 release - Jupyter fails to launch #268

Closed: planetf1 closed this issue 1 year ago

planetf1 commented 1 year ago

I tried out the 4.1 charts on OpenShift 4.12 (3 nodes, 16GB/4vCPU each).

The Jupyter server fails to start correctly. The pod keeps restarting and never becomes ready:

lab-odpi-egeria-lab-jupyter-697d4cf7f7-xkm5q      0/1     Running   5 (2m11s ago)   15m

Full log from this pod posted to https://gist.github.com/planetf1/48f88a00770e3aa68496003e0ad78a76

Failure:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py", line 656, in get
    value = obj._trait_values[self.name]
KeyError: 'port'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/jupyter-lab", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/extension/application.py", line 607, in launch_instance
    serverapp = cls.initialize_server(argv=args)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/extension/application.py", line 577, in initialize_server
    serverapp.initialize(
  File "/opt/conda/lib/python3.10/site-packages/traitlets/config/application.py", line 113, in inner
    return method(app, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/serverapp.py", line 2556, in initialize
    self.init_httpserver()
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/serverapp.py", line 2379, in init_httpserver
    self._find_http_port()
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/serverapp.py", line 2423, in _find_http_port
    port = self.port
  File "/opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py", line 703, in __get__
    return self.get(obj, cls)
  File "/opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py", line 659, in get
    default = obj.trait_defaults(self.name)
  File "/opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py", line 1872, in trait_defaults
    return self._get_trait_default_generator(names[0])(self)
  File "/opt/conda/lib/python3.10/site-packages/traitlets/traitlets.py", line 1233, in __call__
    return self.func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/jupyter_server/serverapp.py", line 960, in _port_default
    return int(os.getenv(self.port_env, self.port_default_value))
ValueError: invalid literal for int() with base 10: 'tcp://172.21.34.83:8888'

Investigating ... cc: @lpalashevski
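
The final frame of the traceback above can be reproduced outside the pod. The sketch below mirrors the expression `int(os.getenv(self.port_env, self.port_default_value))` using a plain dict in place of the process environment. The variable name `JUPYTER_PORT` and the default `8888` are assumptions for illustration; the traceback itself only shows `self.port_env` and `self.port_default_value`. The helper names are hypothetical.

```python
def port_default(env, port_env="JUPYTER_PORT", default="8888"):
    """Mimic the failing expression from the traceback.

    `env` stands in for os.environ; `port_env` and `default` are assumed values.
    """
    return int(env.get(port_env, default))


def describe_port(env):
    """Return the parsed port, or the error message if parsing fails."""
    try:
        return port_default(env)
    except ValueError as exc:
        return str(exc)


# Variable unset: the default is parsed as an integer.
clean = describe_port({})

# Variable holds a URL-style value like the one in the traceback: int() rejects it.
broken = describe_port({"JUPYTER_PORT": "tcp://172.21.34.83:8888"})
```

With the variable unset the port resolves normally; with the URL-style value the call raises the same `ValueError` seen in the pod log.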

planetf1 commented 1 year ago

This seems to be the same issue as https://github.com/jupyterlab/jupyterlab/issues/10576

The reported issue there was that a service named 'jupyter' existed in the same namespace.

This is indeed the case in my environment - I have a load balancer service named 'jupyter' which has been present for years.

jupyter                        LoadBalancer   nnn.nnn.nnn.nnn     xxx.appdomain.cloud   8888:30777/TCP               66d

I'm guessing that the latest JupyterLab update makes some assumption when retrieving the port, which is not set explicitly here.

Deleting this service (and recreating with a new name) avoids this issue, and the jupyter pod then starts up.
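
A likely explanation for why the service name matters: for each Service in a namespace, Kubernetes injects Docker-links-style environment variables into pods started afterwards, with the Service name upper-cased and dashes converted to underscores. A Service named 'jupyter' on port 8888 would therefore produce a `JUPYTER_PORT` variable holding a `tcp://...` URL, which matches the value in the traceback. The sketch below reproduces that naming rule. The function names and the replacement name `jupyter-lb` are hypothetical, and treating the IP from the traceback as the Service's cluster IP is an assumption.

```python
def service_link_env(name, cluster_ip, port, protocol="tcp"):
    """Build a subset of the variables Kubernetes injects for one Service port."""
    prefix = name.upper().replace("-", "_")
    url = f"{protocol}://{cluster_ip}:{port}"
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
        f"{prefix}_PORT": url,
        f"{prefix}_PORT_{port}_{protocol.upper()}": url,
    }


def collides_with_jupyter_port(service_name):
    """True if a Service with this name would define JUPYTER_PORT."""
    # The IP and port are placeholders; only the variable names matter here.
    return "JUPYTER_PORT" in service_link_env(service_name, "0.0.0.0", 8888)


# IP taken from the traceback; assumed to be the cluster IP of the 'jupyter' Service.
injected = service_link_env("jupyter", "172.21.34.83", 8888)
```

Under this rule a Service called `jupyter` collides, while one called `jupyter-lb` yields `JUPYTER_LB_PORT` and does not, which is consistent with renaming the Service fixing the pod. Kubernetes also documents a pod-level `enableServiceLinks` field that turns this injection off; that option was not tried in this thread.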

For now, there are no plans to update the docs. Hopefully anyone else who hits this problem will find this issue.