jupyterhub / batchspawner

Custom Spawner for Jupyterhub to start servers in batch scheduled systems
BSD 3-Clause "New" or "Revised" License

internal_ssl + SlurmSpawner leads to certificate verification error #192

Open abagali1 opened 4 years ago

abagali1 commented 4 years ago

Bug description

I have set up a JupyterHub instance on my cluster's login node that uses SlurmSpawner to spawn notebook servers on our cluster. I have verified that SlurmSpawner works (wonderfully, btw) and that SSL works everywhere except between the Hub server and the spawned notebook servers. I was experimenting with JupyterHub's internal_ssl feature, but as soon as I set it to True in the config I was met with this error:
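For reference, the setup described above boils down to roughly this configuration (the spawner options shown are illustrative, not taken from the report):

```python
# jupyterhub_config.py -- minimal sketch of the reported setup
c.JupyterHub.spawner_class = 'batchspawner.SlurmSpawner'

# Enabling this is what triggers the certificate verification error below
c.JupyterHub.internal_ssl = True
```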

[W 2020-09-19 20:15:21.818 SingleUserNotebookApp iostream:1432] SSL Error on 9 ('[IP]', 8081): [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)
[E 2020-09-19 20:15:21.819 SingleUserNotebookApp singleuser:434] Failed to connect to my Hub at https://[IP]:8081/hub/api (attempt 3/5). Is it running?
    Traceback (most recent call last):
      File "/opt/jupyterhub/lib/python3.8/site-packages/jupyterhub/singleuser.py", line 432, in check_hub_version
        resp = await client.fetch(self.hub_api_url)
      File "/opt/jupyterhub/lib/python3.8/site-packages/tornado/simple_httpclient.py", line 330, in run
        stream = await self.tcp_client.connect(
      File "/opt/jupyterhub/lib/python3.8/site-packages/tornado/tcpclient.py", line 293, in connect
        stream = await stream.start_tls(
      File "/opt/jupyterhub/lib/python3.8/site-packages/tornado/iostream.py", line 1417, in _do_ssl_handshake
        self.socket.do_handshake()
      File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
        self._sslobj.do_handshake()
    ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1108)

I have looked at #31, #103, and jupyterhub/jupyterhub#2055 but I cannot find good documentation on this issue/what I am doing wrong.

Your personal set up

JupyterHub instance using SlurmSpawner to spawn notebook servers. Hub instance is on the same machine as Slurm login node.

welcome[bot] commented 4 years ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

leitec commented 4 years ago

I ran into the same problem and did some digging into it. I'm still new to JupyterHub, so please take everything here with a grain of salt.

The main issue is that as it stands now, the singleuser process does not have the CA certificate and the SSL key it needs to talk to the Hub server. These are created by the create_certs() method in the base Spawner class. BatchSpawner does not interfere with this. The certificates are being created for the user session, and their locations are passed along in environment variables.

There are two problems, though:

  1. The created key/certs are owned and readable only by root. They need to be accessible by the user. If JupyterHub is configured to put them in a globally-accessible directory then all you'd need to do is chown the files over to the user. If not, you would need to move/copy them to a place the user can access. The move_certs() Spawner method can be used to do both of these. The LocalProcessSpawner has an implementation that could be helpful in BatchSpawner.

  2. Both the hub server and the node where the singleuser process runs must be listed in the alt names property of the SSL certificate. The catch is that the certificates must be created before the job is even submitted. We have no idea which node the batch job will end up on until the process is about to start.
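For problem 1, a move_certs() override along the lines of LocalProcessSpawner's could look something like the sketch below. Everything here is an assumption for illustration: the function would really be a method on a SlurmSpawner subclass (def move_certs(self, paths)), and using the user's home directory as the shared, node-visible destination is just one possible choice.

```python
import os
import pwd
import shutil

def move_certs(spawner, paths):
    """Sketch: hand the hub-created internal-SSL certs over to the user.

    `paths` is the dict of 'keyfile'/'certfile'/'cafile' locations that
    JupyterHub's create_certs() produced (owned and readable only by root).
    Returns the new locations, as move_certs() is expected to do.
    """
    user = pwd.getpwnam(spawner.user.name)
    # A directory the user can reach from every compute node; the home
    # directory is an assumption of this sketch.
    dest = os.path.join(user.pw_dir, '.jupyterhub-certs')
    os.makedirs(dest, mode=0o700, exist_ok=True)
    os.chown(dest, user.pw_uid, user.pw_gid)
    new_paths = {}
    for name, path in paths.items():
        target = os.path.join(dest, os.path.basename(path))
        shutil.move(path, target)
        os.chown(target, user.pw_uid, user.pw_gid)
        new_paths[name] = target
    return new_paths
```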

I can't think of a batch-friendly way to solve problem 2 if the goal is to have only the specific node listed. The simplest workaround would be to list all possible nodes in the ssl_alt_names configuration entry. A slightly better implementation could use a pre-spawn hook to automatically add all nodes in the selected partition, or something to that effect.
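A pre-spawn hook along those lines might look like the sketch below. The use of sinfo and the partition name are assumptions; ssl_alt_names is the standard Spawner trait, and its entries use certipy's DNS:/IP: prefix convention.

```python
import subprocess

def alt_names_from_sinfo(output):
    """Turn `sinfo -h -N -o %n` output (one hostname per line) into
    ssl_alt_names entries, deduplicated and sorted."""
    return ['DNS:%s' % node for node in sorted(set(output.split()))]

def add_partition_nodes(spawner):
    """Pre-spawn hook: add every node in the target partition to the
    certificate alt names, since we can't know where the job will land.
    The partition name 'jupyter' is an assumption of this sketch."""
    out = subprocess.check_output(
        ['sinfo', '-h', '-N', '-o', '%n', '-p', 'jupyter'], text=True
    )
    spawner.ssl_alt_names = spawner.ssl_alt_names + alt_names_from_sinfo(out)

# In jupyterhub_config.py:
# c.Spawner.pre_spawn_hook = add_partition_nodes
```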

I set up a quick test with a move_certs() implementation that gives the user access to the certificates, and a pre-spawn hook that adds the (hardcoded) node name to the spawner's ssl_alt_names at runtime. The config file's entry just lists the Hub server. These were enough to get working internal SSL.

Hoeze commented 3 years ago

Hi @leitec, I'd also like to use SlurmSpawner with internal_ssl enabled. Why is (2) necessary? Wouldn't it be enough if the client gets any valid signed certificate?

leitec commented 3 years ago

Hi @Hoeze, it's been a while since I looked at this, and things may have changed since then.

I recall that the internal SSL mode uses fairly strict certificate validation. If the hub server is not in alt names, the singleuser process can't provide the hub server with its address and port number. I think it's expected that you will add the hub server hostname there. But then, if the node where singleuser is running isn't listed in alt names, the hub server can't contact the singleuser server at the given address and port.
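Concretely, both endpoints end up needing to appear in the certificate's alt names; in config terms that might look like this (the hostnames are placeholders, not from the thread):

```python
# jupyterhub_config.py -- both ends must be in the cert's alt names
c.Spawner.ssl_alt_names = [
    'DNS:hub.example.com',  # so singleuser trusts the hub's address
    'DNS:node01.example.com',  # so the hub trusts the singleuser address
]
```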

This refers to the back-end certificates created by JupyterHub for each session when internal_ssl is enabled, not the server certificate used on the user-facing JupyterHub endpoint, in case that's what you meant by client.

Hoeze commented 3 years ago

I see, thanks @leitec!