jupyter-on-openshift / jupyterhub-quickstart

OpenShift compatible version of the JupyterHub application.
Apache License 2.0

Open port in workspace #30

Closed s1106838 closed 4 years ago

s1106838 commented 4 years ago

Hello,

I would like to know how I can open a port in a Jupyter notebook that is being deployed from JupyterHub. This is for Spark, so the notebook can connect to a Spark cluster.

The ports I want to use are already in my Dockerfile, but when JupyterHub rolls out this container it only opens port 8080. If I run the image standalone, the ports are opened. I looked in OpenShift but I can't find the deployment config for the notebook that is being used, so I can't change that.

GrahamDumpleton commented 4 years ago

Are you sure you are listening on '0.0.0.0' and not 'localhost' or '127.0.0.1'? If you listen on the latter, it can only accept connections from containers in the same pod (or the host, if running Docker on your own machine). Is the thing trying to connect to the pod in the same project? Port definitions are more for documentation purposes, to help set up services. So long as it is listening on all interfaces, i.e. '0.0.0.0', it should be able to accept a connection on the pod IP regardless of what the port specification is.
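(As a minimal illustration of the bind-address point, not taken from this thread: a listener bound to 0.0.0.0 is reachable on the pod IP, while one bound to 127.0.0.1 is not. Port 9000 below is a placeholder.)

import socket

# Bind on all interfaces so other pods can reach this listener via the pod IP.
# Port 9000 is a placeholder.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 9000))

# Binding to ("127.0.0.1", 9000) instead would only accept connections made
# from within this pod's own network namespace.
server.listen(5)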

s1106838 commented 4 years ago

Yes, it is listening on 0.0.0.0 and is in the same project. If I launch the image without JupyterHub (so standalone) the connection works just fine. That is because I can open the ports in the network for that pod. But if the pod is deployed by JupyterHub, I don't know where and how to configure which ports to open. If I look at the logs from Spark, it literally says it cannot connect back to Jupyter.

Can you tell me where the config is stored for the Jupyter notebook that is being deployed by JupyterHub in OpenShift?

GrahamDumpleton commented 4 years ago

Not sure you are understanding what I am saying. You should not need to add a port definition in the pod specification in order to be able to connect to the pod using that port. All ports on pods should be exposed by default. Any port definition on a pod only really serves as documentation and doesn't control what ports are accessible. Thus the only remaining questions should be:

What are you passing the Spark cluster in order for it to connect back to the pod for the Jupyter notebook instance? Are you using the name of the pod as a hostname, or are you using the IP address of the pod? It should be the IP address.

Is your Spark cluster even deployed in the same project or cluster, or, since you talk about exposing ports in the network, is it actually totally outside of the OpenShift cluster? If your Spark cluster is not even in the cluster, exposing a single pod for a notebook instance so it is accessible outside of the cluster is a much more complicated thing to do.
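(As a hedged sketch of the pod-IP point, not part of the original exchange: a PySpark session in the notebook would typically advertise the pod IP to the cluster via spark.driver.host. The master URL is a placeholder.)

import socket
from pyspark.sql import SparkSession

# $HOSTNAME resolves to the pod IP via /etc/hosts inside the container.
pod_ip = socket.gethostbyname(socket.gethostname())

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")            # placeholder master URL
    .appName("notebook")
    .config("spark.driver.host", pod_ip)            # address the cluster calls back on
    .config("spark.driver.bindAddress", "0.0.0.0")  # listen on all interfaces
    .getOrCreate()
)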

GrahamDumpleton commented 4 years ago

Whoops, one correction: since you have to use the pod IP, it isn't actually accessible from outside of the project. So it doesn't matter whether you are using the multi-tenant overlay or have set up a network policy; it still will not be accessible. This is when you need a service. So if your client is outside of the project, it can't work, as there is no service corresponding to the Jupyter notebook pod instance.

s1106838 commented 4 years ago

The Spark cluster and JupyterHub are in the same project and on the same node, so this can't be the issue. The pod is listening on 0.0.0.0, so that can't be it either. I use the same image and the same code in the same project, but the problem is that if JupyterHub makes the pod, I can't connect to Spark. So the image can't be the problem.

The only difference is that when using JupyterHub, the services for the pod are not created. So it looks like a connection issue. Let's say, just for testing purposes: how do I create the services for the pod that is deployed by JupyterHub? I would like to change this for how it is deployed.

GrahamDumpleton commented 4 years ago

You haven't addressed the issue I raised around use of an IP address. Are you passing the pod IP address to the Spark cluster for it to call back on?

Also try this. For the Jupyter notebook instance, create a terminal from the Jupyter notebook web interface, or by using oc rsh to access the pod. Then execute:

curl $HOSTNAME:8080

Repeat this but change port 8080 to whatever the port is that Spark is expecting to contact back on.

The HOSTNAME environment variable will be the name of the pod, but that will be mapped to an IP address because of an entry in the /etc/hosts file within just that container (not DNS). That IP is the same IP that should be used by Spark to access the container from outside. If the test works on that port from within the pod, it should also work from outside; a service should not be required.
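(If curl is not available in the image, a rough Python equivalent of the same check, offered here as an assumption rather than something from the thread, is:)

import os
import socket

# Same check as the curl test: connect to what $HOSTNAME resolves to,
# on the port Spark should call back on.
host = os.environ.get("HOSTNAME", "localhost")
port = 8080  # repeat with the Spark callback port

try:
    with socket.create_connection((host, port), timeout=5):
        print(f"connected to {host}:{port}")
except OSError as exc:
    print(f"could not connect to {host}:{port}: {exc}")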

As to the service, JupyterHub doesn't create them for a Jupyter notebook instance as it doesn't need them. It uses the IP address of the pod to contact it. JupyterHub itself therefore doesn't provide any support for creating a service object which maps through to the IP. One could hack one up manually, but do the test above first.
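(If you did want to hack one up manually, a rough sketch using the kubernetes Python client might look like the following; the service name, namespace, label selector and port are all assumptions to adapt.)

from kubernetes import client, config

config.load_incluster_config()  # or config.load_kube_config() when run outside the cluster

# Sketch only: a Service pointing at the notebook pod so clients outside the
# project can reach it. Name, selector, namespace and port are placeholders.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="notebook-callback"),
    spec=client.V1ServiceSpec(
        selector={"component": "singleuser-server"},  # must match the notebook pod's labels
        ports=[client.V1ServicePort(port=9000, target_port=9000, protocol="TCP")],
    ),
)

client.CoreV1Api().create_namespaced_service(namespace="my-project", body=service)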

FWIW. I have never come across a cluster where contacting pods by IP address within the bounds of the same project wasn't possible. The only way I know of as to how it could be blocked, is if a network security policy has been defined for the project which blocks it in some way.

What is the cluster you are using: Minishift, CodeReady Containers, OpenShift Container Platform, OpenShift Online, OKD? And what version? Do you manage it, or is it managed separately and may have been locked down using additional network policies?

s1106838 commented 4 years ago

I ran the test and it failed: port 8080 works, but any other port does not. I am using the IP address in the /etc/hosts file from the pod to connect with Spark.

I am using OKD 3.11 on CentOS. I manage the cluster and it has not been locked down.

GrahamDumpleton commented 4 years ago

And does using curl from inside of the container to the port, but with 127.0.0.1 or localhost, work?

GrahamDumpleton commented 4 years ago

So if it turns out for some reason that your OKD setup is locking things down in a strange way where a port definition is needed on the pod (which, as I say, shouldn't actually be required), try supplying, in the config map for JupyterHub, a jupyterhub_config.py file containing:

def modify_pod_hook(spawner, pod):
    pod.spec.containers[0].ports.append([{
        "containerPort": 9000,
        "protocol": "TCP"
    }])
    return pod

c.KubeSpawner.modify_pod_hook = modify_pod_hook

Replace 9000 with the actual port number.

Verify what gets output in the pod and modify as necessary.

Try those tests again from inside of the container where you access it as $HOSTNAME:9000.

Before trying this change, confirm for me first whether accessing the port from the container using 127.0.0.1 or localhost works. You say you are listening on 0.0.0.0, yet as far as I can see you haven't confirmed you can connect from the container itself on local addresses. You have said $HOSTNAME fails though.

Note, after modifying the config map, you will need to force a re-deployment of JupyterHub by triggering a new rollout, or deleting the existing JupyterHub pod.

s1106838 commented 4 years ago

Thanks, that works! I made one small change to get it working (removed the [ ]). Is it possible to open more than one port?

def modify_pod_hook(spawner, pod):
    pod.spec.containers[0].ports.append({
        "containerPort": 9000,
        "protocol": "TCP"
    })
    return pod

c.KubeSpawner.modify_pod_hook = modify_pod_hook
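(On the follow-up question about more than one port, a hedged sketch not confirmed in this thread: the same hook can presumably append several definitions; the port numbers below are placeholders.)

def modify_pod_hook(spawner, pod):
    # Append one definition per extra port the notebook listens on.
    for port in (9000, 9001):
        pod.spec.containers[0].ports.append({
            "containerPort": port,
            "protocol": "TCP"
        })
    return pod

c.KubeSpawner.modify_pod_hook = modify_pod_hook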

As for the curl test:

$HOSTNAME:8080 - works
127.0.0.1:8080 - works
172.17.0.17 - works

Running the same test on any other port, I get connection refused.

The command that is run by JupyterHub is: container-entrypoint start-singleuser.sh --ip=0.0.0.0 --port=8080. That's why I am sure the container is listening on 0.0.0.0.

GrahamDumpleton commented 4 years ago

The --ip=0.0.0.0 only relates to port 8080 and has nothing to do with any other ports you may open up. I got the impression that you are trying to open a separate port. Does that separate port work? Above, you don't actually say clearly whether it does or not when adding that port definition for your port. You only mention the curl tests for port 8080.
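(A hedged aside, not stated in the thread: Spark's driver-side listeners pick random ports unless pinned, so if you need to know which ports to declare in the pod hook, they can be fixed via configuration. The port values below are placeholders.)

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.driver.bindAddress", "0.0.0.0")  # listen on all interfaces
    .config("spark.driver.port", "9000")            # fixed driver RPC port (placeholder)
    .config("spark.blockManager.port", "9001")      # fixed block manager port (placeholder)
    .getOrCreate()
)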

s1106838 commented 4 years ago

I didn't know that was only for port 8080, I am sorry. Yes, that was what I was trying, but I have got it working with the code you sent me, thanks for that!