rgaiacs opened 11 months ago
https://github.com/jupyterhub/mybinder.org-deploy/blob/main/mybinder/templates/netpol.yaml is deployed on the GESIS cluster as
Name: binder-users
Namespace: gesis
Created on: 2023-10-26 10:51:01 +0200 CEST
Labels: app=binderhub
app.kubernetes.io/managed-by=Helm
chart=gesis-3.0.0
component=user-netpol
release=binderhub
Annotations: meta.helm.sh/release-name: binderhub
meta.helm.sh/release-namespace: gesis
Spec:
PodSelector: component in (dind,image-builder,singleuser-server),release=binderhub
Allowing ingress traffic:
<none> (Selected pods are isolated for ingress connectivity)
Allowing egress traffic:
To Port: 53/TCP
To Port: 53/UDP
...
(part of the spec omitted) and the Docker-in-Docker pod is
Name: binderhub-dind-brbmp
Namespace: gesis
Priority: 0
Service Account: default
Node: spko-css-app03/194.95.75.12
Start Time: Wed, 04 Oct 2023 13:44:30 +0200
Labels: app=binder
component=image-builder
controller-revision-hash=75bb485d7f
heritage=Helm
name=binderhub-dind
pod-template-generation=2
release=binderhub
Annotations: <none>
Status: Running
IP: 10.244.3.163
IPs:
IP: 10.244.3.163
Controlled By: DaemonSet/binderhub-dind
Containers:
dind:
Container ID: containerd://65f15f3de0865306f100afae7e5d4fdbb9d9c8fdfe0283825667e911129dac6b
Image: docker.io/library/docker:24.0.6-dind
Image ID: docker.io/library/docker@sha256:f28ffd78641197871fea8fd679f2bf8a1cdafa4dc3f1ce3e700ad964aac2879a
Port: <none>
Host Port: <none>
Args:
dockerd
--storage-driver=overlay2
-H unix:///var/run/dind/docker.sock
--mtu=1000
State: Running
Started: Tue, 24 Oct 2023 09:15:19 +0200
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Sun, 22 Oct 2023 02:41:27 +0200
Finished: Tue, 24 Oct 2023 09:15:18 +0200
Ready: True
Restart Count: 3
Limits:
cpu: 4
memory: 4Gi
Requests:
cpu: 500m
memory: 1Gi
Environment: <none>
Mounts:
/var/lib/docker from dockerlib-dind (rw)
/var/run/dind from run-dind (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dblrz (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
dockerlib-dind:
Type: HostPath (bare host directory volume)
Path: /orc2_data/repo2docker
HostPathType: DirectoryOrCreate
run-dind:
Type: HostPath (bare host directory volume)
Path: /var/run/dind
HostPathType: DirectoryOrCreate
kube-api-access-dblrz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: binderhub=true
Tolerations: hub.jupyter.org/dedicated=user:NoSchedule
hub.jupyter.org_dedicated=user:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events: <none>
The PodSelector looks good to me. From the Docker-in-Docker pod, I can still run

wget http://139.162.202.16

successfully, which suggests the egress rules are not being enforced.
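For reference, the same test can be reproduced from outside the pod with kubectl exec (the pod name is taken from the describe output above):

# run the probe inside the dind pod; -T 5 gives up after 5 seconds
kubectl exec -n gesis binderhub-dind-brbmp -- wget -T 5 -O /dev/null http://139.162.202.16

If the policy were enforced as intended, this request should be blocked rather than succeed.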
The Calico/Tigera Operator is running:
Name: tigera-operator-f6bb878c4-p4ghb
Namespace: tigera-operator
Priority: 0
Service Account: tigera-operator
Node: svko-css-app01/194.95.75.9
Start Time: Wed, 25 Oct 2023 16:49:26 +0200
Labels: k8s-app=tigera-operator
name=tigera-operator
pod-template-hash=f6bb878c4
Annotations: <none>
Status: Running
IP: 194.95.75.9
IPs:
IP: 194.95.75.9
Controlled By: ReplicaSet/tigera-operator-f6bb878c4
Containers:
tigera-operator:
Container ID: containerd://f1440a31e51de0a0ad30b367318a0c972382c287652bc29eb497049b296a899b
Image: quay.io/tigera/operator:v1.30.7
Image ID: quay.io/tigera/operator@sha256:76715143082b0c45aa6fae57b8a2eac0213bef6ffb5c686e456a31b9a35069b3
Port: <none>
Host Port: <none>
Command:
operator
State: Running
Started: Wed, 25 Oct 2023 16:49:30 +0200
Ready: True
Restart Count: 0
Environment Variables from:
kubernetes-services-endpoint ConfigMap Optional: true
Environment:
WATCH_NAMESPACE:
POD_NAME: tigera-operator-f6bb878c4-p4ghb (v1:metadata.name)
OPERATOR_NAME: tigera-operator
TIGERA_OPERATOR_INIT_IMAGE_VERSION: v1.30.7
Mounts:
/var/lib/calico from var-lib-calico (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-24jnn (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
kube-api-access-24jnn:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoExecute op=Exists
:NoSchedule op=Exists
Events: <none>
Does anyone see what I am missing? Thanks!
I might have discovered the missing piece:
apiVersion: v1
items:
- apiVersion: operator.tigera.io/v1
  kind: TigeraStatus
  metadata:
    creationTimestamp: "2023-10-26T08:49:01Z"
    generation: 1
    name: apiserver
    resourceVersion: "47435557"
    uid: f8a49c99-888e-4d96-ae58-a96fab6cbb94
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2023-10-26T08:49:06Z"
      message: 'Waiting for Installation to be ready: '
      observedGeneration: 1
      reason: ResourceNotReady
      status: "True"
      type: Degraded
- apiVersion: operator.tigera.io/v1
  kind: TigeraStatus
  metadata:
    creationTimestamp: "2023-10-26T08:49:01Z"
    generation: 1
    name: calico
    resourceVersion: "47435556"
    uid: 05bf3d62-33f9-43c9-897c-f65083b742e7
  spec: {}
  status:
    conditions:
    - lastTransitionTime: "2023-10-26T08:49:06Z"
      message: 'Error querying installation: Could not resolve CalicoNetwork IPPool
        and kubeadm configuration: IPPool 192.168.0.0/16 is not within the platform''s
        configured pod network CIDR(s) [10.244.0.0/16]'
      observedGeneration: 1
      reason: ResourceReadError
      status: "True"
      type: Degraded
kind: List
metadata:
  resourceVersion: ""
I fixed the CalicoNetwork IPPool on the GESIS node. I tested using a deny-all configuration and NetworkPolicy is working. The problem now is that the hub pod can't connect to an existing single-user pod, so the user redirect fails. Can I have some help debugging the Network Policies?
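(For anyone hitting the same IPPool mismatch: the fix amounts to pointing the operator's Installation resource at the cluster's actual pod CIDR. A minimal sketch, assuming the operator's conventional resource name "default":

# patch the operator-managed Installation so the IPPool matches 10.244.0.0/16
kubectl patch installation default --type=merge \
  -p '{"spec":{"calicoNetwork":{"ipPools":[{"cidr":"10.244.0.0/16"}]}}}'

)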
The BinderHub namespace has 3 Network Policies:
NAME POD-SELECTOR AGE
hub app=jupyterhub,component=hub,release=binderhub 126m
proxy app=jupyterhub,component=proxy,release=binderhub 126m
singleuser app=jupyterhub,component=singleuser-server,release=binderhub 126m
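This listing and the per-policy details below come from commands like:

kubectl get netpol -n gesis
kubectl describe netpol hub -n gesis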
hub

Name: hub
Namespace: gesis
Created on: 2023-10-27 12:01:12 +0200 CEST
Labels: app=jupyterhub
app.kubernetes.io/managed-by=Helm
chart=jupyterhub-3.1.0
component=hub
heritage=Helm
release=binderhub
Annotations: meta.helm.sh/release-name: binderhub
meta.helm.sh/release-namespace: gesis
Spec:
PodSelector: app=jupyterhub,component=hub,release=binderhub
Allowing ingress traffic:
To Port: http/TCP
From:
PodSelector: hub.jupyter.org/network-access-hub=true
Allowing egress traffic:
To Port: 8001/TCP
To:
PodSelector: app=jupyterhub,component=proxy,release=binderhub
----------
To Port: 8888/TCP
To:
PodSelector: app=jupyterhub,component=singleuser-server,release=binderhub
----------
To Port: 53/UDP
To Port: 53/TCP
To:
IPBlock:
CIDR: 169.254.169.254/32
Except:
To:
NamespaceSelector: kubernetes.io/metadata.name=kube-system
To:
IPBlock:
CIDR: 10.0.0.0/8
Except:
To:
IPBlock:
CIDR: 172.16.0.0/12
Except:
To:
IPBlock:
CIDR: 192.168.0.0/16
Except:
----------
To Port: <any> (traffic allowed to all ports)
To:
IPBlock:
CIDR: 0.0.0.0/0
Except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.169.254/32
----------
To Port: <any> (traffic allowed to all ports)
To:
IPBlock:
CIDR: 10.0.0.0/8
Except:
To:
IPBlock:
CIDR: 172.16.0.0/12
Except:
To:
IPBlock:
CIDR: 192.168.0.0/16
Except:
----------
To Port: <any> (traffic allowed to all ports)
To:
IPBlock:
CIDR: 169.254.169.254/32
Except:
Policy Types: Ingress, Egress
proxy

Name: proxy
Namespace: gesis
Created on: 2023-10-27 12:01:12 +0200 CEST
Labels: app=jupyterhub
app.kubernetes.io/managed-by=Helm
chart=jupyterhub-3.1.0
component=proxy
heritage=Helm
release=binderhub
Annotations: meta.helm.sh/release-name: binderhub
meta.helm.sh/release-namespace: gesis
Spec:
PodSelector: app=jupyterhub,component=proxy,release=binderhub
Allowing ingress traffic:
To Port: http/TCP
To Port: https/TCP
From: <any> (traffic not restricted by source)
----------
To Port: http/TCP
From:
PodSelector: hub.jupyter.org/network-access-proxy-http=true
----------
To Port: api/TCP
From:
PodSelector: hub.jupyter.org/network-access-proxy-api=true
Allowing egress traffic:
To Port: 8081/TCP
To:
PodSelector: app=jupyterhub,component=hub,release=binderhub
----------
To Port: 8888/TCP
To:
PodSelector: app=jupyterhub,component=singleuser-server,release=binderhub
----------
To Port: 53/UDP
To Port: 53/TCP
To:
IPBlock:
CIDR: 169.254.169.254/32
Except:
To:
NamespaceSelector: kubernetes.io/metadata.name=kube-system
To:
IPBlock:
CIDR: 10.0.0.0/8
Except:
To:
IPBlock:
CIDR: 172.16.0.0/12
Except:
To:
IPBlock:
CIDR: 192.168.0.0/16
Except:
----------
To Port: <any> (traffic allowed to all ports)
To:
IPBlock:
CIDR: 0.0.0.0/0
Except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.169.254/32
----------
To Port: <any> (traffic allowed to all ports)
To:
IPBlock:
CIDR: 10.0.0.0/8
Except:
To:
IPBlock:
CIDR: 172.16.0.0/12
Except:
To:
IPBlock:
CIDR: 192.168.0.0/16
Except:
----------
To Port: <any> (traffic allowed to all ports)
To:
IPBlock:
CIDR: 169.254.169.254/32
Except:
Policy Types: Ingress, Egress
singleuser

Name: singleuser
Namespace: gesis
Created on: 2023-10-27 12:01:12 +0200 CEST
Labels: app=jupyterhub
app.kubernetes.io/managed-by=Helm
chart=jupyterhub-3.1.0
component=singleuser
heritage=Helm
release=binderhub
Annotations: meta.helm.sh/release-name: binderhub
meta.helm.sh/release-namespace: gesis
Spec:
PodSelector: app=jupyterhub,component=singleuser-server,release=binderhub
Allowing ingress traffic:
To Port: notebook-port/TCP
From:
PodSelector: hub.jupyter.org/network-access-singleuser=true
Allowing egress traffic:
To Port: 8081/TCP
To:
PodSelector: app=jupyterhub,component=hub,release=binderhub
----------
To Port: 8000/TCP
To:
PodSelector: app=jupyterhub,component=proxy,release=binderhub
----------
To Port: 8080/TCP
To Port: 8443/TCP
To:
PodSelector: app=jupyterhub,component=autohttps,release=binderhub
----------
To Port: 53/UDP
To Port: 53/TCP
To:
IPBlock:
CIDR: 169.254.169.254/32
Except:
To:
NamespaceSelector: kubernetes.io/metadata.name=kube-system
To:
IPBlock:
CIDR: 10.0.0.0/8
Except:
To:
IPBlock:
CIDR: 172.16.0.0/12
Except:
To:
IPBlock:
CIDR: 192.168.0.0/16
Except:
----------
To Port: <any> (traffic allowed to all ports)
To:
IPBlock:
CIDR: 0.0.0.0/0
Except: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.169.254/32
Policy Types: Ingress, Egress
Is it possible the NetworkPolicy controller doesn't quite implement policies in the way it's meant to? In https://github.com/jupyterhub/mybinder.org-deploy/pull/2698 I had a lot of problems with the AWS network policy controller, so I ended up overriding the policies after a lot of trial and error. See the networkPolicy.ingress sections in https://github.com/jupyterhub/mybinder.org-deploy/pull/2698/files#diff-a545d6fc3dead92078cac561cb659146ca961dbc81b295dbec0e2762232cb06d
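For illustration, such overrides go under the charts' networkPolicy values (exactly where depends on how the charts are nested in your deployment). A minimal sketch of the shape, with placeholder rules rather than the actual ones from that PR:

jupyterhub:
  hub:
    networkPolicy:
      ingress:
        # extra rule appended to the chart's generated hub policy:
        # allow the proxy to reach the hub API port directly
        - from:
            - podSelector:
                matchLabels:
                  app: jupyterhub
                  component: proxy
          ports:
            - port: 8081
              protocol: TCP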
One method I found useful for debugging was to create a pod.yaml for an image like netshoot, copying the annotations and labels from one of the Jupyter pods. If you deploy this pod, those annotations/labels mean it should be subject to the same NetworkPolicy restrictions as the Jupyter pod in question, so you can kubectl exec into it to interactively poke around the network, and kubectl edit the pod labels/annotations and the network policies to figure out where the block is occurring.
E.g.
# kubectl apply -f pod.yaml
# kubectl exec -it host-shell -- bash
---
apiVersion: v1
kind: Pod
metadata:
  name: host-shell
  labels:
    app: jupyterhub
    component: hub
    hub.jupyter.org/network-access-proxy-api: "true"
    hub.jupyter.org/network-access-proxy-http: "true"
    hub.jupyter.org/network-access-singleuser: "true"
    release: curvenote
spec:
  # Uncomment if you need to connect to a specific node
  # nodeSelector:
  #   kubernetes.io/hostname: nodename.k8s.example.org
  containers:
    - name: host-shell
      command:
        - sleep
      args:
        - 1h
      image: docker.io/nicolaka/netshoot:v0.11
      imagePullPolicy: IfNotPresent
      securityContext:
        privileged: true
  restartPolicy: Never
  tolerations:
    - effect: NoSchedule
      key: hub.jupyter.org/dedicated
      operator: Equal
      value: user
    - effect: NoSchedule
      key: hub.jupyter.org_dedicated
      operator: Equal
      value: user
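Once the pod is running, here are a few probes one might try from inside it. The service names and ports assume a standard Zero to JupyterHub deployment ("hub" on 8081, "proxy-api" on 8001); adjust them to match yours. An HTTP error still proves the connection was allowed, while a timeout points at a NetworkPolicy block:

# can we reach the hub API? (returns version info without auth)
kubectl exec -it host-shell -- curl -sm5 http://hub:8081/hub/api
# can we reach the proxy API? (a 403 without a token still proves connectivity)
kubectl exec -it host-shell -- curl -sm5 http://proxy-api:8001/api/routes
# is in-cluster DNS resolution working at all?
kubectl exec -it host-shell -- nslookup hub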
Thanks @manics. I followed your suggestion for debugging.
The GESIS node configuration is deployed using GitLab CI (similar to GitHub Actions). The core steps are:
- The GESIS node runs Kubernetes with Calico as the Container Network Interface (CNI) plugin.
- The Helm chart loads
I think that I'm missing an important step here. Any help?
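One sanity check worth adding to those steps: confirm the operator-managed Calico install is healthy before trusting any policy test, since a Degraded TigeraStatus (as seen earlier in this thread) can mean policies are silently not enforced:

# cluster-scoped status objects maintained by the Tigera operator
kubectl get tigerastatus
# calico-system is the namespace used by operator-based installs
kubectl get pods -n calico-system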