georghildebrand closed this issue 4 years ago
There have been some changes around this area in the Notebook that might only now be propagating to Hub and causing this, but I will have to test further to see whether it's really a side effect of that. I will update here with our findings.
@lresende thanks for having a look.
This is my JEG deployment.yaml and some notes:
# This file defines the Kubernetes objects necessary for Enterprise Gateway to run within Kubernetes.
#
apiVersion: v1
kind: Namespace
metadata:
  name: enterprise-gateway
  labels:
    app: enterprise-gateway
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: enterprise-gateway-sa
  namespace: enterprise-gateway
  labels:
    app: enterprise-gateway
    component: enterprise-gateway
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: enterprise-gateway-controller
  labels:
    app: enterprise-gateway
    component: enterprise-gateway
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "services", "configmaps", "secrets", "persistentvolumes", "persistentvolumeclaims"]
    verbs: ["get", "watch", "list", "create", "delete"]
  - apiGroups: ["rbac.authorization.k8s.io"]
    resources: ["rolebindings"]
    verbs: ["get", "list", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  # Referenced by EG_KERNEL_CLUSTER_ROLE below
  name: kernel-controller
  labels:
    app: enterprise-gateway
    component: kernel
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "list", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: enterprise-gateway-controller
  labels:
    app: enterprise-gateway
    component: enterprise-gateway
subjects:
  - kind: ServiceAccount
    name: enterprise-gateway-sa
    namespace: enterprise-gateway
roleRef:
  kind: ClusterRole
  name: enterprise-gateway-controller
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: enterprise-gateway
    component: enterprise-gateway
  name: enterprise-gateway
  namespace: enterprise-gateway
spec:
  ports:
    - name: gateway-port
      port: 8888
      targetPort: 8888
  selector:
    gateway-selector: enterprise-gateway
  sessionAffinity: ClientIP
  type: NodePort
  # Uncomment in order to use <k8s-master>:8888
  # externalIPs:
  #   - k8s-master-public-ip
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: enterprise-gateway
  namespace: enterprise-gateway
  labels:
    gateway-selector: enterprise-gateway
    app: enterprise-gateway
    component: enterprise-gateway
spec:
  # Uncomment/Update to deploy multiple replicas of EG
  # replicas: 1
  selector:
    matchLabels:
      gateway-selector: enterprise-gateway
  template:
    metadata:
      labels:
        gateway-selector: enterprise-gateway
        app: enterprise-gateway
        component: enterprise-gateway
    spec:
      # Created above.
      serviceAccountName: enterprise-gateway-sa
      containers:
        - env:
            - name: EG_PORT
              value: "8888"
            # Created above.
            - name: EG_NAMESPACE
              value: "enterprise-gateway"
            # Created above. Used if no KERNEL_NAMESPACE is provided by client.
            - name: EG_KERNEL_CLUSTER_ROLE
              value: "kernel-controller"
            # All kernels reside in the EG namespace if True, otherwise KERNEL_NAMESPACE
            # must be provided or one will be created for each kernel.
            - name: EG_SHARED_NAMESPACE
              value: "False"
            # NOTE: This requires appropriate volume mounts to make notebook dir accessible
            - name: EG_MIRROR_WORKING_DIRS
              value: "False"
            # Current idle timeout is 1 hour.
            - name: EG_CULL_IDLE_TIMEOUT
              value: "3600"
            - name: EG_LOG_LEVEL
              value: "DEBUG"
            - name: EG_KERNEL_LAUNCH_TIMEOUT
              value: "60"
            - name: EG_KERNEL_WHITELIST
              value: "['r_kubernetes','python_kubernetes','python_tf_kubernetes','python_tf_gpu_kubernetes','scala_kubernetes','spark_r_kubernetes','spark_python_kubernetes','spark_scala_kubernetes']"
          # Ensure the following VERSION tag is updated to the version of Enterprise Gateway you wish to run
          image: elyra/enterprise-gateway:dev
          # Use IfNotPresent policy so that dev-based systems don't automatically
          # update. This provides more control. Since formal tags will be release-specific
          # this policy should be sufficient for them as well.
          imagePullPolicy: IfNotPresent
          name: enterprise-gateway
          resources:
            requests:
              cpu: "2000m"
              memory: "4Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          ports:
            - containerPort: 8888
              name: gateway-port
              protocol: TCP
          ## Uncomment to enable NFS-mounted kernelspecs
          # volumeMounts:
          #   - name: kernelspecs
          #     mountPath: "/usr/local/share/jupyter/kernels"
      # volumes:
      #   - name: kernelspecs
      #     nfs:
      #       server: <internal-ip-of-nfs-server>
      #       path: "/usr/local/share/jupyter/kernels"
---
# apiVersion: apps/v1
# kind: DaemonSet
# metadata:
#   name: kernel-image-puller
#   namespace: enterprise-gateway
# spec:
#   selector:
#     matchLabels:
#       name: kernel-image-puller
#   template:
#     metadata:
#       labels:
#         name: kernel-image-puller
#         app: enterprise-gateway
#         component: kernel-image-puller
#     spec:
#       containers:
#         - name: kernel-image-puller
#           image: elyra/kernel-image-puller:dev
#           env:
#             - name: KIP_GATEWAY_HOST
#               value: "http://enterprise-gateway.enterprise-gateway:8888"
#             - name: KIP_INTERVAL
#               value: "300"
#             - name: KIP_PULL_POLICY
#               value: "IfNotPresent"
#           volumeMounts:
#             - name: dockersock
#               mountPath: "/var/run/docker.sock"
#       volumes:
#         - name: dockersock
#           hostPath:
#             path: /var/run/docker.sock
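One way to sanity-check a deployment like the one above is to ask the gateway which kernelspecs it serves and compare them with the names in EG_KERNEL_WHITELIST. The following is only a sketch, assuming the gateway exposes the standard Jupyter `/api/kernelspecs` REST endpoint at the in-cluster service URL used for KIP_GATEWAY_HOST in the manifest. The helper names (`kernelspecs_url`, `kernelspec_names`, `fetch_kernelspec_names`) and the sample payload are hypothetical and for illustration only.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def kernelspecs_url(gateway_base: str) -> str:
    """Build the kernelspecs endpoint URL from the gateway base URL."""
    return urljoin(gateway_base.rstrip("/") + "/", "api/kernelspecs")


def kernelspec_names(payload: str) -> list:
    """Extract the sorted kernelspec names from a /api/kernelspecs JSON body."""
    data = json.loads(payload)
    return sorted(data.get("kernelspecs", {}))


def fetch_kernelspec_names(gateway_base: str, timeout: float = 5.0) -> list:
    """Query a reachable gateway (not called here; needs network access)."""
    with urlopen(kernelspecs_url(gateway_base), timeout=timeout) as resp:
        return kernelspec_names(resp.read().decode("utf-8"))


# Made-up sample response, shaped like the Jupyter kernelspecs API reply.
sample = json.dumps(
    {
        "default": "python_kubernetes",
        "kernelspecs": {"python_kubernetes": {}, "r_kubernetes": {}},
    }
)

url = kernelspecs_url("http://enterprise-gateway.enterprise-gateway:8888")
names = kernelspec_names(sample)
print(url)
print(names)
```

From a pod inside the cluster, `fetch_kernelspec_names("http://enterprise-gateway.enterprise-gateway:8888")` would perform the actual request; any name missing from the result cannot be launched regardless of the whitelist.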
This issue can be closed. I realized that I had to use different env vars for connecting to the kernel. However, I don't know why it was trying to use tokens ...
Thanks for working through this @georghildebrand. I wanted to respond to the EG_RESPONSE_IP question.
EG_RESPONSE_IP only applies to the interactions between EG and the launched kernel pod. Notebook doesn't come into play here. This environment variable is set prior to starting EG in cases where there is some kind of firewall between EG and the cluster it's launching kernels against, or where EG's local IP is not appropriate when used from the cluster on which the kernel lands. It is rarely used.
This value is used when constructing the EG_RESPONSE_ADDRESS environment variable. The EG_RESPONSE_IP is prepended to a port that EG listens on immediately following the kernel's launch. If EG_RESPONSE_IP is None, the EG server's local IP is used. The EG_RESPONSE_ADDRESS is conveyed to the launched kernel via the environment for containerized kernel launches, or as an argument to the kernel launcher for non-containerized launches. It's this response address to which the launched kernel sends its ZMQ port information, etc. EG then "connects" its kernel manager to these returned ports and steps out of the way, letting EG serve as a proxy between Notebook and the remote kernel.
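The address construction described above can be sketched roughly as follows. This is not EG's actual implementation, only an illustration of the described behaviour: the function names (`local_ip`, `build_response_address`, `open_response_listener`) are hypothetical, and `10.0.0.5` is a made-up override address.

```python
import os
import socket
from typing import Optional


def local_ip() -> str:
    """Best-effort local IP of this host (hypothetical helper)."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        return "127.0.0.1"


def build_response_address(port: int, response_ip: Optional[str] = None) -> str:
    """Return "ip:port", preferring an explicit override over the local IP."""
    ip = response_ip if response_ip else local_ip()
    return f"{ip}:{port}"


def open_response_listener(bind_ip: str = "0.0.0.0") -> socket.socket:
    """Listen on an OS-assigned port that a launched kernel could connect back to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((bind_ip, 0))  # port 0 lets the OS choose a free port
    sock.listen(1)
    return sock


# Use the override if it is set in the environment, else a made-up example IP.
override = os.environ.get("EG_RESPONSE_IP", "10.0.0.5")

listener = open_response_listener("127.0.0.1")
port = listener.getsockname()[1]

# For a containerized launch, the address would travel in the kernel's environment.
kernel_env = {"EG_RESPONSE_ADDRESS": build_response_address(port, override)}
print(kernel_env)

listener.close()
```

Without an override, `build_response_address(port)` falls back to the host's own IP, which mirrors the "if EG_RESPONSE_IP is None" case described above.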
I hope that helps.
@kevin-bates thanks for the clarification! Much appreciated.
Hi @georghildebrand! How did you solve it? I sometimes get the same issue.
@lucabem I mainly used the env vars mentioned above. I think that, since the lib uses the k8s client, it falls back to token-based auth or similar when these are not set correctly. Sadly I'm on mobile only; otherwise I would post the manifest that worked.
Description
Thank you all for this amazing work. I took some time today to try it out on a test Kubernetes cluster (not supporting Helm).
I basically went through the k8s docs and created a deployment (all fine so far). The Enterprise Gateway pod is now running. In my JupyterHub notebook server I created the kernel.json and script folder. The new kernel shows up in my notebook server. When I started it, I got the following error:
I thought the notebook server does not need operator permissions or the like? I am surely mixing something up. Any hint is welcome.
Environment