Closed: franklin-degirum closed this issue 2 years ago
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:
PVCs are not supposed to be deleted, and I don't think they are when using this Helm chart to deploy JupyterHub with KubeSpawner behind the scenes, but if that happens it's a serious bug. Can you describe further how you have configured a deployment of JupyterHub where you experience this?
I pretty much use the default configuration given in the z2jh documentation, except for Auth0 and LetsEncrypt. I am also using the culling service. When I log out, I'm able to access my PVC storage when I log in again, but when the idle-cull service shuts the server down, or I manually shut it down, and then try starting the server again, it's a new empty volume with the same volume name.
Can you show us your full configuration, and tell us the version of Z2JH? PVCs may be deleted for named servers, or if the user is deleted (which may be done by the idle-culler); that's why it would be helpful to see your config.
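For reference, the chart options involved in those two deletion paths are roughly these (a sketch of the relevant cull settings, not your config, which I haven't seen yet):

cull:
  users: false               # --cull-users: when true, idle users are deleted, not just stopped
  removeNamedServers: false  # --remove-named-servers: remove named servers when they are culled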
This is my Helm version: version.BuildInfo{Version:"v3.5.0", GitCommit:"32c22239423b3b4ba6706d450bd044baffdcf9e6", GitTreeState:"clean", GoVersion:"go1.15.6"}. Below is my full configuration, except the parts for Auth0 and LetsEncrypt:
# fullnameOverride and nameOverride distinguishes blank strings, null values,
# and non-blank strings. For more details, see the configuration reference.
fullnameOverride: ""
nameOverride:

# custom can contain anything you want to pass to the hub pod, as all passed
# Helm template values will be made available there.
custom: {}

# imagePullSecret is configuration to create a k8s Secret that Helm chart's pods
# can get credentials from to pull their images.
imagePullSecret:
  create: false
  automaticReferenceInjection: true
  registry:
  username:
  password:
  email:
# imagePullSecrets is configuration to reference the k8s Secret resources the
# Helm chart's pods can get credentials from to pull their images.
imagePullSecrets: []

# hub relates to the hub pod, responsible for running JupyterHub, its configured
# Authenticator class KubeSpawner, and its configured Proxy class
# ConfigurableHTTPProxy. KubeSpawner creates the user pods, and
# ConfigurableHTTPProxy speaks with the actual ConfigurableHTTPProxy server in
# the proxy pod.
hub:
  config:
    JupyterHub:
      admin_access: true
      authenticator_class: dummy
  service:
    type: ClusterIP
    annotations: {}
    ports:
      nodePort:
    extraPorts: []
    loadBalancerIP:
  baseUrl: /
  cookieSecret:
  initContainers: []
  fsGid: 1000
  nodeSelector: {}
  tolerations: []
  concurrentSpawnLimit: 64
  consecutiveFailureLimit: 5
  activeServerLimit:
  deploymentStrategy:
    ## type: Recreate
    ## - sqlite-pvc backed hubs require the Recreate deployment strategy as a
    ##   typical PVC storage can only be bound to one pod at the time.
    ## - JupyterHub isn't designed to support being run in parallell. More work
    ##   needs to be done in JupyterHub itself for a fully highly available (HA)
    ##   deployment of JupyterHub on k8s is to be possible.
    type: Recreate
  db:
    type: sqlite-pvc
    upgrade:
    pvc:
      annotations: {}
      selector: {}
      accessModes:
        - ReadWriteOnce
      storage: 1Gi
      subPath:
      storageClassName:
    url:
    password:
  labels: {}
  annotations: {}
  command: []
  args: []
  extraConfig: {}
  extraFiles: {}
  extraEnv: {}
  extraContainers: []
  extraVolumes: []
  extraVolumeMounts: []
  image:
    name: jupyterhub/k8s-hub
    tag: "1.2.0"
    pullPolicy:
    pullSecrets: []
  resources: {}
  containerSecurityContext:
    runAsUser: 1000
    runAsGroup: 1000
    allowPrivilegeEscalation: false
  lifecycle: {}
  services: {}
  pdb:
    enabled: false
    maxUnavailable:
    minAvailable: 1
  networkPolicy:
    enabled: true
    ingress: []
    ## egress for JupyterHub already includes Kubernetes internal DNS and
    ## access to the proxy, but can be restricted further, but ensure to allow
    ## access to the Kubernetes API server that couldn't be pinned ahead of
    ## time.
    ##
    ## ref: https://stackoverflow.com/a/59016417/2220152
    egress:
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0
    interNamespaceAccessLabels: ignore
    allowedIngressPorts: []
  allowNamedServers: false
  namedServerLimitPerUser:
  authenticatePrometheus:
  redirectToServer:
  shutdownOnLogout:
  templatePaths: []
  templateVars: {}
  livenessProbe:
    # The livenessProbe's aim to give JupyterHub sufficient time to startup but
    # be able to restart if it becomes unresponsive for ~5 min.
    enabled: true
    initialDelaySeconds: 300
    periodSeconds: 10
    failureThreshold: 30
    timeoutSeconds: 3
  readinessProbe:
    # The readinessProbe's aim is to provide a successful startup indication,
    # but following that never become unready before its livenessProbe fail and
    # restarts it if needed. To become unready following startup serves no
    # purpose as there are no other pod to fallback to in our non-HA deployment.
    enabled: true
    initialDelaySeconds: 0
    periodSeconds: 2
    failureThreshold: 1000
    timeoutSeconds: 1
  existingSecret:
  serviceAccount:
    annotations: {}
  extraPodSpec: {}

rbac:
  enabled: true

# proxy relates to the proxy pod, the proxy-public service, and the autohttps
# pod and proxy-http service.
proxy:
  secretToken:
  annotations: {}
  deploymentStrategy:
    ## type: Recreate
    ## - JupyterHub's interaction with the CHP proxy becomes a lot more robust
    ##   with this configuration. To understand this, consider that JupyterHub
    ##   during startup will interact a lot with the k8s service to reach a
    ##   ready proxy pod. If the hub pod during a helm upgrade is restarting
    ##   directly while the proxy pod is making a rolling upgrade, the hub pod
    ##   could end up running a sequence of interactions with the old proxy pod
    ##   and finishing up the sequence of interactions with the new proxy pod.
    ##   As CHP proxy pods carry individual state this is very error prone. One
    ##   outcome when not using Recreate as a strategy has been that user pods
    ##   have been deleted by the hub pod because it considered them unreachable
    ##   as it only configured the old proxy pod but not the new before trying
    ##   to reach them.
    type: Recreate
    ## rollingUpdate:
    ## - WARNING:
    ##   This is required to be set explicitly blank! Without it being
    ##   explicitly blank, k8s will let eventual old values under rollingUpdate
    ##   remain and then the Deployment becomes invalid and a helm upgrade would
    ##   fail with an error like this:
    ##
    ##   UPGRADE FAILED
    ##   Error: Deployment.apps "proxy" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy `type` is 'Recreate'
    ##   Error: UPGRADE FAILED: Deployment.apps "proxy" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy `type` is 'Recreate'
    rollingUpdate:
  # service relates to the proxy-public service
  service:
    type: LoadBalancer
    labels: {}
    annotations: {}
    nodePorts:
      http:
      https:
    disableHttpPort: false
    extraPorts: []
    loadBalancerIP: <here goes my load balancer IP>
    loadBalancerSourceRanges: []
  # chp relates to the proxy pod, which is responsible for routing traffic based
  # on dynamic configuration sent from JupyterHub to CHP's REST API.
  chp:
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: jupyterhub/configurable-http-proxy
      tag: 4.5.0 # https://github.com/jupyterhub/configurable-http-proxy/releases
      pullPolicy:
      pullSecrets: []
    extraCommandLineFlags: []
    livenessProbe:
      enabled: true
      initialDelaySeconds: 60
      periodSeconds: 10
    readinessProbe:
      enabled: true
      initialDelaySeconds: 0
      periodSeconds: 2
      failureThreshold: 1000
    resources: {}
    defaultTarget:
    errorTarget:
    extraEnv: {}
    nodeSelector: {}
    tolerations: []
    networkPolicy:
      enabled: true
      ingress: []
      egress:
        - to:
            - ipBlock:
                cidr: 0.0.0.0/0
      interNamespaceAccessLabels: ignore
      allowedIngressPorts: [http, https]
    pdb:
      enabled: false
      maxUnavailable:
      minAvailable: 1
    extraPodSpec: {}
  # traefik relates to the autohttps pod, which is responsible for TLS
  # termination when proxy.https.type=letsencrypt.
  traefik:
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: traefik
      tag: 2.6.0 # ref: https://hub.docker.com/_/traefik?tab=tags
      pullPolicy:
      pullSecrets: []
    hsts:
      includeSubdomains: false
      preload: false
      maxAge: 15724800 # About 6 months
    resources: {}
    labels: {}
    extraEnv: {}
    extraVolumes: []
    extraVolumeMounts: []
    extraStaticConfig: {}
    extraDynamicConfig: {}
    nodeSelector: {}
    tolerations: []
    extraPorts: []
    networkPolicy:
      enabled: true
      ingress: []
      egress:
        - to:
            - ipBlock:
                cidr: 0.0.0.0/0
      interNamespaceAccessLabels: ignore
      allowedIngressPorts: [http, https]
    pdb:
      enabled: false
      maxUnavailable:
      minAvailable: 1
    serviceAccount:
      annotations: {}
    extraPodSpec: {}
  secretSync:
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: jupyterhub/k8s-secret-sync
      tag: "1.2.0"
      pullPolicy:
      pullSecrets: []
    resources: {}
  labels: {}
  https:
    enabled: false
    type: letsencrypt
    #type: letsencrypt, manual, offload, secret
    letsencrypt:
      contactEmail:
      # Specify custom server here (https://acme-staging-v02.api.letsencrypt.org/directory) to hit staging LE
      acmeServer: https://acme-v02.api.letsencrypt.org/directory
    manual:
      key:
      cert:
    secret:
      name:
      key: tls.key
      crt: tls.crt
    hosts: []

# singleuser relates to the configuration of KubeSpawner which runs in the hub
# pod, and its spawning of user pods such as jupyter-myusername.
singleuser:
  podNameTemplate:
  extraTolerations: []
  nodeSelector: {}
  extraNodeAffinity:
    required: []
    preferred: []
  extraPodAffinity:
    required: []
    preferred: []
  extraPodAntiAffinity:
    required: []
    preferred: []
  networkTools:
    image:
      name: jupyterhub/k8s-network-tools
      tag: "1.2.0"
      pullPolicy:
      pullSecrets: []
  cloudMetadata:
    # block set to true will append a privileged initContainer using the
    # iptables to block the sensitive metadata server at the provided ip.
    blockWithIptables: true
    ip: 169.254.169.254
  networkPolicy:
    enabled: true
    ingress: []
    egress:
      # Required egress to communicate with the hub and DNS servers will be
      # augmented to these egress rules.
      #
      # This default rule explicitly allows all outbound traffic from singleuser
      # pods, except to a typical IP used to return metadata that can be used by
      # someone with malicious intent.
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0
              except:
                - 169.254.169.254/32
    interNamespaceAccessLabels: ignore
    allowedIngressPorts: []
  events: true
  extraAnnotations: {}
  extraLabels:
    hub.jupyter.org/network-access-hub: "true"
  extraFiles: {}
  extraEnv: {}
  lifecycleHooks:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          - >
            <a folder copy to /srv/jupyterhub command goes here>
  initContainers: []
  extraContainers: []
  uid: 1000
  fsGid: 100
  serviceAccountName:
  storage:
    type: dynamic
    extraLabels: {}
    extraVolumes: []
    extraVolumeMounts: []
    static:
      pvcName:
      subPath: "{username}"
    capacity: 1Gi
    homeMountPath: /srv/jupyterhub
    dynamic:
      storageClass:
      pvcNameTemplate: claim-{username}{servername}
      volumeNameTemplate: volume-{username}{servername}
      storageAccessModes: [ReadWriteOnce]
  image:
    name: <my modified docker image>
    tag: "latest"
    pullPolicy:
    pullSecrets: []
  startTimeout: 300
  cpu:
    limit:
    guarantee:
  memory:
    limit:
    guarantee:
  extraResource:
    limits: {}
    guarantees: {}
  cmd: jupyterhub-singleuser
  defaultUrl: /lab/tree/<my landing file>.ipynb
  extraPodConfig: {}
  profileList: []

# scheduling relates to the user-scheduler pods and user-placeholder pods.
scheduling:
  userScheduler:
    enabled: true
    replicas: 2
    logLevel: 4
    # plugins ref: https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins-1
    plugins:
      score:
        disabled:
          - name: SelectorSpread
          - name: TaintToleration
          - name: PodTopologySpread
          - name: NodeResourcesBalancedAllocation
          - name: NodeResourcesLeastAllocated
          # Disable plugins to be allowed to enable them again with a different
          # weight and avoid an error.
          - name: NodePreferAvoidPods
          - name: NodeAffinity
          - name: InterPodAffinity
          - name: ImageLocality
        enabled:
          - name: NodePreferAvoidPods
            weight: 161051
          - name: NodeAffinity
            weight: 14631
          - name: InterPodAffinity
            weight: 1331
          - name: NodeResourcesMostAllocated
            weight: 121
          - name: ImageLocality
            weight: 11
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      # IMPORTANT: Bumping the minor version of this binary should go hand in
      # hand with an inspection of the user-scheduelrs RBAC resources
      # that we have forked.
      name: k8s.gcr.io/kube-scheduler
      tag: v1.19.13 # ref: https://github.com/kubernetes/website/blob/main/content/en/releases/patch-releases.md
      pullPolicy:
      pullSecrets: []
    nodeSelector: {}
    tolerations: []
    pdb:
      enabled: true
      maxUnavailable: 1
      minAvailable:
    resources: {}
    serviceAccount:
      annotations: {}
    extraPodSpec: {}
  podPriority:
    enabled: false
    globalDefault: false
    defaultPriority: 0
    userPlaceholderPriority: -10
  userPlaceholder:
    enabled: true
    image:
      name: k8s.gcr.io/pause
      # tag's can be updated by inspecting the output of the command:
      # gcloud container images list-tags k8s.gcr.io/pause --sort-by=~tags
      #
      # If you update this, also update prePuller.pause.image.tag
      tag: "3.5"
      pullPolicy:
      pullSecrets: []
    replicas: 0
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    resources: {}
  corePods:
    tolerations:
      - key: hub.jupyter.org/dedicated
        operator: Equal
        value: core
        effect: NoSchedule
      - key: hub.jupyter.org_dedicated
        operator: Equal
        value: core
        effect: NoSchedule
    nodeAffinity:
      matchNodePurpose: prefer
  userPods:
    tolerations:
      - key: hub.jupyter.org/dedicated
        operator: Equal
        value: user
        effect: NoSchedule
      - key: hub.jupyter.org_dedicated
        operator: Equal
        value: user
        effect: NoSchedule
    nodeAffinity:
      matchNodePurpose: prefer

# prePuller relates to the hook|continuous-image-puller DaemonsSets
prePuller:
  annotations: {}
  resources: {}
  containerSecurityContext:
    runAsUser: 65534 # nobody user
    runAsGroup: 65534 # nobody group
    allowPrivilegeEscalation: false
  extraTolerations: []
  # hook relates to the hook-image-awaiter Job and hook-image-puller DaemonSet
  hook:
    enabled: true
    pullOnlyOnChanges: true
    # image and the configuration below relates to the hook-image-awaiter Job
    image:
      name: jupyterhub/k8s-image-awaiter
      tag: "1.2.0"
      pullPolicy:
      pullSecrets: []
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    podSchedulingWaitDuration: 10
    nodeSelector: {}
    tolerations: []
    resources: {}
    serviceAccount:
      annotations: {}
  continuous:
    enabled: true
  pullProfileListImages: true
  extraImages: {}
  pause:
    containerSecurityContext:
      runAsUser: 65534 # nobody user
      runAsGroup: 65534 # nobody group
      allowPrivilegeEscalation: false
    image:
      name: k8s.gcr.io/pause
      # tag's can be updated by inspecting the output of the command:
      # gcloud container images list-tags k8s.gcr.io/pause --sort-by=~tags
      #
      # If you update this, also update scheduling.userPlaceholder.image.tag
      tag: "3.5"
      pullPolicy:
      pullSecrets: []

ingress:
  enabled: false
  annotations: {}
  hosts: []
  pathSuffix:
  pathType: Prefix
  tls: []

# cull relates to the jupyterhub-idle-culler service, responsible for evicting
# inactive singleuser pods.
#
# The configuration below, except for enabled, corresponds to command-line flags
# for jupyterhub-idle-culler as documented here:
# https://github.com/jupyterhub/jupyterhub-idle-culler#as-a-standalone-script
#
cull:
  enabled: true
  users: false # --cull-users
  removeNamedServers: false # --remove-named-servers
  timeout: 3600 # --timeout
  every: 1200 # --cull-every
  concurrency: 10 # --concurrency
  maxAge: 0 # --max-age

debug:
  enabled: true

global:
  safeToShowValues: false
If you'd like to know the exact JupyterHub image I am pulling, you can find and pull it from Docker Hub: franklinmoses/dghub
If you configure it like this:
hub:
  config:
    KubeSpawner:
      delete_pvc: false
and that resolves the issue, then for some reason either your users are being deleted by the jupyterhub-idle-culler, or the JupyterHub Spawner class is incorrectly calling a cleanup function called delete_forever, which is associated with deleting users and their associated storage.
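For context, jupyterhub-idle-culler only deletes users, and thereby triggers that storage cleanup, when the chart's cull.users option is enabled; a sketch of the relevant settings (they mirror the idle-culler's command-line flags):

cull:
  enabled: true
  users: false  # --cull-users: leave false so idle servers are stopped, not deleted along with the user
  maxAge: 0     # --max-age: 0 means servers are never force-culled by age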
By the way:

helm upgrade --values <your config file.yaml>

The config file you pass here should only contain the values you have changed from the chart's defaults. If it is instead a copy of all of the previous version's default values, and those defaults change as you upgrade to a new version, you could run into issues. It also makes it very hard for me to guess what's going on, as I can't review what you have actually changed when all configuration is listed there.
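As an illustration (placeholder values taken from this thread, not a definitive config), a minimal override file could look roughly like this, with everything not listed falling back to the chart's defaults:

# config.yaml - only the values changed from the chart defaults
proxy:
  service:
    loadBalancerIP: <here goes my load balancer IP>
hub:
  config:
    KubeSpawner:
      delete_pvc: false
singleuser:
  image:
    name: <my modified docker image>
    tag: "latest"
  defaultUrl: /lab/tree/<my landing file>.ipynb
  storage:
    homeMountPath: /srv/jupyterhub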
I figure this is the reason you have issues: it seems like you mount the storage to a location, and then you have a command to copy to that location from somewhere.

singleuser:
  lifecycleHooks:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          - >
            <a folder copy to /srv/jupyterhub command goes here>
  storage:
    homeMountPath: /srv/jupyterhub
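If the point of that postStart copy is just to seed the home directory with starter files, one way to avoid re-copying over whatever the user has already saved is to make the copy conditional. A sketch, assuming a hypothetical /opt/seed directory baked into your image and a marker file on the volume:

singleuser:
  lifecycleHooks:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          - >
            if [ ! -f /srv/jupyterhub/.seeded ]; then
            cp -r /opt/seed/. /srv/jupyterhub/ && touch /srv/jupyterhub/.seeded;
            fi
  storage:
    homeMountPath: /srv/jupyterhub

Because the marker file lives on the persistent volume, the copy runs once per volume rather than on every server start.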
I'll go for a close on this issue with that.
How can I retain the user storage every time the server is culled, so that when users log in again they can access their previous runtime files? I tried looking at jupyterhub_idle_culler.py, but there don't seem to be many options beyond deleting the server rather than the storage. I would like to know if there is an existing feature or any workaround for it.