neurolibre / neurolibre-binderhub

developer resources for neurolibre.conp.ca

HTTP timeout with repo2data #19

Open ltetrel opened 5 years ago

ltetrel commented 5 years ago

When data takes too long to download, we get an HTTP timeout from the hub pod. We can of course temporarily increase the timeout, but this has the drawback of lengthening the restart time if something else is wrong with the environment.

ltetrel commented 5 years ago

The HTTP timeout is controlled here: https://jupyterhub.readthedocs.io/en/stable/api/spawner.html#spawner
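A minimal sketch of raising the timeouts, assuming the standard `http_timeout` and `start_timeout` traits of the JupyterHub `Spawner`. The helper name `apply_spawner_timeouts` and the values (in seconds) are illustrative only, and the `SimpleNamespace` object stands in for the `c` config object that JupyterHub provides in `jupyterhub_config.py`.

```python
from types import SimpleNamespace


def apply_spawner_timeouts(c, http_timeout=300, start_timeout=600):
    """Set the Spawner timeouts (in seconds) on a JupyterHub-style config object.

    The default values are illustrative assumptions, not recommendations.
    """
    # http_timeout: how long the hub waits for the single-user server to answer over HTTP
    c.Spawner.http_timeout = http_timeout
    # start_timeout: how long the hub waits for the spawner (here, the pod) to start
    c.Spawner.start_timeout = start_timeout
    return c


# Stand-in for the `c` object available inside jupyterhub_config.py
c = SimpleNamespace(Spawner=SimpleNamespace())
apply_spawner_timeouts(c)
print(c.Spawner.http_timeout, c.Spawner.start_timeout)
```

In a real deployment the two assignments would go directly into the hub configuration; larger values also delay the detection of environments that are genuinely broken.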

ltetrel commented 5 years ago

Using an init container could help encapsulate repo2data in a separate process (container): https://kubernetes.io/docs/concepts/workloads/pods/init-containers/ Right now it runs in the hub container, which makes the data download part of the process of starting a pod (not ideal).

ltetrel commented 5 years ago

Maybe it is possible to run an init container for every pod that is building (component=binderhub-build)? Maybe with a pod preset? This way repo2data would be called just once, when the pod is building (and not in the hub as now, where repo2data is called every time a pod is created). Plus, there would be no HTTP timeout, since it runs during the build.

ltetrel commented 4 years ago

The issue is that the pod preset needs to know which repo is being built (to clone the data_requirement file and pull the data with repo2data).

ltetrel commented 4 years ago

Fortunately, the pod used to build the user's binder environment has the repo in its annotations, which could then be used as an input for the pod preset (but how?):

Labels:       component=binderhub-build
              name=build-ltetrel-2dbinder-2dtuto-f9e17d-dc5e69
Annotations:  binder-repo: https://github.com/ltetrel/binder-tuto

agahkarakuzu commented 4 years ago

There is a way to fetch values from a container and store them as env vars. If we store the label/name with the information of the repo being built in the same order, it might do the trick?
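One way to read this suggestion is the Kubernetes Downward API, which can expose a pod annotation to a container as an environment variable. The sketch below only builds the corresponding `env` entry as a Python dict; the variable name `BINDER_REPO` and the helper `annotation_env_var` are hypothetical, and it assumes the `binder-repo` annotation shown above is present on the pod.

```python
import json


def annotation_env_var(env_name, annotation_key):
    """Return a container `env` entry that reads a pod annotation via the Downward API."""
    return {
        "name": env_name,
        "valueFrom": {
            # fieldRef resolves the value from the pod's own metadata at start-up
            "fieldRef": {"fieldPath": f"metadata.annotations['{annotation_key}']"}
        },
    }


# BINDER_REPO is a hypothetical variable name chosen for this example
env_entry = annotation_env_var("BINDER_REPO", "binder-repo")
print(json.dumps(env_entry, indent=2))
```

The entry would go in the `env` list of whichever container runs repo2data, so the container can learn the repository URL without any change to how the annotation is written.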

ltetrel commented 4 years ago

Yes, it could work.

ltetrel commented 4 years ago

The issue is how to inject a container (which would run repo2data) into every build pod. With an init container, we would need to specify it when creating the build pod (which we don't have control over). That is why I thought about podPreset, to insert an init container before every build.

ltetrel commented 4 years ago

But podPreset does not seem to be meant for this type of case: https://github.com/kubernetes/kubernetes/issues/43874 Will check https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/
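For the admission-controller route, a mutating admission webhook could add the init container when a build pod is created. The following is a rough sketch of the response such a webhook might return, not a tested deployment: the image `example/repo2data:latest`, the container name, the `BINDER_REPO` variable and both function names are hypothetical, and it assumes build pods carry the `component=binderhub-build` label and the `binder-repo` annotation.

```python
import base64
import json


def init_container_patch(pod, image="example/repo2data:latest"):
    """Build a JSONPatch that adds a (hypothetical) repo2data init container."""
    repo = pod["metadata"].get("annotations", {}).get("binder-repo", "")
    container = {
        "name": "repo2data",
        "image": image,  # hypothetical image name
        "env": [{"name": "BINDER_REPO", "value": repo}],
    }
    if "initContainers" in pod.get("spec", {}):
        # Append to the existing list of init containers
        return [{"op": "add", "path": "/spec/initContainers/-", "value": container}]
    return [{"op": "add", "path": "/spec/initContainers", "value": [container]}]


def admission_response(review):
    """Answer an AdmissionReview, patching only BinderHub build pods."""
    request = review["request"]
    pod = request["object"]
    response = {"uid": request["uid"], "allowed": True}
    if pod["metadata"].get("labels", {}).get("component") == "binderhub-build":
        patch = json.dumps(init_container_patch(pod)).encode()
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(patch).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


# Example request, reduced to the fields used above
review = {
    "request": {
        "uid": "1234",
        "object": {
            "metadata": {
                "labels": {"component": "binderhub-build"},
                "annotations": {"binder-repo": "https://github.com/ltetrel/binder-tuto"},
            },
            "spec": {"containers": []},
        },
    }
}
result = admission_response(review)
print(json.loads(base64.b64decode(result["response"]["patch"])))
```

A real webhook would also need an HTTPS server and a `MutatingWebhookConfiguration` registered with the cluster, which are left out here.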

ltetrel commented 4 years ago

If we had control over the binderhub code, we could add it here; maybe there is room for a PR (that would cover a general use case for them).

ltetrel commented 4 years ago

Or maybe it would be possible to change the build_image: https://github.com/jupyterhub/binderhub/blob/b6446b12b30f741d9e82b7aec1498ede4776cd79/binderhub/app.py#L383

agahkarakuzu commented 4 years ago

OK, I thought you were referring to injecting application information into a pod. But you would like to inject a (repo2data) container into a running (build) pod, or at pod creation?

ltetrel commented 4 years ago

Yep, and for that I need the repository information to pull the data_requirement file inside repo2data docker container.
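As a small illustration of recovering that information, the sketch below pulls the repository URL and the commit out of a build pod manifest loaded as a Python dict. It assumes the repository is stored in the `binder-repo` annotation and that the commit follows a `--ref` argument of the first container; the function name `build_source` is hypothetical.

```python
def build_source(pod):
    """Return (repo_url, ref) for a BinderHub build pod given as a dict."""
    repo = pod["metadata"]["annotations"]["binder-repo"]
    args = pod["spec"]["containers"][0]["args"]
    # The commit is the argument that follows --ref, if present
    ref = args[args.index("--ref") + 1] if "--ref" in args else None
    return repo, ref


# Manifest reduced to the fields used above
pod = {
    "metadata": {
        "annotations": {"binder-repo": "https://github.com/ltetrel/repo2data-caching-s3"}
    },
    "spec": {
        "containers": [
            {
                "args": [
                    "jupyter-repo2docker",
                    "--ref",
                    "a3305a93929c977f9d83e77e05cbb6a370d0284b",
                    "https://github.com/ltetrel/repo2data-caching-s3",
                ]
            }
        ]
    },
}
repo, ref = build_source(pod)
print(repo, ref)
```

With both values in hand, a container could clone the repository at that commit and locate the data_requirement file before calling repo2data.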

ltetrel commented 4 years ago

Example of rendered config for a binder build pod:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    binder-repo: https://github.com/ltetrel/repo2data-caching-s3
  creationTimestamp: "2020-03-13T16:38:57Z"
  labels:
    component: binderhub-build
    name: build-ltetrel-2drepo2data-2dcaching-2ds3-7c151e-a3305a
  name: build-ltetrel-2drepo2data-2dcaching-2ds3-7c151e-a3305a
  namespace: binderhub
  resourceVersion: "1029140"
  selfLink: /api/v1/namespaces/binderhub/pods/build-ltetrel-2drepo2data-2dcaching-2ds3-7c151e-a3305a
  uid: 55d4b73d-ba25-4af5-961a-b13d9d36f95b
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              component: binderhub-build
          topologyKey: kubernetes.io/hostname
        weight: 100
  containers:
  - args:
    - jupyter-repo2docker
    - --ref
    - a3305a93929c977f9d83e77e05cbb6a370d0284b
    - --image
    - binder-registry.conp.cloud/binder-dev.conp.cloud/binder-ltetrel-2drepo2data-2dcaching-2ds3-7c151e:a3305a93929c977f9d83e77e05cbb6a370d0284b
    - --no-clean
    - --no-run
    - --json-logs
    - --user-name
    - jovyan
    - --user-id
    - "1000"
    - --push
    - https://github.com/ltetrel/repo2data-caching-s3
    image: jupyter/repo2docker:0.10.0
    imagePullPolicy: IfNotPresent
    name: builder
    resources:
      limits:
        memory: "0"
      requests:
        memory: "0"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/docker.sock
      name: docker-socket
    - mountPath: /root/.docker
      name: docker-push-secret
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-5gnt9
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: neurolibre-dev-node1
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: hub.jupyter.org/dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: hub.jupyter.org_dedicated
    operator: Equal
    value: user
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - hostPath:
      path: /var/run/docker.sock
      type: Socket
    name: docker-socket
  - name: docker-push-secret
    secret:
      defaultMode: 420
      secretName: binder-push-secret
  - name: default-token-5gnt9
    secret:
      defaultMode: 420
      secretName: default-token-5gnt9
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-13T16:38:57Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-03-13T16:39:01Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-03-13T16:39:01Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-03-13T16:38:57Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://5f42c13b1b833ac645bfc648293a057cbfb51bb3cab21e7f2e149943442bf0db
    image: jupyter/repo2docker:0.10.0
    imageID: docker-pullable://jupyter/repo2docker@sha256:b8855ce9f6f9ba3a98369331231f6c0d01badec68109f4b13b2308f5d15698f4
    lastState: {}
    name: builder
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-03-13T16:39:00Z"
  hostIP: 192.168.73.23
  phase: Running
  podIP: 10.244.1.12
  podIPs:
  - ip: 10.244.1.12
  qosClass: BestEffort
  startTime: "2020-03-13T16:38:57Z"

ltetrel commented 4 years ago

https://github.com/jupyterhub/binderhub/pull/1081