pangeo-data / helm-chart

Pangeo helm charts
https://pangeo-data.github.io/helm-chart/

Pangeo failing all Helm versions > v0.1.1-e5fa7c4 #93

Closed Boes-man closed 5 years ago

Boes-man commented 5 years ago

Hello, I installed Pangeo 0.1.1-86665a6 via the cloud deploy process successfully. I have been testing 4_upgrade_helm.sh, which works up to v0.1.1-e5fa7c4. Any version after this fails to deploy Pangeo. The first versions after it complete the upgrade successfully, but launching the server pod fails because the start-singleuser.sh script cannot be found. Later versions fail to deploy completely: the hub pod cannot start, with a similar error about an OS environment variable for singleuser.

Thanks

server/user pod error:

Error: failed to start container "notebook": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"start-singleuser.sh\": executable file not found in $PATH": unknown
Back-off restarting failed container

HUB Error:

[E 2019-05-16 07:08:55.884 JupyterHub app:1623]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/jupyterhub/app.py", line 1620, in launch_instance_async
    yield self.initialize(argv)
  File "/usr/lib/python3.6/types.py", line 204, in __next__
    return next(self.__wrapped)
  File "/usr/local/lib/python3.6/dist-packages/jupyterhub/app.py", line 1358, in initialize
    self.load_config_file(self.config_file)
  File "<decorator-gen-5>", line 2, in load_config_file
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 598, in load_config_file
    raise_config_file_errors=self.raise_config_file_errors,
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 562, in _load_config_files
    config = loader.load_config()
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/loader.py", line 457, in load_config
    self._read_file_as_dict()
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/loader.py", line 489, in _read_file_as_dict
    py3compat.execfile(conf_filename, namespace)
  File "/usr/local/lib/python3.6/dist-packages/ipython_genutils/py3compat.py", line 198, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "/srv/jupyterhub_config.py", line 46, in <module>
    c.KubeSpawner.singleuser_image_spec = os.environ['SINGLEUSER_IMAGE']
  File "/usr/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'SINGLEUSER_IMAGE'

arokem commented 5 years ago

I definitely also saw this along the way in one of the permutations I went through in #625

Boes-man commented 5 years ago

The currently installed version for this helm diff is 0.1.1-86665a6. The output from `helm diff upgrade pangeohub pangeo/pangeo --version=v0.1.1-c02878a -f secret_config.yaml -f jupyter_config.yaml` is below:

pangeo, daskkubernetes, RoleBinding (rbac.authorization.k8s.io) has changed:

Source: pangeo/templates/dask-kubernetes-rbac.yaml

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: daskkubernetes
  namespace: pangeo
  labels:

Boes-man commented 5 years ago

Attaching the diff outputs as files: helmdiff_0.1.1-86665a6_to_v0.1.1-c02878a.txt and helmdiff_0.1.1-86665a6_to_19.03.05.txt

jhamman commented 5 years ago

I've moved this issue to the helm-chart repo. I hope this helps attract attention from a few others who are more familiar with migrating between versions of helm charts.

I can say that we aren't using the start-singleuser.sh script anymore. I seem to remember some explicit references to it in the chart so I'd make sure you've removed all of those.

@jacobtomlinson - does any of this ring a bell for you?

consideRatio commented 5 years ago

From mobile: I noted singleuser_image_spec, which is two deprecations behind.

There should be no singleuser_ prefix, and image_spec should be image.
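
As a rough illustration of the two renames (a sketch, not kubespawner code; `modernize_option` is a hypothetical helper name), the old option name maps to the current one like this:

```python
# Hypothetical helper, not part of kubespawner: it only applies the two
# renames described above to an option name given as a string.
def modernize_option(name: str) -> str:
    prefix = "singleuser_"
    # First deprecation: the "singleuser_" prefix was dropped.
    if name.startswith(prefix):
        name = name[len(prefix):]
    # Second deprecation: "image_spec" became "image".
    if name == "image_spec":
        name = "image"
    return name


print(modernize_option("singleuser_image_spec"))  # image
```

In a jupyterhub_config.py this would presumably mean setting c.KubeSpawner.image rather than c.KubeSpawner.singleuser_image_spec.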

guillaumeeb commented 5 years ago

The script start-singleuser.sh should have been removed here: https://github.com/pangeo-data/pangeo/commit/48c93cab1377f28c37cfe58009794b0167fa8a73#diff-820e30fd7bcc5d1dd260d2206452a511.

Could you show your jupyter_config.yaml?

I'm not sure if this issue is on the helm chart or on the setup guide and examples from main Pangeo repo.

Boes-man commented 5 years ago

Thanks all, my jupyter.yaml

file: jupyter_config.yaml

initContainers:
  - name: clone_git
    image: alpine
    command: ['git', 'clone', 'https://github.com/pangeo-data/pangeo-custom-jupyterhub-templates.git','/tmp/data']
    volumeMounts:
    - name: custom-templates
      mountPath: /tmp/data

jupyterhub:
  singleuser:
    cmd: ['start-singleuser.sh']
    extraEnv:
      EXTRA_PIP_PACKAGES: >-
      GCSFUSE_BUCKET: 
    storage:
      extraVolumes:
        - name: fuse
          hostPath:
            path: /dev/fuse
      extraVolumeMounts:
        - name: fuse
          mountPath: /dev/fuse
    cloudMetadata:
      enabled: true
    cpu:
      limit: 4
      guarantee: 1
    memory:
      limit: 14G
      guarantee: 4G

  hub:
    extraConfig:
      customPodHook: |
        from kubernetes import client
        def modify_pod_hook(spawner, pod):
            pod.spec.containers[0].security_context = client.V1SecurityContext(
                privileged=True,
                capabilities=client.V1Capabilities(
                    add=['SYS_ADMIN']
                )
            )
            return pod
        c.KubeSpawner.modify_pod_hook = modify_pod_hook
        c.JupyterHub.logo_file = '/usr/local/share/jupyter/hub/static/custom/images/logo.png'
        c.JupyterHub.template_paths = ['/usr/local/share/jupyter/hub/custom_templates/',
                                      '/usr/local/share/jupyter/hub/templates/']
    image:
      name: jupyterhub/k8s-hub
      tag: v0.6
    extraVolumes:
      - name: custom-templates
        emptyDir: {}
    extraVolumeMounts:
      - mountPath: /usr/local/share/jupyter/hub/custom_templates
        name: custom-templates
        subPath: "pangeo-custom-jupyterhub-templates/templates"
      - mountPath: /usr/local/share/jupyter/hub/static/custom
        name: custom-templates
        subPath: "pangeo-custom-jupyterhub-templates/assets"

  cull:
    enabled: true
    users: false
    timeout: 1200
    every: 600

  # this section specifies the IP address for pangeo
  proxy:
    service:
      loadBalancerIP:

guillaumeeb commented 5 years ago

Your template still uses cmd: ['start-singleuser.sh'] which is not valid anymore. It has been removed from the example by the PR I mentioned above.

But I see there are still some misleading files in the upper directory https://github.com/pangeo-data/pangeo/tree/master/gce. I think these should be removed.

consideRatio commented 5 years ago

Perhaps unrelated, but important I think:

Pinning a different hub image tag is an issue, I think. The hub image should be the one that goes with the z2jh helm chart.

From mobile, can explain why later, but it relates to the bundled config.

Boes-man commented 5 years ago

Thanks @guillaumeeb, I went from 0.1.1-86665a6 to v0.1.1-c02878a successfully. I use the "dask-array.ipynb" example to check things on an "application" level, but it's failing with the error below:

from dask_kubernetes import KubeCluster
cluster = KubeCluster(n_workers=10)
cluster

ValueError Traceback (most recent call last)
in
      1 from dask_kubernetes import KubeCluster
----> 2 cluster = KubeCluster(n_workers=10)
      3 cluster

/srv/conda/lib/python3.6/site-packages/dask_kubernetes/core.py in __init__(self, pod_template, name, namespace, n_workers, host, port, env, **kwargs)
    178             msg = ("Worker pod specification not provided. See KubeCluster "
    179                    "docstring for ways to specify workers")
--> 180             raise ValueError(msg)
    181
    182         self.cluster = LocalCluster(ip=host or socket.gethostname(),

ValueError: Worker pod specification not provided. See KubeCluster docstring for ways to specify workers

guillaumeeb commented 5 years ago

Several things:

Then you are probably missing a dask-kubernetes config file in your image or HOME directory.
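
For illustration, a worker pod specification can be sketched as a plain dictionary. This is a minimal sketch under assumptions: `make_worker_pod_template` is a hypothetical helper, the image name and resource values are placeholders, and the layout follows the generic Kubernetes Pod format rather than any file shipped with Pangeo.

```python
# Hypothetical helper: builds a worker pod template as a plain dict.
# The image name and resource values passed in are placeholders.
def make_worker_pod_template(image: str, memory: str = "4G", cpu: int = 1) -> dict:
    resources = {"cpu": str(cpu), "memory": memory}
    return {
        "kind": "Pod",
        "metadata": {"labels": {"app": "dask-worker"}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "dask-worker",
                    "image": image,
                    # Command line for the worker process inside the pod.
                    "args": [
                        "dask-worker",
                        "--nthreads", str(cpu),
                        "--memory-limit", memory,
                        "--death-timeout", "60",
                    ],
                    "resources": {
                        "limits": dict(resources),
                        "requests": dict(resources),
                    },
                }
            ],
        },
    }


# "example/notebook-image:latest" is a made-up image name.
template = make_worker_pod_template("example/notebook-image:latest")
print(template["spec"]["containers"][0]["args"])
```

With the dask-kubernetes version of that time, such a dictionary could be passed to KubeCluster.from_dict(template), or stored under the kubernetes.worker-template key of a dask config file so that a bare KubeCluster(n_workers=10) finds it. Check this against the version in your image.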

Boes-man commented 5 years ago

Thanks everyone. The edits to my jupyter_config.yaml, i.e. removing the hub.image part and cmd: ['start-singleuser.sh'], have allowed me to upgrade via 4_upgrade_helm.sh. I'll continue the other topics, i.e. the examples and initContainers, in new requests.
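
For anyone landing here later, the two edits can be sketched in Python on the parsed values. This is a sketch only: `drop_outdated_overrides` is a hypothetical helper, and it assumes jupyter_config.yaml has already been loaded into a dictionary (for example with a YAML library).

```python
import copy


# Hypothetical helper: returns a copy of the Helm values without the two
# overrides that blocked the upgrade in this thread.
def drop_outdated_overrides(values: dict) -> dict:
    cleaned = copy.deepcopy(values)  # leave the caller's dict untouched
    jhub = cleaned.get("jupyterhub", {})

    # 1. Drop the singleuser cmd that points at the removed script.
    singleuser = jhub.get("singleuser", {})
    if "start-singleuser.sh" in (singleuser.get("cmd") or []):
        del singleuser["cmd"]

    # 2. Drop the pinned hub image so the chart's own default is used.
    jhub.get("hub", {}).pop("image", None)
    return cleaned


old_values = {
    "jupyterhub": {
        "singleuser": {"cmd": ["start-singleuser.sh"], "cpu": {"limit": 4}},
        "hub": {"image": {"name": "jupyterhub/k8s-hub", "tag": "v0.6"}},
    }
}
print(drop_outdated_overrides(old_values))
```

The helper works on a copy, so the original values stay available for comparison.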