pangeo-data / pangeo-cloud-federation

Deployment automation for Pangeo JupyterHubs on AWS, Google, and Azure
https://pangeo.io/cloud.html

Deployments failing #500

Closed: TomAugspurger closed this issue 4 years ago

TomAugspurger commented 4 years ago

dev, ocean, and hydro are failing on staging; OOI is failing on prod.

Staging: https://circleci.com/gh/pangeo-data/pangeo-cloud-federation/1010
Prod: https://circleci.com/gh/pangeo-data/pangeo-cloud-federation/1001

UPGRADE FAILED
Error: Deployment.apps "gateway-dev-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"gateway", "app.kubernetes.io/instance":"dev-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && Deployment.apps "scheduler-proxy-dev-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"scheduler-proxy", "app.kubernetes.io/instance":"dev-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && Deployment.apps "web-proxy-dev-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"web-proxy", "app.kubernetes.io/instance":"dev-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
Error: UPGRADE FAILED: Deployment.apps "gateway-dev-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"gateway", "app.kubernetes.io/instance":"dev-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && Deployment.apps "scheduler-proxy-dev-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"scheduler-proxy", "app.kubernetes.io/instance":"dev-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable && Deployment.apps "web-proxy-dev-staging-dask-gateway" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"web-proxy", "app.kubernetes.io/instance":"dev-staging", "app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
Traceback (most recent call last):
  File "/home/circleci/repo/venv/bin/hubploy", line 11, in <module>
    sys.exit(main())
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/__main__.py", line 89, in main
    helm.deploy(args.deployment, args.chart, args.environment, args.namespace, args.set, args.version, args.timeout, args.force)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 126, in deploy
    force
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 62, in helm_upgrade
    subprocess.check_call(cmd)
  File "/usr/local/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['helm', 'upgrade', '--wait', '--install', '--namespace', 'dev-staging', 'dev-staging', 'pangeo-deploy', '-f', 'deployments/dev/config/common.yaml', '-f', 'deployments/dev/config/staging.yaml', '-f', 'deployments/dev/secrets/staging.yaml', '--set', 'pangeo.jupyterhub.singleuser.image.tag=2b35660', '--set', 'pangeo.jupyterhub.singleuser.image.name=gcr.io/*************/dev-pangeo-io-notebook']' returned non-zero exit status 1.

Looking into this a bit now.

TomAugspurger commented 4 years ago

The error message on the OOI failure is actually a bit different:

UPGRADE FAILED
Error: render error in "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml": template: pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml:23:28: executing "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml" at <include (print .Template.BasePath "/secret.yaml") .>: error calling include: template: pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/secret.yaml:9:19: executing "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/secret.yaml" at <required "gateway.proxyToken must be a 32 byte random string" .Values.gateway.proxyToken>: error calling required: gateway.proxyToken must be a 32 byte random string
Error: UPGRADE FAILED: render error in "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml": template: pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml:23:28: executing "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/web-proxy-deployment.yaml" at <include (print .Template.BasePath "/secret.yaml") .>: error calling include: template: pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/secret.yaml:9:19: executing "pangeo-deploy/charts/pangeo/charts/dask-gateway/templates/secret.yaml" at <required "gateway.proxyToken must be a 32 byte random string" .Values.gateway.proxyToken>: error calling required: gateway.proxyToken must be a 32 byte random string
Traceback (most recent call last):
  File "/home/circleci/repo/venv/bin/hubploy", line 11, in <module>
    sys.exit(main())
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/__main__.py", line 89, in main
    helm.deploy(args.deployment, args.chart, args.environment, args.namespace, args.set, args.version, args.timeout, args.force)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 126, in deploy
    force
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 62, in helm_upgrade
    subprocess.check_call(cmd)
  File "/usr/local/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['helm', 'upgrade', '--wait', '--install', '--namespace', 'ooi-prod', 'ooi-prod', 'pangeo-deploy', '--timeout', '1200', '-f', 'deployments/ooi/config/common.yaml', '-f', 'deployments/ooi/config/prod.yaml', '-f', 'deployments/ooi/secrets/prod.yaml', '--set', 'pangeo.jupyterhub.singleuser.image.tag=c987384', '--set', 'pangeo.jupyterhub.singleuser.image.name=ooicloud.azurecr.io/ooi-pangeo-io-notebook']' returned non-zero exit status 1.
{"errors":[{"message":"Permission denied, wrong credentials","field":null,"help":null}]}
Exited with code exit status 1

cc @tjcrone if you perhaps already know what's failing there.
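The OOI render error is a missing value rather than an immutable field: the dask-gateway chart's required check rejects an unset gateway.proxyToken. A minimal sketch of generating a suitable value, assuming a hex-encoded 32-byte token is acceptable (the same shape as openssl rand -hex 32) and assuming the value nests as pangeo.dask-gateway.gateway.proxyToken, which is inferred from the pangeo-deploy/charts/pangeo/charts/dask-gateway path in the error. The helper names are hypothetical; the output would go into the OOI secrets file (deployments/ooi/secrets/prod.yaml in the failing command).

```python
import secrets


def make_proxy_token(nbytes: int = 32) -> str:
    """Return `nbytes` of cryptographically secure random data, hex-encoded.

    32 bytes become 64 hex characters, the same shape as `openssl rand -hex 32`.
    """
    return secrets.token_hex(nbytes)


def proxy_token_snippet(token: str) -> str:
    """Render a YAML fragment for the secrets file.

    ASSUMPTION: the value path pangeo -> dask-gateway -> gateway -> proxyToken
    is inferred from the chart nesting in the error message.
    """
    return (
        "pangeo:\n"
        "  dask-gateway:\n"
        "    gateway:\n"
        f'      proxyToken: "{token}"\n'
    )


if __name__ == "__main__":
    print(proxy_token_snippet(make_proxy_token()), end="")
```

Generating the token once and storing it in the encrypted secrets file keeps it stable across deploys; regenerating it on every run would rotate it each time.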

TomAugspurger commented 4 years ago

The spec.selector "field is immutable" error on the three dask-gateway Deployments (gateway, scheduler-proxy, and web-proxy) looks similar to https://github.com/dask/dask-gateway/issues/147. @jcrist, do you know whether that should have been fixed? IIUC, we're already on 0.6.1 for staging, so perhaps this is a different issue.
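To compare the failing selectors against the rendered charts or the live Deployments, the labels can be pulled out of the error text. A minimal sketch, assuming one clause per Deployment in the format shown in the CI log; parse_selectors and selector_diff are hypothetical helpers. The labels in the error should be the selector Helm tried to apply, so the other side of the diff would be read from the existing Deployment, for example with kubectl get deployment gateway-dev-staging-dask-gateway -n dev-staging -o jsonpath='{.spec.selector.matchLabels}'.

```python
import re
from typing import Dict, Optional, Tuple

# The gateway clause of the dev-staging error, copied from the CI log.
SAMPLE_ERROR = (
    'Deployment.apps "gateway-dev-staging-dask-gateway" is invalid: spec.selector: '
    'Invalid value: v1.LabelSelector{MatchLabels:map[string]string{'
    '"app.kubernetes.io/component":"gateway", "app.kubernetes.io/instance":"dev-staging", '
    '"app.kubernetes.io/managed-by":"Tiller", "app.kubernetes.io/name":"dask-gateway"}, '
    'MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable'
)

# One clause per Deployment: its name, then the MatchLabels map that was rejected.
_CLAUSE = re.compile(
    r'Deployment\.apps "(?P<name>[^"]+)" is invalid: spec\.selector: .*?'
    r'MatchLabels:map\[string\]string\{(?P<labels>[^}]*)\}'
)
# "key":"value" pairs inside the MatchLabels map.
_PAIR = re.compile(r'"([^"]+)":"([^"]*)"')


def parse_selectors(error: str) -> Dict[str, Dict[str, str]]:
    """Map each Deployment named in the error to its matchLabels."""
    return {
        m.group("name"): dict(_PAIR.findall(m.group("labels")))
        for m in _CLAUSE.finditer(error)
    }


def selector_diff(
    live: Dict[str, str], rendered: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {label: (live value, rendered value)} for labels that differ."""
    return {
        k: (live.get(k), rendered.get(k))
        for k in sorted(set(live) | set(rendered))
        if live.get(k) != rendered.get(k)
    }


if __name__ == "__main__":
    for name, labels in parse_selectors(SAMPLE_ERROR).items():
        print(name, labels)
```

Any label that shows up in the diff is enough to trigger the error, because a Deployment's selector cannot be changed after it is created.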

jcrist commented 4 years ago

So you're already running 0.6.1, and you're only updating config values, so this isn't an upgrade of dask-gateway? If so, this is a different issue, and an odd one. Can you run the upgrade with --dry-run --debug added, so that the rendered charts are printed, and post them somewhere? None of the labels we use for matchLabels should change during an upgrade, so I'm curious what's happening here.
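For the --dry-run --debug request, a sketch that derives a render-only command from the helm upgrade invocation hubploy ran for dev-staging, copied from the traceback. The image.name --set is left out because its registry path is masked in the CI log; render_to_file and rendered.yaml are hypothetical names. Nothing is applied to the cluster in a dry run, so --wait is dropped.

```python
import shlex
import subprocess
from typing import List

# The dev-staging command from the hubploy traceback. The image.name --set is
# omitted because the registry path is masked in the CI log.
FAILING_CMD = [
    "helm", "upgrade", "--wait", "--install",
    "--namespace", "dev-staging", "dev-staging", "pangeo-deploy",
    "-f", "deployments/dev/config/common.yaml",
    "-f", "deployments/dev/config/staging.yaml",
    "-f", "deployments/dev/secrets/staging.yaml",
    "--set", "pangeo.jupyterhub.singleuser.image.tag=2b35660",
]


def dry_run_cmd(cmd: List[str]) -> List[str]:
    """Turn a `helm upgrade ...` command into one that only renders manifests.

    Inserts --dry-run --debug after the subcommand and drops --wait, since
    nothing is applied to the cluster in a dry run. The input is not modified.
    """
    if cmd[:2] != ["helm", "upgrade"]:
        raise ValueError("expected a 'helm upgrade ...' command")
    rest = [arg for arg in cmd[2:] if arg != "--wait"]
    return ["helm", "upgrade", "--dry-run", "--debug", *rest]


def render_to_file(cmd: List[str], path: str = "rendered.yaml") -> str:
    """Hypothetical helper: run the dry run and save its stdout for sharing."""
    with open(path, "w") as out:
        subprocess.run(dry_run_cmd(cmd), stdout=out, check=True)
    return path


if __name__ == "__main__":
    # Print the command instead of running it; call render_to_file() to run it.
    print(" ".join(shlex.quote(arg) for arg in dry_run_cmd(FAILING_CMD)))
```

Building the command from the list hubploy already used keeps the -f files and --set overrides identical to the failing deploy, so the rendered output reflects what CI actually tried to apply.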

TomAugspurger commented 4 years ago

Looking at the config, it seems we were still on 0.5: the scheduler-proxy image was daskgateway/dask-gateway-server:0.5.0. Sorry about the confusion.
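A stale image tag like this is easy to miss when versions are pinned in several places. A sketch of a check that walks a config already loaded into Python dicts and lists (for example with PyYAML's yaml.safe_load) and reports any daskgateway/dask-gateway-server image whose tag differs from the expected version. The layout of EXAMPLE_CONFIG is hypothetical (only the 0.5.0 scheduler-proxy image comes from this thread), and find_gateway_images and stale_images are hypothetical helpers that accept both name:tag strings and name/tag mappings.

```python
from typing import Any, Iterator, List, Tuple

SERVER_IMAGE = "daskgateway/dask-gateway-server"

# Hypothetical layout; only the 0.5.0 scheduler-proxy image comes from the thread.
EXAMPLE_CONFIG = {
    "gateway": {"image": {"name": SERVER_IMAGE, "tag": "0.6.1"}},
    "scheduler-proxy": {"image": SERVER_IMAGE + ":0.5.0"},
}


def find_gateway_images(cfg: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted path, tag) for each dask-gateway-server image in `cfg`.

    Recognises both a "name:tag" string and a {"name": ..., "tag": ...} mapping.
    """
    if isinstance(cfg, dict):
        if cfg.get("name") == SERVER_IMAGE and cfg.get("tag") is not None:
            yield path, str(cfg["tag"])
        for key, value in cfg.items():
            yield from find_gateway_images(value, f"{path}.{key}" if path else str(key))
    elif isinstance(cfg, list):
        for i, value in enumerate(cfg):
            yield from find_gateway_images(value, f"{path}[{i}]")
    elif isinstance(cfg, str) and cfg.startswith(SERVER_IMAGE + ":"):
        yield path, cfg.split(":", 1)[1]


def stale_images(cfg: Any, expected_tag: str) -> List[Tuple[str, str]]:
    """Return every dask-gateway-server image whose tag is not `expected_tag`."""
    return [(p, tag) for p, tag in find_gateway_images(cfg) if tag != expected_tag]


if __name__ == "__main__":
    for where, tag in stale_images(EXAMPLE_CONFIG, "0.6.1"):
        print(f"{where}: {SERVER_IMAGE}:{tag}")
```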

In that case, I'll manually remove the bad deployment and trigger a new one.

TomAugspurger commented 4 years ago

I've manually deleted the staging dask-gateway deployments through the Google Cloud Platform UI. https://github.com/pangeo-data/pangeo-cloud-federation/pull/501 should redeploy things when it's merged.
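The same deletion can also be done from the command line. A sketch that builds the equivalent kubectl delete command for the three Deployments named in the staging error and prints it for review rather than running it; delete_cmd is a hypothetical helper. Once the old Deployments are gone, the next helm upgrade should be able to recreate them with the new selectors, since a selector can only be set when a Deployment is created.

```python
import shlex
from typing import List

NAMESPACE = "dev-staging"

# Deployment names taken from the "field is immutable" error on staging.
STUCK_DEPLOYMENTS = [
    "gateway-dev-staging-dask-gateway",
    "scheduler-proxy-dev-staging-dask-gateway",
    "web-proxy-dev-staging-dask-gateway",
]


def delete_cmd(namespace: str, names: List[str]) -> List[str]:
    """Build a kubectl command that deletes the named Deployments."""
    if not names:
        raise ValueError("no Deployment names given")
    return ["kubectl", "delete", "deployment", "--namespace", namespace, *names]


if __name__ == "__main__":
    # Printed for review rather than executed; deletion cannot be undone.
    print(" ".join(shlex.quote(arg) for arg in delete_cmd(NAMESPACE, STUCK_DEPLOYMENTS)))
```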

TomAugspurger commented 4 years ago

I think we're all good here.

jcrist commented 4 years ago

Thanks Tom. Does this mean #479 can also be closed?