keel-hq / keel

Kubernetes Operator to automate Helm, DaemonSet, StatefulSet & Deployment updates
https://keel.sh
Mozilla Public License 2.0

Failed to add image watch job: i/o timeout #335

Open lentzi90 opened 5 years ago

lentzi90 commented 5 years ago

I'm trying to deploy Keel using Helm and have it update itself. Unfortunately, Keel fails to add the image watch job because of an i/o timeout. I'm following the documentation here and have the following values file:

helmProvider:
  enabled: true

keel:
  # keel policy (all/major/minor/patch/force)
  policy: all
  # trigger type, defaults to events such as pubsub, webhooks
  trigger: poll
  # polling schedule
  pollSchedule: "@every 5m"
  # images to track and update
  images:
    - repository: image.repository
      tag: image.tag

Is there a way to increase this timeout? Anything else I'm doing wrong?

Here is a sample log entry:

time="2019-01-19T14:10:21Z" level=error msg="trigger.poll.RepositoryWatcher.Watch: failed to add image watch job" error="Get https://index.docker.io/v2/keelhq/keel/manifests/0.12.0: Get https://auth.docker.io/token?scope=repository%3Akeelhq%2Fkeel%3Apull&service=registry.docker.io: dial tcp: i/o timeout" image="namespace:kube-system,image:index.docker.io/keelhq/keel,provider:helm,trigger:poll,sched:@every 5m,secrets:[]"
time="2019-01-19T14:10:21Z" level=error msg="trigger.poll.manager: got error(-s) while watching images" error="encountered errors while adding images: Get https://index.docker.io/v2/keelhq/keel/manifests/0.12.0: Get https://auth.docker.io/token?scope=repository%3Akeelhq%2Fkeel%3Apull&service=registry.docker.io: dial tcp: i/o timeout"

rusenask commented 5 years ago

Hi, it seems like Keel can't reach https://auth.docker.io. Maybe there's a DNS issue inside the pod/cluster?
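
One way to narrow this down might be to test name resolution and the TCP connection separately, since an i/o timeout can come from either step. The following is only a rough sketch, not part of Keel: the `probe` helper is a hypothetical name, it assumes Python is available somewhere with the same network path as the pod (for example a debug container in the same namespace), and port 443 is assumed because the failing URLs are HTTPS.

```python
import socket


def probe(host, port=443, timeout=5.0):
    """Resolve `host`, then attempt a TCP connect to every resolved address.

    Hypothetical helper for illustration only. Returns a dict so the DNS
    step and the connect step can be inspected separately.
    """
    result = {"host": host, "dns_error": None, "connects": {}}

    # Step 1: DNS. getaddrinfo uses the system resolver and takes no timeout
    # argument, so a slow resolver shows up as a long pause here.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["dns_error"] = str(exc)
        return result

    # Step 2: TCP connect to each distinct address, with an explicit timeout.
    for family, socktype, proto, _canonname, sockaddr in infos:
        addr = sockaddr[0]
        if addr in result["connects"]:
            continue
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            result["connects"][addr] = "ok"
        except OSError as exc:  # includes timeouts and refused connections
            result["connects"][addr] = f"failed: {exc}"
    return result


if __name__ == "__main__":
    # The two hosts that appear in the error message above.
    for name in ("auth.docker.io", "index.docker.io"):
        print(probe(name))
```

If `dns_error` is set, the problem is likely in cluster DNS. If the names resolve but every connect fails or times out, it points more toward egress rules, a proxy, or a firewall than toward DNS.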

lentzi90 commented 5 years ago

I tried nslookup from the pod and the output is a bit weird:

# nslookup auth.docker.io
nslookup: can't resolve '(null)': Name does not resolve

Name:      auth.docker.io
Address 1: 34.233.151.211 ec2-34-233-151-211.compute-1.amazonaws.com
Address 2: 52.54.155.177 ec2-52-54-155-177.compute-1.amazonaws.com
Address 3: 54.165.149.19 ec2-54-165-149-19.compute-1.amazonaws.com
Address 4: 52.206.40.44 ec2-52-206-40-44.compute-1.amazonaws.com
Address 5: 34.206.236.31 ec2-34-206-236-31.compute-1.amazonaws.com
Address 6: 52.22.67.152 ec2-52-22-67-152.compute-1.amazonaws.com
Address 7: 52.22.201.61 ec2-52-22-201-61.compute-1.amazonaws.com
Address 8: 52.70.175.131 ec2-52-70-175-131.compute-1.amazonaws.com

I'm unsure how to debug further. Any ideas?

rusenask commented 5 years ago

No idea. Last week I had some problems with the Docker registry too: it wasn't pulling images, but that was on GCP, not AWS. Maybe just remove the self-update tag from Keel, as the API has been stable for a year and even an old version would probably still do the job (with the exception of the AWS credentials helper in your AWS case, I guess) :)

Btw, since it was a non-critical service for me, I just left it in a crash loop and eventually it resolved the image.