janeczku / rancher-letsencrypt

:cow: Rancher service that obtains and manages free SSL certificates from the Let's Encrypt CA
Apache License 2.0

i/o timeout #73

Open fjoesne opened 7 years ago

fjoesne commented 7 years ago

Installed from the catalog (version 0.4.0), using Route 53.

rancher: 1.5.6 docker: 1.12.3

level=error msg="[sub.domain.com] Error obtaining certificate: Time limit exceeded. Last error: read udp 10.42.11.170:47429->205.251.198.78:53: i/o timeout"

I get a similar issue with other domains.

A similar closed issue (#38) claims that this should be fixed in 0.4.0.

version: '2'
services:
  letsencrypt:
    image: janeczku/rancher-letsencrypt:v0.4.0
    environment:
      API_VERSION: Production
      AWS_ACCESS_KEY: XXXXXX
      AWS_SECRET_KEY: XXXXXX
      CERT_NAME: xxxx
      DOMAINS: sub.domain.com
      EMAIL: xxx@xxx.com
      EULA: 'Yes'
      PROVIDER: Route53
      PUBLIC_KEY_TYPE: RSA-4096
      RENEWAL_TIME: '03'
    volumes:
    - /containerstorage/letsencrypt:/etc/letsencrypt
    labels:
      io.rancher.container.agent.role: environment
      io.rancher.container.start_once: 'true'
      io.rancher.container.create_agent: 'true'

janeczku commented 7 years ago

@fjoesne Hey there, v0.4.0 used the host's nameservers (from /etc/resolv.conf) to check propagation of the ACME TXT record. It looks like your DNS resolver (205.251.198.78) is not responding to some queries.
v0.5.0 reverts to using Google's public DNS servers by default. They are much more reliable for the kind of DNS queries the upstream ACME library does.

djk commented 7 years ago

We're using Route 53 validation with version 0.5.0.

We are still seeing this issue even though we set the DNS_RESOLVERS variable and also configured the DNS servers in Rancher. Judging from the logs, the app still tries to resolve directly against the AWS DNS servers, which can never work from inside our corporate network; that is why we set specific DNS servers in the first place.
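
For reference, a minimal sketch of the kind of setup described above. The resolver addresses are hypothetical placeholders, the `v0.5.0` image tag is assumed from the version number, and the comma-separated `host:port` format for `DNS_RESOLVERS` is an assumption that should be checked against the project README:

```yaml
version: '2'
services:
  letsencrypt:
    # Assumed tag for version 0.5.0
    image: janeczku/rancher-letsencrypt:v0.5.0
    environment:
      PROVIDER: Route53
      DOMAINS: sub.domain.com
      # Hypothetical internal resolvers; format assumed to be host:port, comma-separated
      DNS_RESOLVERS: 10.0.0.2:53,10.0.0.3:53
```

The remaining environment variables and labels would be the same as in the original compose file.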

Happy to help debug this one as it's blocking us.