hamburml / docker-flow-letsencrypt

Companion service which adds Let’s Encrypt certificates to docker flow
MIT License

Full stack example not working #41

Closed: nicoecheza closed this issue 6 years ago

nicoecheza commented 6 years ago

Hey, we have been trying to make this work but the example is not working properly.

It seems the Let's Encrypt container is not properly serving the /.well-known/acme-challenge directory.

Here is the container's output:

Starting Docker Flow: Let's Encrypt
Docker Flow: Let's Encrypt started
We will use email@hi.com for certificate registration with certbot. This e-mail is used by Let's Encrypt when you lose the account and want to get it back.
Staging environment of Let's Encrypt is activated! The generated certificates won't be trusted. But you will not reach Let’s Encrypt's rate limits.
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator standalone, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for domain.hi
Waiting for verification...
Cleaning up challenges

Failed authorization procedure. domain.hi (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://domain.hi/.well-known/acme-challenge/MOJDB6_a_sXYiYltms79f8Cd2mlaTU30nh-O449cOJs: Connection refused
Unable to verify domain ownership, we try again in 5 seconds.
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator standalone, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for domain.hi
Waiting for verification...
Cleaning up challenges

Hello! renewAndSendToProxy runs. Today is Tue May  8 18:57:24 UTC 2018
/root/renewAndSendToProxy.sh: line 22: cd: /etc/letsencrypt/live/*/: No such file or directory
old certificates for  will be send to proxy
cat: cert.pem: No such file or directory
cat: chain.pem: No such file or directory
cat: privkey.pem: No such file or directory
HTTP/1.1 400 Bad Request
Date: Tue, 08 May 2018 18:57:24 GMT
Content-Length: 125
Content-Type: text/plain; charset=utf-8

{"Status":"NOK","Message":"Could not send distribute request to the following addresses: [ip1 ip2]","Certs":null}old certificates: proxy received .combined.pem
hamburml commented 6 years ago

Hi @nicoecheza,

sorry to hear that :/

What version of Docker Flow: Proxy and Docker Flow: Swarm Listener do you use? Did you create the /etc/letsencrypt folder before you started the service? Are your DNS settings correct (for example, can you ping your server using the domain you used)? Do you use IPv6 or IPv4?

nicoecheza commented 6 years ago

Hi,

What version of Docker Flow: Proxy and Docker Flow: Swarm Listener do you use? Latest for both

Did you create the /etc/letsencrypt folder before you started the service? Yes

Are your DNS settings correct (for example, can you ping your server using the domain you used)? Yes

Do you use IPv6 or IPv4? IPv4, but not sure about this one

nicoecheza commented 6 years ago

Also, we tried to create a .well-known/acme-challenge/test.txt file inside the Let's Encrypt container (in several folders) to see if we could reach it, but we failed. We are not sure where you store it, since the container has no --webroot option and it also cleans everything up.

hamburml commented 6 years ago

Thanks for your answer.

Currently, you need to use IPv4. If I remember correctly, Docker had some problems with IPv6.

What does your browser show if you try to reach your domain (via http)?

You won't find a folder where you can create a .well-known/acme-challenge/test.txt file because there is no webserver installed. certbot-auto uses the standalone setting (https://github.com/hamburml/docker-flow-letsencrypt/blob/v0.1.5/certbot.sh#L18), which starts its own webserver only for the verification process (https://certbot.eff.org/docs/using.html#standalone).
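To make the standalone idea a bit more concrete: for an http-01 challenge the ACME server fetches http://<domain>/.well-known/acme-challenge/<token> and expects the key authorization (the token, a dot, and the account key thumbprint) as the response body. The sketch below is NOT certbot's implementation, just a minimal Python illustration of the kind of temporary responder that has to be reachable on port 80 through the proxy while certbot says "Waiting for verification...". The token and key-authorization values are hypothetical placeholders; certbot computes the real ones.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def make_handler(token: str, key_authorization: str):
    """Build a handler that answers only the single http-01 challenge path."""

    class ChallengeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == CHALLENGE_PREFIX + token:
                body = key_authorization.encode("ascii")
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                # Anything else (e.g. a manually created test.txt) is unknown here.
                self.send_error(404)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return ChallengeHandler


def serve_challenge(token: str, key_authorization: str,
                    host: str = "0.0.0.0", port: int = 80) -> HTTPServer:
    """Start the responder in a background thread.

    The caller must call shutdown() and server_close() once validation is done,
    which mirrors certbot's "Cleaning up challenges" step.
    """
    server = HTTPServer((host, port), make_handler(token, key_authorization))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to port 80 needs root inside the container (which is also why the service label says com.df.port=80). If the proxy forwards /.well-known/acme-challenge to a port where nothing like this is listening at that moment, the validation fails with "Connection refused".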

The container is smaller in size this way. I just tried it with the latest DFP and DFSL versions and it worked (https://michael-hamburger.de/).

Can you please check your DNS settings again? If I remember correctly, I once had the problem that I had set AAAA records instead of A records.

nicoecheza commented 6 years ago

We actually have a CNAME: new-swarm-testing.mural.co => nicoswarm.eastus.cloudapp.azure.com

Do you think that can be a problem?

hamburml commented 6 years ago

CNAME should work, but you need an A record for the CNAME target (which should be the cloudapp.azure.com URL). See https://community.letsencrypt.org/t/having-issues-getting-certificates-with-cname-records/31481

I found https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-custom-domain-name-portal#add-an-a-record-for-your-custom-domain which should help you add an A record for your Azure app.

I hope this helps.

PS: If I were you I would remove the CNAME and only use an A record for the beginning. If this works we could try adding the CNAME.
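A quick way to check the DNS side from the server itself is to ask the resolver for IPv4 addresses only (A records) and compare them with the server's public IP. For a CNAME, the resolver follows the alias, so you get the A records of the target, or nothing if the target has none. This is a hypothetical helper, not part of the companion; it uses the local resolver, so cached answers may still differ from what Let's Encrypt sees.

```python
import socket


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver returns for hostname.

    CNAMEs are followed by the resolver, so for an alias this yields the
    A records of the CNAME target. Returns an empty set if nothing resolves.
    """
    try:
        infos = socket.getaddrinfo(hostname, 80,
                                   family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
    except socket.gaierror:
        return set()
    # sockaddr for AF_INET is (address, port)
    return {sockaddr[0] for _fam, _type, _proto, _canon, sockaddr in infos}


def points_to(hostname: str, expected_ip: str) -> bool:
    """True if hostname has an A record (directly or via CNAME) equal to expected_ip."""
    return expected_ip in resolve_ipv4(hostname)
```

For example, points_to("new-swarm-testing.domain.com", "<your public IP>") should be True before certbot is started; the IP here is a placeholder you have to fill in.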

nicoecheza commented 6 years ago

Thanks for your replies 😍

Changed it and it is now using an A record, but still no luck 😢, same error.

hamburml commented 6 years ago

Wait some minutes (or hours?), DNS changes need some time. DNS servers love caching :)

Edit:

BTW, it looks like Azure changes the public IPs from time to time: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-reserved-public-ip So the A record could point to an IP address which is (in some days?) no longer your server.

nicoecheza commented 6 years ago

OK, nice. Will get back to you in a few hours.

gvilarino commented 6 years ago

Hi @hamburml; I'm helping @nicoecheza with this one. It's still not working here, with either an A record or a CNAME.

If we look at the proxy logs in debug mode, the verification request is being forwarded to the letsencrypt service (i.e.: the request matches the ACL) but it fails with the error provided by OP.

I have a question: in all your examples, the DOMAIN_* env var always starts with an apex domain, followed by subdomains. That's not our case, and I was wondering whether this works for a single, complete domain (e.g.: DOMAIN_1=('my.domain.com')).

Also, our stack configuration is almost exactly this one (just changing domain names):

version: "3"

services:

  proxy:
    image: dockerflow/docker-flow-proxy
    deploy:
      replicas: 2
      placement:
        constraints: [node.hostname == manager-hostname]
    environment:
      - DEBUG=true
      - LISTENER_ADDRESS=swarm-listener
      - MODE=swarm
    networks:
      - proxy
    ports:
      - 80:80
      - 443:443

  swarm-listener:
    image: dockerflow/docker-flow-swarm-listener
    deploy:
      placement:
        constraints: [node.role == manager]
    environment:
      - DF_NOTIFY_CREATE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/reconfigure
      - DF_NOTIFY_REMOVE_SERVICE_URL=http://proxy:8080/v1/docker-flow-proxy/remove
    networks:
      - proxy
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

  letsencrypt-companion:
    image: hamburml/docker-flow-letsencrypt:latest
    networks:
      - proxy
    environment:
      - DOMAIN_1=( 'new-swarm-testing.domain.com' )
      - CERTBOT_EMAIL=user@domain.com
      - PROXY_ADDRESS=proxy
      - CERTBOT_CRON_RENEW=('0 3 * * *' '0 15 * * *')
      - CERTBOTMODE=staging
    deploy:
      labels:
        - com.df.servicePath=/.well-known/acme-challenge
        - com.df.notify=true
        - com.df.port=80
      replicas: 1
      placement:
        constraints: [node.hostname == manager-hostname]

networks:
  proxy:
    external: true

Note that the proxy network is an overlay network.

However, if I replace the letsencrypt-companion image with an image that answers 200 when getting a request under /.well-known/acme-challenge we actually do get a response, so it's not the proxy or the listener not working properly.

If we run the above stack file with docker stack deploy, we consistently get:

Plugins selected: Authenticator standalone, Installer None
Obtaining a new certificate
Performing the following challenges:
http-01 challenge for new-swarm-testing.domain.com
Waiting for verification...
Cleaning up challenges
Failed authorization procedure. new-swarm-testing.domain.com (http-01): urn:ietf:params:acme:error:connection :: The server could not connect to the client to verify the domain :: Fetching http://new-swarm-testing.domain.com/.well-known/acme-challenge/aHKxoYOzbJdCZKkZ2jYQFCAXTJEzK18xn3qOxVbf7cY: Connection refused

Please feel free to try this one by replacing domain.com with michael-hamburger.de; it should work the same.

hamburml commented 6 years ago

Hello @gvilarino,

If we look at the proxy logs in debug mode, the verification request is being forwarded to the letsencrypt service (i.e.: the request matches the ACL) but it fails with the error provided by OP.

Good to know!

I have a question: in all your examples, the DOMAIN_* env var always starts with an apex domain, followed by subdomains. That's not our case, and I was wondering whether this works for a single, complete domain (e.g.: DOMAIN_1=('my.domain.com')).

I only did this in my examples so that the folder structure in /etc/letsencrypt is regular. The name of the first domain in the domain list is used as a folder name.
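Put differently: certbot names the certificate after the first domain passed with -d (unless a name is given explicitly), and the files end up under /etc/letsencrypt/live/<that name>/. A list with a single subdomain is therefore fine from the folder's point of view. The sketch below assumes the env var holds a bash-style array literal like the one in the stack file above; parse_domain_list and live_folder are hypothetical helpers for illustration, not the companion's bash scripts.

```python
import shlex
from pathlib import PurePosixPath

# certbot's default location for the current certificate files
LIVE_DIR = PurePosixPath("/etc/letsencrypt/live")


def parse_domain_list(value: str) -> list[str]:
    """Parse a bash-style array literal such as "( 'a.com' 'www.a.com' )".

    Hypothetical helper: the companion handles DOMAIN_* in its own scripts.
    """
    value = value.strip()
    if not (value.startswith("(") and value.endswith(")")):
        raise ValueError(f"not a bash array literal: {value!r}")
    domains = shlex.split(value[1:-1])  # handles the single quotes
    if not domains:
        raise ValueError("domain list is empty")
    return domains


def live_folder(domains: list[str]) -> PurePosixPath:
    """Default certificate folder: named after the FIRST domain in the list.

    (If a certificate with that name already exists for a different set of
    names, certbot may add a suffix such as -0001; ignored in this sketch.)
    """
    return LIVE_DIR / domains[0]
```

With DOMAIN_1=( 'new-swarm-testing.domain.com' ) this gives /etc/letsencrypt/live/new-swarm-testing.domain.com, and with ('domain.com' 'www.domain.com') it gives /etc/letsencrypt/live/domain.com.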

This evening I will set up a structure similar to yours. Hopefully I will get the same error and can fix it.

nicoecheza commented 6 years ago

Quick update:

Managed to make certbot-auto work by not using this project's .sh files. Steps below.

  1. Used this Dockerfile (the same one this project uses, but without copying all the files, etc.):
FROM ubuntu:16.04

#set default env variables
ENV DEBIAN_FRONTEND=noninteractive \
    CERTBOT_EMAIL="" \
    PROXY_ADDRESS="proxy" \
    CERTBOT_CRON_RENEW="('0 3 * * *' '0 15 * * *')" \
    PATH="$PATH:/root"

# http://stackoverflow.com/questions/33548530/envsubst-command-getting-stuck-in-a-container
RUN apt-get update && \
    apt-get -y install cron supervisor curl && \
    apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# install certbot-auto
RUN curl -o /root/certbot-auto https://dl.eff.org/certbot-auto && \
    chmod a+x /root/certbot-auto && \
    /root/certbot-auto --version --non-interactive && \
    apt-get purge -y --auto-remove gcc libc6-dev && \
    apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

ENTRYPOINT ["tail", "-f", "/dev/null"]

EXPOSE 80
  2. Created a service the same way you specify in the README
  3. Opened a shell inside the container (docker exec -it ID_CONTAINER bash)
  4. Pasted the same command you are running in certbot.sh:
certbot-auto certonly --staging --standalone -d some-domain.de --no-self-upgrade --no-bootstrap --standalone --non-interactive --expand --keep-until-expiring --email email@email.com --agree-tos --preferred-challenges http-01 --rsa-key-size 4096 --redirect --hsts --staple-ocsp

And it worked properly 🤔.

So I think there is some problem with the certbot.sh script that's causing the error, since bypassing the script and running standalone certbot by hand worked like a charm. The certificates were successfully generated.

hamburml commented 6 years ago

Hi there,

I wasn't able to use it only with a subdomain. Currently I am checking the bash-script.

Edit:

You mention that you were able to use

certbot-auto certonly --staging --standalone -d some-domain.de --no-self-upgrade --no-bootstrap --standalone --non-interactive --expand --keep-until-expiring --email nicolas@mural.co --agree-tos --preferred-challenges http-01 --rsa-key-size 4096 --redirect --hsts --staple-ocsp

The domain you used here was some-domain.de. Did you also use a subdomain here, like sub.some-domain.de?

The reason why I ask: I can't get certbot to work when I am only using a subdomain.

certbot-auto certonly --dry-run -d subdomain.michael-hamburger.de --no-self-upgrade --no-bootstrap --standalone --non-interactive --expand --keep-until-expiring --email real.hamburml@gmail.com --agree-tos --preferred-challenges http-01 --rsa-key-size 4096 --redirect --hsts --staple-ocsp is the command I am currently using, and I always get a ReadTimeout: HTTPSConnectionPool(host='acme-staging-v02.api.letsencrypt.org', port=443): Read timed out. (read timeout=45) error.

nicoecheza commented 6 years ago

Yeah, I'm using new-swarm-testing.domain.com, and worked fine

hamburml commented 6 years ago

And you removed all A records and only have one for this subdomain?

nicoecheza commented 6 years ago

Yes sir

nicoecheza commented 6 years ago

The domain we have has an A record for the apex domain (ie: domain.com) and a lot of entries/subdomains. However this subdomain has only a CNAME (changed it again yesterday) entry pointing to the public DNS endpoint of our infra.

hamburml commented 6 years ago

Is the A record to the apex domain pointing to the ip of your azure container? Just trying to get it working on my end with only one subdomain.

hamburml commented 6 years ago

We talked about this issue on Slack. It looks like the --dry-run option was the issue (https://github.com/hamburml/docker-flow-letsencrypt/blob/testing/certbot.sh#L53). Additionally, the staging environment is currently down. I removed the option in the testing branch and will rework the whole companion in the next weeks.
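Why --dry-run matters here: according to certbot's documentation, --dry-run obtains test certificates but does not save them to disk. So even a successful run leaves /etc/letsencrypt/live/ empty, and a later step that expects files there (like the cd /etc/letsencrypt/live/*/ in renewAndSendToProxy.sh) has nothing to work with. --staging, on the other hand, writes the (untrusted) staging certificate. The helper below is a hypothetical sketch, not the companion's code; the flags are the ones from the command quoted earlier in this thread, with the duplicated --standalone removed and --redirect/--hsts/--staple-ocsp left out for brevity.

```python
def build_certbot_args(domains: list[str], email: str, mode: str = "staging",
                       rsa_key_size: int = 4096) -> list[str]:
    """Build a certbot-auto argv list for the standalone http-01 flow.

    mode: "staging"    -> untrusted certificate, saved under /etc/letsencrypt
          "production" -> trusted certificate, saved under /etc/letsencrypt
          "dry-run"    -> test run only, nothing is saved to disk
    """
    if not domains:
        raise ValueError("at least one domain is required")
    args = ["certbot-auto", "certonly", "--standalone",
            "--non-interactive", "--no-self-upgrade", "--no-bootstrap",
            "--expand", "--keep-until-expiring",
            "--email", email, "--agree-tos",
            "--preferred-challenges", "http-01",
            "--rsa-key-size", str(rsa_key_size)]
    if mode == "staging":
        args.append("--staging")
    elif mode == "dry-run":
        args.append("--dry-run")
    elif mode != "production":
        raise ValueError(f"unknown mode: {mode!r}")
    for domain in domains:
        args += ["-d", domain]  # the first domain names the live/ folder
    return args


def saves_certificate(args: list[str]) -> bool:
    """A run only leaves files in /etc/letsencrypt/live/ if it is not a dry run."""
    return "--dry-run" not in args
```

Inside the container the list could be passed to subprocess.run(args, check=True); which mode the companion should use by default is the maintainer's call.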

Sorry for the inconvenience and thanks for using it! 👍