scalableminds / webknossos

Visualize, share and annotate your large 3D images online
https://webknossos.org
GNU Affero General Public License v3.0

WebKNOSSOS nginx-letsencrypt container fails to recognize valid SSL certs with custom PUBLIC_HOST url #7871

Closed aaronkanzer closed 5 months ago

aaronkanzer commented 5 months ago

Context

While following https://docs.webknossos.org/webknossos/installation.html, nginx-letsencrypt consistently fails to recognize valid SSL certs when only the environment variable PUBLIC_HOST=<the-a-name-record....org> is set.

This "bug" ticket asks whether additional steps are required, or whether anything is hard-coded that should be abstracted, so that the installation docs work as intended.

I investigated this bug via the logs below. It is also worth noting that the persistent directory (mounted into the webknossos API container) contained none of the .crt and .key files required for a valid SSL cert.

Cc @kabilar @satra

Expected Behavior

nginx-letsencrypt should recognize the SSL cert associated with the DNS A record pointing at the instance.

Current Behavior

nginx-letsencrypt fails to recognize/retrieve a valid SSL cert associated with the DNS A record pointing at the instance.

I have verified, from the cloud infrastructure side, that our setup is appropriate -- see screenshots below for EC2, Route 53, ACM and the associated security group

These are all linked to the URL here: https://webknossos.lincbrain.org/

[Screenshots: EC2, Route 53, ACM, and security group configuration]

Steps to Reproduce the bug

  1. Create a Linux-based AWS EC2 instance with at least 16GB of RAM available, and ensure the instance is configured to be reachable via HTTP/HTTPS
  2. Point a DNS A record in AWS Route 53 at the instance's public IP
  3. Ensure that the DNS A record is covered by a validated, issued public certificate in AWS ACM
  4. SSH onto the EC2 instance
  5. Install docker, docker-compose on the instance (sudo yum install docker -y ....)
  6. Follow the steps at https://docs.webknossos.org/webknossos/installation.html, setting the environment variable PUBLIC_HOST=<the-a-name-record....org>
  7. nginx-letsencrypt will attempt validation via http://webknossos.lincbrain.org/.well-known/acme-challenge and subsequently fail with the following:
[Sat Jun  8 04:44:07 UTC 2024] Pending, The CA is processing your order, please just wait. (1/30)
[Sat Jun  8 04:44:10 UTC 2024] Invalid status, webknossos.lincbrain.org:Verify error detail:3.20.222.53: Invalid response from http://webknossos.lincbrain.org/.well-known/acme-challenge/Z6qNPZ6WwCF-acyXP2Z000ctimci2DT_KauAuNZJcq8: 
[Sat Jun  8 04:44:10 UTC 2024] Please check log file for more details: /dev/null
Sleep for 3600s
Creating/renewal webknossos.lincbrain.org certificates... (webknossos.lincbrain.org)
[Sat Jun  8 05:44:10 UTC 2024] Using CA: https://acme-v02.api.letsencrypt.org/directory
[Sat Jun  8 05:44:10 UTC 2024] Using pre generated key: /etc/acme.sh/admin@lincbrain.org/webknossos.lincbrain.org/webknossos.lincbrain.org.key.next
[Sat Jun  8 05:44:10 UTC 2024] Generate next pre-generate key.
[Sat Jun  8 05:44:11 UTC 2024] Single domain='webknossos.lincbrain.org'
[Sat Jun  8 05:44:11 UTC 2024] Getting domain auth token for each domain
[Sat Jun  8 05:44:12 UTC 2024] Getting webroot for domain='webknossos.lincbrain.org'
[Sat Jun  8 05:44:12 UTC 2024] Verifying: webknossos.lincbrain.org
[Sat Jun  8 05:44:13 UTC 2024] Pending, The CA is processing your order, please just wait. (1/30)
[Sat Jun  8 05:44:15 UTC 2024] Invalid status, webknossos.lincbrain.org:Verify error detail:3.20.222.53: Invalid response from http://webknossos.lincbrain.org/.well-known/acme-challenge/icgV7X1rcqx0H4ZGsoYVdIjWljr33qx2AH2f6owIC5s: 
[Sat Jun  8 05:44:15 UTC 2024] Please check log file for more details: /dev/null
Sleep for 3600s
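
The log above shows the Let's Encrypt HTTP-01 validation receiving an invalid response from the challenge URL on port 80. One way to narrow this down is to request that path from a machine outside the instance. The following is a minimal sketch, assuming `curl` is installed; the function names and the `probe-token` value are hypothetical and not part of WebKNOSSOS or acme.sh. A made-up token will normally return 404, which still shows that nginx is reachable; a status of 000 means no connection could be made at all (for example, a closed port 80).

```shell
# Build the HTTP-01 challenge URL for a host ($1) and token ($2).
acme_challenge_url() {
    printf 'http://%s/.well-known/acme-challenge/%s\n' "$1" "$2"
}

# Print the HTTP status code returned for the challenge URL.
# curl reports 000 when it cannot connect within the 10 second limit.
probe_acme_challenge() {
    curl -s -o /dev/null -m 10 -w '%{http_code}\n' "$(acme_challenge_url "$1" "$2")"
}

# Example (performs a real request, so it is left commented out):
# probe_acme_challenge webknossos.lincbrain.org probe-token
```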

Your Environment for bug

aaronkanzer commented 5 months ago
[Screenshot: contents of the EC2 instance]

In case it is helpful, here is more context on the contents of our EC2 instance. My hunch is that some step is failing: the /home/ec2-user/opt/webknossos/persistent/nginx/certs/webknossos.lincbrain.org directory should be populated, but is not.

normanrz commented 5 months ago

Hi, nginx-letsencrypt is attempting to generate its own certificates with letsencrypt. It does not use the ACM certificates. I don't know if the ACM certificates interfere with the letsencrypt validation flow. It seems to me this is not directly a Webknossos issue. For more information about nginx-letsencrypt please refer to https://github.com/nginx-proxy/acme-companion. Alternatively, you can also set up any other reverse proxy with SSL termination, such as Caddy, traefik or ELB.
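
To see which certificate, if any, is actually being served for the host, the issuer can be inspected with `openssl`. This is a sketch only, assuming `openssl` is installed and port 443 is reachable; both function names are hypothetical helpers, and the issuer strings in the usage comment are illustrative rather than output captured from this setup.

```shell
# Print the issuer and validity dates of the certificate served on port 443.
show_served_cert() {
    host="$1"
    openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -issuer -dates
}

# Succeed if an issuer line mentions Let's Encrypt, fail otherwise.
issuer_is_letsencrypt() {
    case "$1" in
        *"Let's Encrypt"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (opens a real TLS connection, so it is left commented out):
# show_served_cert webknossos.lincbrain.org
# issuer_is_letsencrypt "issuer=C = US, O = Let's Encrypt, CN = R3" && echo "issued by Let's Encrypt"
```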

aaronkanzer commented 5 months ago

@normanrz Thanks for the response.

Unfortunately, I'm not sure I follow, as there is nothing specific to my setup that points to ACM (rather, I was just showing with the screenshot above that the DNS record we pointed to already had a valid SSL cert associated).

Is there anything else you could share about what WebKNOSSOS customizes in the nginx-related containers vs. what nginx uses by default? I've been inspecting the scalableminds/nginx-proxy container to see if there is something specific here....

normanrz commented 5 months ago

I don't think there is anything Webknossos-specific here. Our nginx-proxy image just adds a few config options. Webknossos itself doesn't deal with SSL or domains. It just needs to know the domain names.

aaronkanzer commented 5 months ago

@normanrz Thanks for this -- unfortunately, I wasn't able to resolve the SSL cert issue with the docker-compose steps defined in the installation docs.

As a workaround (to no longer be blocked), I ended up using certbot on the EC2 instance via Docker:

First, remove the reference to nginx-letsencrypt in the docker-compose.yml file and update the nginx service as follows:

nginx-proxy:
    image: nginx:latest
    container_name: nginx-proxy
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./certs:/etc/letsencrypt
    depends_on:
      - webknossos

Then obtain the certificate with certbot:

sudo docker run --rm -p 80:80 -v $(pwd)/certs:/etc/letsencrypt -v $(pwd)/certs-data:/data/letsencrypt certbot/certbot certonly --standalone -d <insert-dns-record.org> --email <insert-email-address> --agree-tos --non-interactive

Finally, place an nginx.conf like the following in the same directory (opt/webknossos), from where it is mounted into the container:

events {}

http {
    server {
        listen 80;
        server_name webknossos.lincbrain.org;

        location /.well-known/acme-challenge/ {
            root /data/letsencrypt;
        }

        location / {
            return 301 https://$host$request_uri;
        }
    }

    server {
        listen 443 ssl;
        server_name webknossos.lincbrain.org;

        ssl_certificate /etc/letsencrypt/live/webknossos.lincbrain.org/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/webknossos.lincbrain.org/privkey.pem;

        location / {
            proxy_pass http://webknossos-webknossos-1:9000;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}
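
Since Let's Encrypt certificates expire after 90 days, this workaround also needs a renewal step. Because certbot's standalone mode binds port 80 itself, nginx-proxy has to be stopped while it runs. The sketch below assumes it is executed from the directory containing docker-compose.yml and ./certs, and that the service is named nginx-proxy as in the snippet above; the `run` and `renew_certs` helpers and the `DRY_RUN` variable are hypothetical names introduced for illustration.

```shell
# Execute a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Renew the certificates stored in ./certs using the certbot Docker image.
renew_certs() {
    # Free port 80 so that certbot's standalone server can bind to it.
    run docker-compose stop nginx-proxy || return 1
    run docker run --rm -p 80:80 -v "$(pwd)/certs:/etc/letsencrypt" \
        certbot/certbot renew --non-interactive
    status=$?
    # Bring nginx back up regardless of whether the renewal succeeded.
    run docker-compose start nginx-proxy
    return "$status"
}

# Example: preview the commands without executing them.
# DRY_RUN=1 renew_certs
```

Running it with DRY_RUN=1 first makes it easy to confirm the commands before letting it stop the proxy, for example ahead of scheduling it with cron.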

I'm not 100% sure why the failures occurred here; however, if anyone hits a similar error for any reason, this workaround was fairly effortless.

Thanks again for all the help thus far (especially these past couple of days with the /bin/webknossos issue) -- much appreciated -- closing this issue for now since we were able to resolve it.