sasa1977 / site_encrypt

Integrated certification via Let's encrypt for Elixir-powered sites
MIT License

"Canonical" approach fails in both Prod and Dev #36

Open JoseMPena opened 2 years ago

JoseMPena commented 2 years ago

Following the steps suggested in the README and Hexdocs fails in both development (test) and production when using the native client.

Development

In Dev, running the example test (as documented) returns "address already in use", although no process is currently using port 4002.

In endpoint.ex

...
directory_url:
        case Application.get_env(:client_hub, :cert_mode, "local") do
          "local" -> {:internal, port: 4002}
          "staging" -> "https://acme-staging-v02.api.letsencrypt.org/directory"
          "production" -> "https://acme-v02.api.letsencrypt.org/directory"
        end

In the test file "as is"

defmodule ClientHubWeb.Endpoint.CertificationTest do
  use ExUnit.Case, async: false
  import SiteEncrypt.Phoenix.Test
  test "certification" do
    clean_restart(ClientHubWeb.Endpoint)
    cert = get_cert(ClientHubWeb.Endpoint)
    # assert cert.domains == ~w/mysite.com www.mysite.com/
  end
end

This test fails with:

12:14:03.891 [error] Failed to start Ranch listener SiteEncrypt.Acme.Server_4002 in :ranch_ssl:listen([cacerts: :..., key: :..., cert: :..., alpn_preferred_protocols: ["h2", "http/1.1"], next_protocols_advertised: ["h2", "http/1.1"], reuse_sessions: true, secure_renegotiate: true, port: 4002]) for reason :eaddrinuse (address already in use)

12:14:03.904 [error] Error starting the child {:ranch_listener_sup, SiteEncrypt.Acme.Server_4002}: {:shutdown, {:failed_to_start_child, :ranch_acceptors_sup, {:listen_error, SiteEncrypt.Acme.Server_4002, :eaddrinuse}}}

but lsof says there's nothing currently listening on that port.
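As a cross-check on lsof, the port can also be probed from inside the BEAM itself. The following is only a minimal sketch: `PortProbe` is a hypothetical helper (not part of site_encrypt), and it assumes that briefly binding the port with plain `:gen_tcp` is a fair stand-in for what the ranch listener attempts.

```elixir
defmodule PortProbe do
  # Hypothetical helper, not part of site_encrypt.
  # Briefly tries to listen on the port. :eaddrinuse means another
  # socket (possibly inside this same VM) still holds it.
  def status(port) do
    # reuseaddr is set so that connections lingering in TIME_WAIT
    # are not reported as the port being in use.
    case :gen_tcp.listen(port, [:binary, active: false, reuseaddr: true]) do
      {:ok, socket} ->
        # Release the port immediately so the probe does not block the real listener.
        :ok = :gen_tcp.close(socket)
        :free

      {:error, :eaddrinuse} ->
        :in_use

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Calling `PortProbe.status(4002)` just before `clean_restart/1` in the test would show whether the port is already held at that moment, which lsof can miss if the listener is short-lived.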

In Production

In Prod, both Chrome and Firefox report NET::ERR_CERT_AUTHORITY_INVALID when visiting the URL.

Some stats, according to ssllabs.com:

Key: RSA 1024 bits (e 65537) WEAK
Certificate Transparency: No
DNS CAA: No
Trusted: No, NOT TRUSTED (Mozilla, Apple, Android, Java, Windows)

JoseMPena commented 2 years ago

It does work in Production after forcing certification via SiteEncrypt.force_certify(MySystemWeb.Endpoint).

sasa1977 commented 2 years ago

In the test error, ranch reports that the address is already in use. If you can't see anything with lsof or e.g. telnet, it could be a race condition during the endpoint restart (e.g. the previous instance lingers on for a while). However, I've never seen this happen so far. Is this failing during the test run, or is the app failing to start before the test even executes?

The prod error likely means that a self-signed certificate is being used, which means that certification didn't start automatically. Do you see anything of interest in the logs? Also, is mode perhaps configured to :manual?

JoseMPena commented 2 years ago

Thanks for replying. The Production "issue" is solved by calling force_certify as described in the README (at least in my case it has worked so far). In Test, however, the culprit seems to be the clean_restart() function (it fails with the test running in isolation and with every other line commented out). I'll check the function definition in your code when I get some time :)