Closed tonnenpinguin closed 2 years ago
@tonnenpinguin We've been using OTP 25 quite a while and haven't seen this issue. Since the error is "SERVER ALERT: Fatal - Certificate required", I'd verify that the device's certificate is being passed. However, I don't know why it would change between OTP versions. I think that I'd check if any dependencies need to be updated. Maybe one's out of date and it's important and the slipstream or mint or nerves_hub_link isn't tight enough on its dependency settings to force it?
@jjcarstens Any ideas?
I spoke offline with @jjcarstens. I understand that there's been quite a bit of debug and there's some SSL/TLS incompatibility between OTP 23 and OTP 24/25 between devices and servers that is unknown.
Yeah, everything works great as long as the server is running OTP >=23, which I think is the case with your installation.
I created a bare minimum project that triggers the issue. All you need to do is follow the setup as documented in poser and copy over the ssl & nerves-hub directories.
Seems like it has nothing to do with nerves at all and the bug is either in mint or in otp/ssl.
FWIW Here's two traces with different client/server version pairs
This works otp25client_otp23server.txt This fails otp25client_otp24server.txt
Based on the trace, it looks like the server is passing the certificate_authorities
extension with OTP 24 AND the client pays attention to it. The server shouldn't do this. The Erlang SSL docs make it sound like it defaults to off. However, this code really looks like it's always sent with TLS1.3, but I didn't trace to see if the certificate_request/4
function is the right one to look at.
I have a recon trace as well as some Wireshark captures (yes, the trace has lots of cert/key data but its only used locally so #yolo) poser.log wireshark_captures.zip
@fhunleth yeah I think that's pretty much it. To double check I ran NHW with OTP 24.1 (<24.3 when they introduced the certificate_authorities
field) and everything works as expected.
From the commit message that introduced the code you posted:
In TLS-1.3 consider certificate_authorities extension if present in client hello or certificate request. Add the extension in certificate requests, and possibility to have it in client hello.
Looking through the changes it seems like that the option is indeed respected on the initial ClientHello sent by the client, but the server always responds with the list as you pointed out.
I think the problem might be an overly strict interpretation of the RFC in this piece of code
In section 4.3.2 to the RFC states
The "certificate_authorities" extension is used to indicate the certificate authorities (CAs) which an endpoint supports and which SHOULD be used by the receiving endpoint to guide certificate selection.
The way things are implemented the certificate_authorities
field not only guides the cert selection, but filters out all certs, even if we might have one available.
Should we create a ticket in https://github.com/erlang/otp for this?
I could do it but to be honest I'm not fully clear on why the client certs aren't part of the certificate chain yet.
Shouldn't the Device signer cert be an intermediate CA for the Device Root CA?
In my understanding that would mean that the ssl_certificate:certificate_chain
check should succeed for the device cert, no?
The device certificate chain is only used to automatically register devices with NervesHub. In the multi-homed use case, the CA chain depends on whatever certs the user uploads to NH. NervesHub pins device certificates once it knows about the device and auths against that. The CA chain certs can expire, be deleted, or withdrawn. It doesn't matter since after first registration they're unused. In fact, if you upload device certs through the UI or API, NH doesn't need to check the chain since you've authenticated with NH and are telling NH to trust the device cert that you gave it.
The main issue is on the server side for including the certificates. I see that there's a "SHOULD" vs. "MUST" interpretation on how the certificate_authorities
extension is processed on the device side. I don't feel like I know the ramifications elsewhere on ignoring the extension. It seems wrong to ignore since the server is trying to tell you what it accepts to save the client time in supplying certs that won't work. In our case the server is supplying the wrong certificates in the certificate_authorities
extension and it can't supply acceptable CA certs since that list might be very long. I think that it's better to drop the extension.
What I'd do is comment out the line in OTP that includes the certificate_authorities
extension in TLS 1.3 to confirm that that fixes the issue. Then if it does, create a ticket on OTP that the extension should be optionally included like is done for TLS 1.2. @jjcarstens was going to try commenting out the line, but I don't know if he has gotten to it yet.
I've confirmed @fhunleth theory - Removing the certificate_authories
from the extensions fixes the issue and allows the device to connect.
diff --git a/lib/ssl/src/tls_handshake_1_3.erl b/lib/ssl/src/tls_handshake_1_3.erl
index 87855f2989..8204604a88 100644
--- a/lib/ssl/src/tls_handshake_1_3.erl
+++ b/lib/ssl/src/tls_handshake_1_3.erl
@@ -213,10 +213,10 @@ certificate_request(SignAlgs0, SignAlgsCert0, CertDbHandle, CertDbRef) ->
SignAlgsCert = filter_tls13_algs(SignAlgsCert0),
Extensions0 = add_signature_algorithms(#{}, SignAlgs),
Extensions = add_signature_algorithms_cert(Extensions0, SignAlgsCert),
- Auths = ssl_handshake:certificate_authorities(CertDbHandle, CertDbRef),
+ _Auths = ssl_handshake:certificate_authorities(CertDbHandle, CertDbRef),
#certificate_request_1_3{
certificate_request_context = <<>>,
- extensions = Extensions#{certificate_authorities => #certificate_authorities{authorities = Auths}}}.
+ extensions = Extensions#{}}.
@tonnenpinguin @fhunleth @jjcarstens
Currently I try to update self-hosted NervesHub for dev to follow upstream v1.0.2, and I also faced this issue too. This thread is very helpful me, thank you.
I confirmed that NervesHub v1.0.2 with below workaround setting could handshake with client, rpi3 1.17.3(OTP 24.1.4), 1.19.0(OTP 25.0) and 1.20.0(OTP 25.0.2).
# versions on Dockerfile header
ARG ELIXIR_VERSION=1.13.3
ARG ERLANG_VERSION=24.2.2 # this works fine.
ARG ALPINE_VERSION=3.15.0
ARG NODE_VERSION=16.14.2
I don't fully understand the issue details, but I hope this information will give you some ideas for workarounds.
Until the issues in OTP are resolved, the current workaround is to use a lower supported OTP (< 24.3) or force TLS 1.2 on your device with versions: [:"tlsv1.2"]
in your :ssl
options
Fixes are in place with https://github.com/erlang/otp/pull/6228 - Once that is merged into OTP ~25.1~ 25.2, we can alter the NervesHubWeb configuration to use certificate_authorities: false
.
Closing this issue for now since it is not a rpi4 specific case and we'll adjust in NervesHub when available.
Thanks Jon! One minor clarification for future readers: The flag is set to be added in OTP 25.2 :)
🤦 derp - yes. I updated the comment
Environment
elixir -v
):Elixir 1.13.4 (compiled with Erlang/OTP 25)
|nerves_bootstrap| Environment Package List
Pkg: nerves_toolchain_ctng Vsn: 1.8.5 Type: toolchain_platform BuildRunner: {nil, []}
Pkg: nerves_system_br Vsn: 1.19.0 Type: system_platform BuildRunner: {nil, []}
Pkg: nerves_system_rpi4 Vsn: 1.19.0 Type: system BuildRunner: {Nerves.Artifact.BuildRunners.Docker, [make_args: ["source", "all", "legal-info"]]}
Pkg: nerves_toolchain_aarch64_nerves_linux_gnu Vsn: 1.5.0 Type: toolchain BuildRunner: {Nerves.Artifact.BuildRunners.Local, []}
|nerves_bootstrap| Loadpaths Start
Nerves environment MIX_TARGET: ft_access MIX_ENV: prod
|nerves_bootstrap| Environment Variable List target: ft_access toolchain: /Users/tonnenpinguin/.nerves/artifacts/nerves_toolchain_aarch64_nerves_linux_gnu-darwin_x86_64-1.5.0 system: /Users/tonnenpinguin/.nerves/artifacts/nerves_system_rpi4-portable-1.19.0 app: .
|nerves_bootstrap| Loadpaths End
12:27:21.835 [error] mfa=:gen_server.error_info/8 line=1399 GenServer #PID<0.2705.0> terminating ** (CaseClauseError) no case clause matching: {:error, %Mint.HTTP1{buffer: "", host: "device.nerves-hub.[...]", mode: :active, port: 443, private: %{extensions: [], scheme: :wss, sec_websocket_key: "SEC_WEBSOCKET_KEY"}, proxy_headers: [], request: %{body: nil, connection: [], content_length: nil, data_buffer: [], headers_buffer: [], method: "GET", ref: #Reference<0.578604679.268435457.124554>, state: :status, status: nil, transfer_encoding: [], version: nil}, requests: {[], []}, scheme_as_string: "https", socket: {:sslsocket, {:gen_tcp, #Port<0.69>, :tls_connection, :undefined}, [#PID<0.2708.0>, #PID<0.2707.0>]}, state: :closed, streaming_request: nil, transport: Mint.Core.Transport.SSL}, %Mint.TransportError{reason: {:tls_alert, {:certificate_required, 'TLS client: In state connection received SERVER ALERT: Fatal - Certificate required\n'}}}, []} (slipstream 1.0.0) lib/slipstream/connection/pipeline.ex:140: Slipstream.Connection.Pipeline.decode_message/1 (slipstream 1.0.0) lib/slipstream/connection/pipeline.ex:34: anonymous fn/1 in Slipstream.Connection.Pipeline.handle/2 (slipstream 1.0.0) lib/slipstream/connection/telemetry.ex:29: anonymous fn/2 in Slipstream.Connection.Telemetry.span/2 (telemetry 1.1.0) /Users/tonnenpinguin/src/cardreader/deps/telemetry/src/telemetry.erl:320: :telemetry.span/3 (slipstream 1.0.0) lib/slipstream/connection/telemetry.ex:25: Slipstream.Connection.Telemetry.span/2 (stdlib 4.0) gen_server.erl:1120: :gen_server.try_dispatch/4 (stdlib 4.0) gen_server.erl:1197: :gen_server.handle_msg/6 (stdlib 4.0) proc_lib.erl:240: :proc_lib.init_p_do_apply/3 Last message: {:ssl_error, {:sslsocket, {:gen_tcp, #Port<0.69>, :tls_connection, :undefined}, [#PID<0.2708.0>, #PID<0.2707.0>]}, {:tls_alert, {:certificate_required, 'TLS client: In state connection received SERVER ALERT: Fatal - Certificate required\n'}}}