smartrent / jackalope

An opinionated MQTT client library based on Tortoise MQTT
Apache License 2.0

AWS IoT example #28

Open fhunleth opened 3 years ago

fhunleth commented 3 years ago

At some point, the AWS IoT SSL setup was dropped from this repository. Can this be added back? Perhaps as an example? The reason is that the SSL setup is quite complicated for anyone getting started.

fhunleth commented 3 years ago

Copying this comment by @jjcarstens over from https://github.com/nerves-project/nerves/issues/566:

FWIW, first-time communication with AWS MQTT typically requires that your signer CA cert be included in the cacerts list as well.

So for your options, you'd need something like:

...
cacerts: [your_signer_ca_der | :certifi.cacerts()],
certfile: "/srv/erlang/lib/network_led-0.1.0/priv/testdev01.cert",
keyfile: "/srv/erlang/lib/network_led-0.1.0/priv/testdev01.private.key",
...

where your_signer_ca_der is the CA certificate used to create testdev01.cert, read in and converted to DER format.

This is typically just needed for the initial connect. If that doesn't fix things, there is probably something in the AWS setup that needs to be handled there.
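As a minimal sketch of the "read in and converted to DER" step, the PEM file can also be decoded with OTP's `:public_key` module alone. The `SignerCA` module and its function names are hypothetical helpers for illustration, not part of Jackalope or Tortoise.

```elixir
defmodule SignerCA do
  # Hypothetical helper module, used only for illustration.

  # Returns the DER binary of every unencrypted certificate found in a PEM string.
  # Entries of other types (keys, CSRs) are skipped by the pattern in the generator.
  def ders_from_pem(pem) when is_binary(pem) do
    for {:Certificate, der, :not_encrypted} <- :public_key.pem_decode(pem), do: der
  end

  # Same as above, but reads the PEM from a file at runtime.
  def ders_from_file(path) do
    path |> File.read!() |> ders_from_pem()
  end
end
```

With this, the option above could be written as `cacerts: SignerCA.ders_from_file(signer_ca_path) ++ :certifi.cacerts()`, where `signer_ca_path` is a placeholder for wherever your signer CA PEM lives.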

bartimaeus commented 3 years ago

@fhunleth I generated certificates using the One-click certificate creation. I tried using the Amazon Root CA1 certificate like you recommended above, without luck.

rootca1_der = "/srv/erlang/lib/network_led-0.1.0/priv/AmazonRootCA1.pem"
    |> File.read!()
    |> X509.Certificate.from_pem!()
    |> X509.Certificate.to_der()

...
cacerts: [rootca1_der | :certifi.cacerts()],
certfile: "/srv/erlang/lib/network_led-0.1.0/priv/testdev01.cert",
keyfile: "/srv/erlang/lib/network_led-0.1.0/priv/testdev01.private.key",
...

I also tried all of the certificates listed here: https://docs.aws.amazon.com/iot/latest/developerguide/server-authentication.html.

Is there a certificate that worked for you?

jjcarstens commented 3 years ago

@bartimaeus I'm not really familiar with the One-click certificate creation - Is the AmazonRootCA1.pem there above the signer CA that was used to generate testdev01.cert ? (it seems like that is a server CA for the connection and not the signer CA?)

According to these instructions for the one-click, you should have been able to download the device public key, private key, and also the root and signer CA certs.

Step 5 also states that you must "activate" the device in AWS before it will work as well:

  1. A client certificate has now been created and registered with AWS IoT. You must activate the certificate before you use it in a client. Choose Activate to activate the client certificate now. If you don't want to activate the certificate now, Activate a client certificate (console) describes how to activate the certificate later.

I think we need to rule out AWS configuration issues first before aimlessly trying other certs. Also double check the location of your certfile and keyfile paths and ensure they are available at runtime (not just compile time).
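One way to check the runtime availability mentioned above is to resolve the paths when the application starts and fail loudly if a file is missing. This is only a sketch: `TLSFiles` is a hypothetical helper, and the app name `:network_led` is inferred from the paths quoted earlier in the thread.

```elixir
defmodule TLSFiles do
  # Hypothetical helper, used only for illustration.

  # Raises if any of the given paths is not a regular file at runtime;
  # otherwise returns the paths unchanged so they can be passed on as options.
  def check!(paths) when is_list(paths) do
    Enum.each(paths, fn path ->
      File.regular?(path) ||
        raise(ArgumentError, "TLS file not found at runtime: #{path}")
    end)

    paths
  end
end
```

Usage could look like the following, with the priv directory resolved at runtime rather than hard-coded:

```elixir
# :network_led is an assumption based on the paths shown in this thread.
priv = Application.app_dir(:network_led, "priv")

[certfile, keyfile] =
  TLSFiles.check!([
    Path.join(priv, "testdev01.cert"),
    Path.join(priv, "testdev01.private.key")
  ])
```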

bartimaeus commented 3 years ago

@jjcarstens I tested my certificates using the ruby mqtt gem just to be sure my one-click certificates were active and had the appropriate permissions. The following ruby code worked for me:

# using the "mqtt" gem
require 'mqtt'

client = MQTT::Client.new
client.host = '[IOT-CLIENT-ID].iot.us-east-1.amazonaws.com'
client.port = 8883
client.ssl = true
client.cert_file = '[absolute_path]/ssl/certificate.pem.crt'
client.key_file  = '[absolute_path]/ssl/private.pem.key'
client.ca_file   = '[absolute_path]/ssl/AmazonRootCA1.pem'
client.connect()

client.subscribe('testtopic')

My Elixir code looks like:

Erlang: Erlang/OTP 23 [erts-11.1.3]
Elixir: 1.11.2

Code:


rootca1_der = "[absolute_path]/ssl/AmazonRootCA1.pem"
  |> File.read!()
  |> X509.Certificate.from_pem!()
  |> X509.Certificate.to_der()

Tortoise.Supervisor.start_child(
  client_id: "elixir_7b5c965b",
  version: "3.1.1",
  handler: {Tortoise.Handler.Logger, []},
  server: {
    Tortoise.Transport.SSL,
    host: '[IOT-CLIENT-ID].iot.us-east-1.amazonaws.com',
    port: 8883,
    certfile: '[absolute_path]/ssl/certificate.pem.crt',
    keyfile: '[absolute_path]/ssl/private.pem.key',
    cacerts: [rootca1_der | :certifi.cacerts()],
    # depth: 3, # changing the depth makes the error go away, but does not resolve the connection issue
    versions: [:"tlsv1.2"],
    server_name_indication: '*.iot.us-east-1.amazonaws.com',
  },
  subscriptions: [{"testtopic/#", 0}]
)

The error is:

[error] GenServer {Tortoise.Registry, {Tortoise.Connection, "elixir_7b5c965b"}} terminating ** (stop) {:tls_alert, {:handshake_failure, 'TLS client: In state certify at ssl_handshake.erl:1952 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,max_path_length_reached}'}}

If I change the Tortoise config depth to anything greater than 3, then the above error goes away, but the connection still does not appear to be valid.

bartimaeus commented 3 years ago

@jjcarstens the process for creating a one-click certificate is as follows:

Here is a video showing the process. When I click on the root certificate download link, it takes me to a website that lists all of the AWS root certificates.

one-click-certificate

mattludwigs commented 3 years ago

I think I have some more information to provide about this, however, it does not solve the issue but I hope that it is helpful.

I am using the AWS one-click certs and I know that this works because I have been able to connect and send messages to my AWS IoT broker with no problem with/on other platforms.

My code can be found here.

I used Wireshark to see the TLS/SSL traffic. The handshake seems to work perfectly, but after the first send the server closes the connection. Because the traffic is encrypted, the reason is not visible in a plain capture. However, I was able to decrypt the traffic in Wireshark to see what was actually being sent to and from the server. I had to modify Tortoise to make this work*.

close-notify-bad

After the MQTT connect command is received we get a close notify from the server which makes the connection close.

I haven't had time to analyze why we might be getting a close notify at this point, but it seems like the next data point to explore. What causes a server to send a close notify after a good handshake? I am sure this is easily googleable, but will just take some time parsing the useful information vs non-useful information.

I do have a Nerves device that successfully connects to AWS IoT but it does not use the one-click certs. I know how to capture traffic via Wireshark for that device in the same manner as my local device but haven't had time to set up and grab a trace of the traffic yet. If anyone is interested and has a Nerves device that connects to AWS IoT I can point you in the right direction, otherwise, I will explore that route once I have some time.

Another thing that might be worth exploring is using another language library and getting a Wireshark capture using the keylog to decrypt. I have used a Node script and it works great but it could be any language. This way we can compare the traffic and see if there is anything funny going on in tortoise land.

Tortoise change

The Tortoise change that was necessary was to get the CLIENT_RANDOM and what is called the "master secret" into a keylog file. This "master secret" is the shared secret established during the handshake, from which the symmetric keys that encrypt traffic after the handshake are derived. In Wireshark you can add this keylog file in the TLS preferences and it will decrypt the application data layer. You can see the diff here.
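For reference, a keylog entry for a TLS 1.2 session is a single text line: the label `CLIENT_RANDOM`, then the client random and the master secret, both hex encoded. The sketch below only formats such a line; `KeyLog` is a hypothetical module and is not the actual diff linked above.

```elixir
defmodule KeyLog do
  # Hypothetical helper, used only for illustration.

  # Builds one keylog line in the "CLIENT_RANDOM <hex> <hex>" layout.
  # Assumption: the two binaries come from the established TLS 1.2 connection,
  # e.g. via :ssl.connection_information(socket, [:client_random, :master_secret]).
  def line(client_random, master_secret)
      when is_binary(client_random) and is_binary(master_secret) do
    "CLIENT_RANDOM " <>
      Base.encode16(client_random, case: :lower) <>
      " " <>
      Base.encode16(master_secret, case: :lower) <>
      "\n"
  end
end
```

Appending the returned line to a file and pointing Wireshark at that file is what makes the application data readable in the capture.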

Extra logging in IEx

Erlang's :ssl app actually uses the logger, so you can pass log_level: :debug in your options to get all the TLS/SSL traffic logged to your IEx console. However, the output is very verbose and the payloads are still encrypted, so Wireshark is probably the better tool. I did find this information interesting, though.
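As a rough sketch of where that option goes: in the OTP documentation the `:ssl` option is named `log_level`. The fragment below assumes the transport options are forwarded to `:ssl.connect` unchanged, as the other options in this thread appear to be.

```elixir
# Option fragment only; merge it into the keyword list given to
# Tortoise.Transport.SSL (or passed to :ssl.connect/3 directly).
ssl_debug_opts = [
  versions: [:"tlsv1.2"],
  # Logs TLS protocol messages through Erlang's logger. Very verbose.
  log_level: :debug
]
```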

mattludwigs commented 3 years ago

Update on some data points: I compared the MQTT connect command between my working Node script and the non-working Elixir code.

Node decrypted

node-js-connect-good

Elixir decrypted

mqtt-connect-ex-bad

The only possibly meaningful difference is that Node has the user name flag set (something handled outside my code). I am not sure that really matters though.

If I get more time later this week I will try to get set up to test my working Nerves device to see what the connect command looks like. This all might be down the wrong path as well but my working assumption is if the handshake works then the problem might be at the application layer - we will see I guess. If anyone has any input feel free to chime in, I am just chasing things down and trying to report data points for others if they start debugging.

mattludwigs commented 3 years ago

I forgot to report some information I found a while back. I found some information that suggested that Erlang can have some issues with RSA private keys, which are what AWS's one-click certificate option provides. I am not sure about this, as I dug through Erlang bug reports to try to see why this might be the case and found nothing. I know that there has been a successful connection using EC keys. One thing to test is making EC keys, adding those to the AWS IoT account, and testing that. However, if that works then there are some more questions that need to be answered. Again, this is just from questionable online forum discussions that I am trying to recall from a few months back.
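To try the EC key experiment, a key can be generated with OTP's `:public_key` module alone. This sketch covers only the private key; creating a certificate from it and registering that with AWS IoT is not shown. `ECKey` is a hypothetical module, and the curve `:secp256r1` is an assumption rather than something confirmed in this thread.

```elixir
defmodule ECKey do
  # Hypothetical helper, used only for illustration.

  # Generates a new EC private key on the given named curve and returns it
  # as a PEM-encoded binary ("-----BEGIN EC PRIVATE KEY-----" ...).
  def generate_pem(curve \\ :secp256r1) do
    key = :public_key.generate_key({:namedCurve, curve})
    entry = :public_key.pem_entry_encode(:ECPrivateKey, key)
    :public_key.pem_encode([entry])
  end
end
```

The result can be written out with `File.write!/2` and used as the `keyfile` once a matching certificate exists.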

fhunleth commented 2 years ago

Some of the information here is incorrect. Hopefully updated public examples can be made, but here are the issues: