renesas / fsp

Flexible Software Package (FSP) for Renesas RA MCU Family
https://renesas.github.io/fsp/

NX_CRYPTO_SIZE_ERROR during DTLS handshake with PSK-AES128-CCM8 cipher when using FSP's hardware encryption #288

Open hwmaier opened 1 year ago

hwmaier commented 1 year ago

When calling nx_secure_dtls_client_session_start() with DTLS 1.2 and the PSK-AES128-CCM8 cipher, FSP's (version v4.5.0) encryption routines fail with NX_CRYPTO_SIZE_ERROR during the encrypted handshake message phase of the DTLS handshake.

Refer to the Wireshark screenshot [image not reproduced: Snag_23385e6] and to the error message in OpenSSL's log:

C:\Users\fsp\Projects\tests>openssl s_server -dtls -accept 1337 -nocert -psk deadbeef -cipher PSK-AES128-CCM8
Using default temp DH parameters
ACCEPT
-----BEGIN SSL SESSION PARAMETERS-----
MIGAAgEBAgMA/v0EAsCoBCBCxygwVawNBAj/QrXPmjFkoLfW9Htw2sQWWtGOXaJ0
cwQw0Z38yDE0IGrTyoPu+6Q4zKXtBrORCzsbYbhQVjvEZPTrZqJf4BVD4zSnshze
mT+VoQYCBGTf7ACiBAICHCCkBgQEAQAAAKgIBAZoZW5yaWs=
-----END SSL SESSION PARAMETERS-----
Shared ciphers:PSK-AES128-CCM8
CIPHER is PSK-AES128-CCM8
Secure Renegotiation IS supported
ERROR
17008:error:14191044:SSL routines:tls1_enc:internal error:ssl\record\ssl3_record.c:1066:
shutting down SSL
CONNECTION CLOSED

If the original NetX Secure encryption routines without hardware acceleration are used, nx_secure_dtls_client_session_start() establishes a PSK-AES128-CCM8 connection without any problem. The issue arises when Renesas' alternative SCE9-based implementation from the rm_netx_secure_crypto directory is compiled in (SECURE_ALT_SRCS).

During the handshake, the DTLS client and server exchange a 40-byte encrypted handshake message. After the 8-byte nonce and the 8-byte ICV are deducted, the encrypted part is 24 bytes long, and this length causes the FSP code to fail when decrypting.
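The length arithmetic can be checked with a small sketch. It assumes the record layout of the TLS/DTLS CCM_8 suites (8-byte explicit nonce, ciphertext, 8-byte ICV); the helper names are illustrative and are not FSP or NetX APIs. A 24-byte payload is also exactly what a DTLS Finished message occupies (12-byte DTLS handshake header plus 12 bytes of verify_data), and it leaves an 8-byte tail after one full 16-byte AES block.

```c
#include <assert.h>
#include <stddef.h>

/* Per-record overhead assumed for the TLS/DTLS AES-CCM_8 suites:
 * an 8-byte explicit nonce before the ciphertext and an 8-byte ICV after it. */
#define CCM8_EXPLICIT_NONCE_LEN 8u
#define CCM8_ICV_LEN            8u
#define AES_BLOCK_LEN           16u

/* Illustrative helper: length of the encrypted payload inside a CCM_8 record fragment. */
static size_t ccm8_payload_len(size_t fragment_len)
{
    assert(fragment_len >= CCM8_EXPLICIT_NONCE_LEN + CCM8_ICV_LEN);
    return fragment_len - CCM8_EXPLICIT_NONCE_LEN - CCM8_ICV_LEN;
}

/* Illustrative helper: bytes left over after the last full AES block.
 * A non-zero result presumably corresponds to the length_remaining
 * case that the SCE9 decrypt path rejects. */
static size_t ccm_tail_len(size_t payload_len)
{
    return payload_len % AES_BLOCK_LEN;
}
```

For the 40-byte record above, ccm8_payload_len(40) yields 24 and ccm_tail_len(24) yields 8, so the payload is not a multiple of the block size.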

I tracked the issue down to the file nx_crypto_ccm_alt_process.c. Function sce_nx_crypto_ccm_decrypt_update() fails at line 351 because the length passed to it is 24, and the function does not seem to be able to handle a length that is not divisible by the block length of 16:

   if(length_remaining)
       return NX_CRYPTO_SIZE_ERROR;
   }
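For illustration, here is a minimal sketch of how CCM payload processing can accept a trailing partial block, following NIST SP 800-38C: the final keystream block is truncated to the remaining bytes, and the final plaintext block is zero-padded only for the CBC-MAC. This is not Renesas' code. toy_block_encrypt() is a hypothetical placeholder for the SCE9 AES-128 block operation (it is not a cipher), ccm_payload_update() and ctr_increment() are illustrative names, and the sketch assumes the whole payload arrives in a single call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_LEN 16

/* HYPOTHETICAL stand-in for the hardware AES-128 block encryption.
 * NOT secure; it only makes the sketch runnable. It works byte by byte,
 * so calling it in place (in == out) is safe. */
static void toy_block_encrypt(const uint8_t in[BLOCK_LEN], uint8_t out[BLOCK_LEN])
{
    for (int i = 0; i < BLOCK_LEN; i++)
        out[i] = (uint8_t)(in[i] * 7u + 0x5Au + (unsigned)i);
}

/* Increment the CCM counter field. With the 12-byte TLS nonce the
 * counter occupies the last 3 bytes of the block (big-endian). */
static void ctr_increment(uint8_t ctr[BLOCK_LEN])
{
    for (int i = BLOCK_LEN - 1; i >= BLOCK_LEN - 3; i--)
        if (++ctr[i] != 0)
            break;
}

/* One CCM payload pass. `ctr` starts at Ctr_1 and `mac` holds the CBC-MAC
 * state after B_0 and the associated data. If `decrypt` is non-zero, `in`
 * is ciphertext, otherwise plaintext. A trailing partial block uses only
 * the first `n` keystream bytes; the plaintext block is zero-padded
 * before it is folded into the MAC. */
static void ccm_payload_update(uint8_t ctr[BLOCK_LEN], uint8_t mac[BLOCK_LEN],
                               const uint8_t *in, uint8_t *out, size_t len,
                               int decrypt)
{
    uint8_t ks[BLOCK_LEN];

    assert(ctr != NULL && mac != NULL);
    while (len > 0) {
        size_t n = len < BLOCK_LEN ? len : BLOCK_LEN;
        uint8_t pt[BLOCK_LEN] = {0};          /* zero padding for the MAC only */

        toy_block_encrypt(ctr, ks);           /* keystream block S_j */
        ctr_increment(ctr);
        for (size_t i = 0; i < n; i++) {
            uint8_t c = (uint8_t)(in[i] ^ ks[i]);
            pt[i] = decrypt ? c : in[i];      /* plaintext byte either way */
            out[i] = c;
        }
        for (int i = 0; i < BLOCK_LEN; i++)   /* CBC-MAC: Y = E(B xor Y) */
            mac[i] ^= pt[i];
        toy_block_encrypt(mac, mac);

        in += n;
        out += n;
        len -= n;
    }
}
```

Encrypting and decrypting a 24-byte payload with the same starting counter and MAC state round-trips the data and produces the same MAC state. A streaming implementation that receives the payload across several update calls would additionally have to buffer a partial block between calls.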

Tested against the OpenSSL test server using the following command line:

    openssl s_server -dtls -accept 1337 -nocert -psk deadbeef -cipher PSK-AES128-CCM8

As a result, we are unable to use the SCE9-based hardware encryption with DTLS 1.2 and PSK, as it cannot decrypt the 40-byte handshake message.

michaelthomasj commented 1 year ago

Hi Henrik, thank you for notifying us of this issue. I'll look into it and get back to you with a workaround, if there is one, and an estimate for a fix. Regards,

hwmaier commented 1 year ago

@michaelthomasj Thank you. It almost looks like the code for padding the last block is missing from the implementation of the CCM decrypt routine. I had a look at FSP's mbedtls implementation of AES-CCM8, and that one does support arbitrary lengths. Unfortunately, it's not possible to use mbedtls with NetXDuo, so we depend on a working implementation for NetXDuo.

Also, the issue would apply not only to DTLS but also to TLS or anything else that has to decrypt data using AES128-CCM8.

michaelthomasj commented 1 year ago

Hi Henrik, we've looked at the issue, and it seems that the NetX spec is not in sync with the NetX implementation. The spec states that all data should be 16-bit aligned: https://learn.microsoft.com/en-us/azure/rtos/netx/netx-crypto/chapter4#_nx_crypto_method_aes_operation:~:text=in%20the%20nx_crypto_method_aes_init.-,input_data,-Points%20to%20a But as you've observed, the code does allow other sizes. I'll ping the MSFT dev to clarify. Since we're doing hardware acceleration, IMO we have to stick to the spec rather than the implementation in case of a discrepancy.

hwmaier commented 1 year ago

@michaelthomasj Thank you for looking into this issue.

> The spec states that all data should be 16-bit aligned:

We are not violating the spec, because we are not calling nx_crypto_method_aes_operation() directly; we expect nx_secure_dtls_client_session_start() to do everything required to establish a DTLS session in a correct and compliant manner.

> Since we're doing hardware acceleration, we have to stick to the spec rather than the implementation in case of a discrepancy IMO.

If Renesas follows that line of thought, how can we get DTLS to work?

It is not our application code that fails; it is nx_secure_dtls_client_session_start() that fails once hardware acceleration is compiled in. If we logged a bug with Microsoft, they would certainly argue that it's not their problem but the silicon vendor's, because the vendor implemented the hardware-accelerated code.

I'd also like to point out that sce_nx_crypto_ccm_encrypt_update() does deal with arbitrary lengths and has provisions in the code to do the padding. So Renesas didn't stick to the spec in that case.

So if sce_nx_crypto_ccm_encrypt_update() does padding and allows arbitrary lengths, why doesn't its counterpart sce_nx_crypto_ccm_decrypt_update()?

All we want is to get DTLS to work on the Renesas platform.

hwmaier commented 1 year ago

Microsoft's spec is probably based on standard AES modes like CBC, which require a 16-byte block size.

Looking at RFC4309, section 3.2, AES-CCM does not seem to require plaintext padding and allows a variable payload size. So technically, the block-size restriction would not apply here, if I am correct.
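This can be read off the CCM definition in NIST SP 800-38C, which RFC4309 builds on via RFC 3610. With $P$ the payload of $Plen$ bits, $T$ the tag of $Tlen$ bits and $S_j$ the keystream blocks, the ciphertext truncates the keystream to the payload length rather than padding the payload (zero padding appears only in the formatted blocks fed to the CBC-MAC):

```latex
\begin{align*}
m   &= \lceil Plen / 128 \rceil, \qquad S_j = \mathrm{CIPH}_K(\mathrm{Ctr}_j), \quad j = 0, \dots, m \\
S   &= S_1 \,\|\, S_2 \,\|\, \cdots \,\|\, S_m \\
C   &= \bigl(P \oplus \mathrm{MSB}_{Plen}(S)\bigr) \,\|\, \bigl(T \oplus \mathrm{MSB}_{Tlen}(S_0)\bigr)
\end{align*}
```

For the 24-byte handshake payload, $Plen = 192$ and $m = 2$: all of $S_1$ and only the first 64 bits of $S_2$ are used, and with CCM_8 $Tlen = 64$.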

Microsoft's implementation seems to support AES-CCM's arbitrary payload sizes, in line with RFC4309.

hwmaier commented 1 year ago

I extracted our code into a smaller test program. It can be used to connect to the following OpenSSL test server:

openssl s_server -dtls -accept 1337 -nocert -psk deadbeef -psk_identity "matilda" -cipher PSK-AES128-CCM8

Here is the NetXDuo DTLS client program code, which is unable to connect to the above DTLS server using AES-CCM8 mode. The server IP address (server_ip.nxd_ip_address.v4 = IP_ADDRESS(192, 168, 0, 15);) needs to be adjusted to match your setup.

#include <assert.h>
#include <stdalign.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#include "tx_api.h"
#include "nx_api.h"
#include "nx_ether0.h"
#include "nx_crypto.h"
#include "nx_secure_x509.h"
#include "nx_secure_dtls_api.h"
#include "nx_secure_dtls.h"

/* Define the ThreadX and NetX object control blocks...  */
TX_THREAD main_thread;
NX_PACKET_POOL pool_0;
NX_IP ip_0;
NX_SECURE_DTLS_SESSION dtlsSession;

extern const NX_SECURE_TLS_CRYPTO  nx_crypto_tls_ciphers;

// In C11, alignas() must precede the declaration, not follow the declarator.
alignas(4) UCHAR mainStackMemory[TX_DEFAULT_STACK_SIZE];
alignas(4) UCHAR ipStackMemory[TX_DEFAULT_STACK_SIZE];
alignas(4) UCHAR arpCacheMemory[1024];
alignas(4) UCHAR cryptoMetaData[19000]; // Cryptography routines and crypto work buffers
alignas(4) UCHAR dtlsPacketBuf[4000]; // Packet reassembly buffer for decryption

#define REMOTE_CERT_SIZE (sizeof(NX_SECURE_X509_CERT) + 2000)
#define REMOTE_CERT_CNT 3
alignas(4) UCHAR remoteCertsBuf[REMOTE_CERT_SIZE * REMOTE_CERT_CNT]; // Remote certificate buffer for incoming certificates
NX_SECURE_X509_CERT *identityCertPtr;

#define DTLS_IDENTITY "matilda"
#define DTLS_KEY "\xDE\xAD\xBE\xEF" // key up to 64 bytes in length.

#define UDP_PORT 1337

/**
 * Main thread
 */
static void main_thread_entry(ULONG thread_input)
{
    UINT rc;
    NX_UDP_SOCKET sock;
    NXD_ADDRESS server_ip;

    printf("Test program started.\n");

    // Wait until link is up
    tx_thread_sleep(3 * TX_TIMER_TICKS_PER_SECOND);

    // UDP socket must be created before creating DTLS session
    rc = nx_udp_socket_create(&ip_0, &sock, "dtls", NX_IP_NORMAL, NX_DONT_FRAGMENT, NX_IP_TIME_TO_LIVE, 8192);
    assert(rc == NX_SUCCESS);

    rc = nx_udp_socket_bind(&sock, UDP_PORT, NX_WAIT_FOREVER);
    assert(rc == NX_SUCCESS);

    /* Create a DTLS session for our socket. Ciphers and metadata defined
    elsewhere. See nx_secure_tls_session_create reference for more
    information. */
    rc = nx_secure_dtls_session_create(&dtlsSession,
                                        &nx_crypto_tls_ciphers,
                                        cryptoMetaData,
                                        sizeof(cryptoMetaData),
                                        dtlsPacketBuf,
                                        sizeof(dtlsPacketBuf),
                                        REMOTE_CERT_CNT,
                                        remoteCertsBuf,
                                        sizeof(remoteCertsBuf));
    assert(rc == NX_SUCCESS);

    // Set the PSK
    rc = nx_secure_dtls_psk_add(&dtlsSession, (UCHAR *)DTLS_KEY, sizeof(DTLS_KEY) - 1, (UCHAR *)DTLS_IDENTITY, sizeof(DTLS_IDENTITY) - 1, NX_NULL, 0);
    assert(rc == NX_SUCCESS);

    // Without this second call, the identity will not be transmitted and is empty. nx_secure_dtls_psk_add does not seem to send the identity.
    rc = nx_secure_tls_client_psk_set(&dtlsSession.nx_secure_dtls_tls_session, (UCHAR *)DTLS_KEY, sizeof(DTLS_KEY) - 1, (UCHAR *)DTLS_IDENTITY, sizeof(DTLS_IDENTITY) - 1, NX_NULL, 0);
    assert(rc == NX_SUCCESS);

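    /* Add the certificate to the local store using a numeric ID.
       Not needed for PSK cipher suites; identityCertPtr is never assigned
       in this test (NULL), so the return value is deliberately ignored. */
    nx_secure_dtls_session_trusted_certificate_add(&dtlsSession,
                                                   identityCertPtr, 1);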

    /* Set up IP address of remote host. */
    server_ip.nxd_ip_version = NX_IP_VERSION_V4;
    server_ip.nxd_ip_address.v4 = IP_ADDRESS(192, 168, 0, 15);

    /* Now we can start the DTLS session as normal. */
    rc = nx_secure_dtls_client_session_start(&dtlsSession, &sock, &server_ip, UDP_PORT,
                                             NX_IP_PERIODIC_RATE);
    if (rc == NX_SUCCESS)
        printf("DTLS init OK.\n");
    else
        printf("Error connecting to DTLS server!\n");

    for (int i = 0;; i++)
    {
        printf(".");
        if (i % 80 == 0)
            printf("\n");
        fflush(stdout);

        tx_thread_sleep(TX_TIMER_TICKS_PER_SECOND);
    }
    printf("Test program terminated.\n");
}

/**
 * Application start-up
 *
 * @param first_unused_memory First-available unallocated RAM address for allocations
 */
void tx_application_define(void *first_unused_memory)
{
    UINT rc;

    /* Initialize the NetX system.  */
    nx_system_initialize();

    nx_crypto_initialize();

    /* Initialize the NetX Secure TLS/DTLS system.  */
    nx_secure_tls_initialize();

    /* Create a packet pool.  */
    rc = nx_packet_pool_create(&pool_0, "NetX Main Packet Pool", NX_ETH0_POOLPACKET_SIZE, NX_ETH0_POOL_MEMORY, sizeof(NX_ETH0_POOL_MEMORY));
    assert(rc == NX_SUCCESS);

    /* Create an IP instance.  */
    rc = nx_ip_create(&ip_0, "NetX IP Instance 0", DEFAULT_IP_ADDRESS, 0xFFFFFF00UL, &pool_0, NX_ETH0_DRIVER, ipStackMemory, sizeof(ipStackMemory), 1);
    assert(rc == NX_SUCCESS);

    /* Enable ARP and supply ARP cache memory for IP Instance 0.  */
    rc = nx_arp_enable(&ip_0, arpCacheMemory, sizeof(arpCacheMemory));
    assert(rc == NX_SUCCESS);

    /* Enable ICMP */
    rc = nx_icmp_enable(&ip_0);
    assert(rc == NX_SUCCESS);

    /* Enable UDP traffic.  */
    rc = nx_udp_enable(&ip_0);
    assert(rc == NX_SUCCESS);

    /* Enable TCP traffic.  */
    rc = nx_tcp_enable(&ip_0);
    assert(rc == NX_SUCCESS);

    /* Create the main thread.  */
    rc = tx_thread_create(&main_thread, "main", main_thread_entry, 0, mainStackMemory, sizeof(mainStackMemory), 4, 4, TX_NO_TIME_SLICE, TX_AUTO_START);
    assert(rc == TX_SUCCESS);
}

/**
 * Entry point. Launches ThreadX RTOS and won't return.
 */
int main(void)
{
    tx_kernel_enter();
}

renesas-brandon-hussey commented 8 months ago

This is being internally tracked using FSPRA-2353.