AES-CTR PRNG with external git submodule

Knogle commented 2 months ago

Ahoy,

Referring to https://github.com/martijnvanbrummelen/nwipe/pull/559#issuecomment-2336467026

I've implemented the OpenSSL development dependency using a git submodule

Everything else is just the same, even though, the build time is increased.

BR,

Knogle commented 2 months ago

Is there a reason you are using the openssl-3.0 branch rather than the openssl-3.4 branch?

Also where does the Openssl libraries that nwipe uses get stored on the distro assuming they are dynamically linked or is Openssl statically linked into nwipe's executable? If statically linked how big is the executable.

Ahoy, i went for the OpenSSL 3.0 version because in the other thread someone suggested it. OpenSSL is the LTS release and provides compatiblity for 5 years to come.

I've screwed a little up, still having an issue with the linked library here. The build time is increased to 8 minutes instead of 10 seconds. So tbh i think this is not a feasible solution. Also does it require if openssl is statically linked, also gcc to be statically linked. The size of the binary is increased to 11M instead of 962k.

So in conclusion i think, if we wish to do so for the project, having the libssl-dev headers on the build system, and using them, is the only feasible option. Or otherwise, it would mean a huge binary with it's downsides.

PartialVolume commented 2 months ago

You don't think extracting just the relevent AES code from Openssl we need to do the job is an option so we don't need to link to Openssl at all is an option?

I know that may be really difficult because you have to spend time understanding their code and it may be structured in such a way that makes it really difficult to do but we have done that in the past.

Knogle commented 2 months ago

You don't think extracting just the relevent AES code from Openssl we need to do the job is an option so we don't need to link to Openssl at all is an option?

I know that may be really difficult because you have to spend time understanding their code and it may be structured in such a way that makes it really difficult to do but we have done that in the past.

Ahoy :)

I think we could instead borrow code from one of the older OpenSSL versions like 1.1.0. The code in OpenSSL 3.x is structured as a kind of framework, and all the function deeply rely on each other as it's an integral part of the OpenSSL watchdog and error-reporting functions.

Maybe we can discuss this topic with the other guys, because i think it's also a discussion on what's best for the nwipe project, not only the single PR.

In case of OpenSSL 1.1.0 we could copy them out easier, but it's an outdated and known vulnerable version. I have created also a version which does not rely at all on OpenSSL, but is capped at around 50MB/s unfortunately.

BR,

/*
 * AES CTR PRNG Implementation
 * Author: Fabian Druschke
 * Date: 2024-03-13
 *
 * This header file contains definitions for the AES (Advanced Encryption Standard)
 * implementation in CTR (Counter) mode for pseudorandom number generation, utilizing
 * OpenSSL for cryptographic functions.
 *
 * As the author of this work, I, Fabian Druschke, hereby release this work into the public
 * domain. I dedicate any and all copyright interest in this work to the public domain,
 * making it free to use for anyone for any purpose without any conditions, unless such
 * conditions are required by law.
 *
 * This software is provided "as is", without warranty of any kind, express or implied,
 * including but not limited to the warranties of merchantability, fitness for a particular
 * purpose and noninfringement. In no event shall the authors be liable for any claim,
 * damages or other liability, whether in an action of contract, tort or otherwise, arising
 * from, out of or in connection with the software or the use or other dealings in the software.
 *
 * USAGE OF OPENSSL IN THIS SOFTWARE:
 * This software uses OpenSSL for cryptographic operations. Users are responsible for
 * ensuring compliance with OpenSSL's licensing terms.
 */

#include <stdint.h>
#include <string.h>
#include <assert.h>
#include <stdio.h>

// SHA-256 helper macros
#define ROTRIGHT(a,b) (((a) >> (b)) | ((a) << (32-(b))))

// SHA-256 specific functions as defined in the algorithm
#define CH(x,y,z) (((x) & (y)) ^ (~(x) & (z)))  // Choose
#define MAJ(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))  // Majority

// SHA-256 Sigma and Epsilon functions
#define EP0(x) (ROTRIGHT(x,2) ^ ROTRIGHT(x,13) ^ ROTRIGHT(x,22))  // Big Sigma 0
#define EP1(x) (ROTRIGHT(x,6) ^ ROTRIGHT(x,11) ^ ROTRIGHT(x,25))  // Big Sigma 1
#define SIG0(x) (ROTRIGHT(x,7) ^ ROTRIGHT(x,18) ^ ((x) >> 3))     // Small Sigma 0
#define SIG1(x) (ROTRIGHT(x,17) ^ ROTRIGHT(x,19) ^ ((x) >> 10))   // Small Sigma 1

#define AES_BLOCK_SIZE 16
#define AES_KEY_SIZE 32

/* SHA-256 Constants for the SHA-256 transformation. These values are used in each step of the hashing process.
   These constants are the first 32 bits of the fractional parts of the cube roots of the first 64 prime numbers. */
static const uint32_t k[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

/* The SHA-256 context structure holds the current state of the SHA-256 computation.
   - data: Buffer to hold the current block of data being hashed (64 bytes).
   - datalen: Length of data currently in the buffer.
   - bitlen: Total length of the input data (in bits).
   - state: The state (hash values) that will hold the final output after all blocks are processed. */
typedef struct {
    uint8_t data[64];
    uint32_t datalen;
    uint64_t bitlen;
    uint32_t state[8];
} SHA256_CTX;

/* The sha256_transform function performs the main SHA-256 compression. It takes a 512-bit block of input
   data and processes it using the current state (hash values). The state is updated after this process. */
void sha256_transform(SHA256_CTX *ctx, const uint8_t data[]) {
    uint32_t a, b, c, d, e, f, g, h, i, j, t1, t2, m[64];

    /* Prepare the first 16 words (512 bits) of the message schedule array from the input data. */
    for (i = 0, j = 0; i < 16; ++i, j += 4)
        m[i] = (data[j] << 24) | (data[j + 1] << 16) | (data[j + 2] << 8) | (data[j + 3]);

    /* Expand the message schedule array for the remaining 48 words by applying
       the SHA-256-specific word expansion formulas. */
    for (; i < 64; ++i)
        m[i] = SIG1(m[i - 2]) + m[i - 7] + SIG0(m[i - 15]) + m[i - 16];

    /* Initialize the working variables with the current hash state. */
    a = ctx->state[0];
    b = ctx->state[1];
    c = ctx->state[2];
    d = ctx->state[3];
    e = ctx->state[4];
    f = ctx->state[5];
    g = ctx->state[6];
    h = ctx->state[7];

    /* Main SHA-256 transformation loop:
       - For each of the 64 rounds, update the working variables by performing the core SHA-256 operations.
       - The constants 'k[i]' and the prepared message schedule 'm[i]' are used in each round. */
    for (i = 0; i < 64; ++i) {
        t1 = h + EP1(e) + CH(e,f,g) + k[i] + m[i];
        t2 = EP0(a) + MAJ(a,b,c);
        h = g;
        g = f;
        f = e;
        e = d + t1;
        d = c;
        c = b;
        b = a;
        a = t1 + t2;
    }

    /* Update the final hash state with the results of the transformation. */
    ctx->state[0] += a;
    ctx->state[1] += b;
    ctx->state[2] += c;
    ctx->state[3] += d;
    ctx->state[4] += e;
    ctx->state[5] += f;
    ctx->state[6] += g;
    ctx->state[7] += h;
}

/* The sha256_init function initializes the SHA-256 context by setting the state variables
   to the initial hash values as defined in the SHA-256 specification. */
void sha256_init(SHA256_CTX *ctx) {
    ctx->datalen = 0;
    ctx->bitlen = 0;
    ctx->state[0] = 0x6a09e667;
    ctx->state[1] = 0xbb67ae85;
    ctx->state[2] = 0x3c6ef372;
    ctx->state[3] = 0xa54ff53a;
    ctx->state[4] = 0x510e527f;
    ctx->state[5] = 0x9b05688c;
    ctx->state[6] = 0x1f83d9ab;
    ctx->state[7] = 0x5be0cd19;
}

/* The sha256_update function processes chunks of input data by adding them to the context.
   It updates the state in 512-bit blocks (64 bytes), storing partial blocks until enough
   data is available for a full block. */
void sha256_update(SHA256_CTX *ctx, const uint8_t data[], size_t len) {
    for (size_t i = 0; i < len; ++i) {
        ctx->data[ctx->datalen] = data[i];
        ctx->datalen++;
        /* When the buffer is full (64 bytes), process it and reset the buffer. */
        if (ctx->datalen == 64) {
            sha256_transform(ctx, ctx->data);
            ctx->bitlen += 512;
            ctx->datalen = 0;
        }
    }
}

/* The sha256_final function pads the input data, processes the final block, and generates the final hash.
   It ensures the correct padding is applied, then processes the padded data block, and finally extracts
   the hash value from the context state. */
void sha256_final(SHA256_CTX *ctx, uint8_t hash[]) {
    uint32_t i = ctx->datalen;

    /* Pad the remaining data in the buffer with a single '1' bit, followed by enough '0' bits to make
       the total length of the final block 512 bits (64 bytes), except for the last 64 bits which store
       the length of the input data. */
    if (ctx->datalen < 56) {
        ctx->data[i++] = 0x80;  // Append the '1' bit
        while (i < 56)
            ctx->data[i++] = 0x00;  // Pad with '0's
    } else {
        /* If there's not enough room for the padding, append the '1' bit and '0's, then process the block. */
        ctx->data[i++] = 0x80;
        while (i < 64)
            ctx->data[i++] = 0x00;
        sha256_transform(ctx, ctx->data);  // Process the full block
        memset(ctx->data, 0, 56);  // Prepare for the next block
    }

    /* Append the length of the input data in bits, encoded as a 64-bit big-endian integer. */
    ctx->bitlen += ctx->datalen * 8;
    ctx->data[63] = ctx->bitlen;
    ctx->data[62] = ctx->bitlen >> 8;
    ctx->data[61] = ctx->bitlen >> 16;
    ctx->data[60] = ctx->bitlen >> 24;
    ctx->data[59] = ctx->bitlen >> 32;
    ctx->data[58] = ctx->bitlen >> 40;
    ctx->data[57] = ctx->bitlen >> 48;
    ctx->data[56] = ctx->bitlen >> 56;
    sha256_transform(ctx, ctx->data);  // Process the final block

    /* Copy the resulting hash from the state to the output buffer (hash). */
    for (i = 0; i < 4; ++i) {
        hash[i] = (ctx->state[0] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 4] = (ctx->state[1] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 8] = (ctx->state[2] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 12] = (ctx->state[3] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 16] = (ctx->state[4] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 20] = (ctx->state[5] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 24] = (ctx->state[6] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 28] = (ctx->state[7] >> (24 - i * 8)) & 0x000000ff;
    }
}

/* The sha256 function is a wrapper that initializes the SHA-256 context, processes the input data,
   and produces the final SHA-256 hash. */
void sha256(const uint8_t* data, size_t len, uint8_t* hash) {
    SHA256_CTX ctx;
    sha256_init(&ctx);
    sha256_update(&ctx, data, len);
    sha256_final(&ctx, hash);
}

// --- AES-256 CTR Mode Implementation ---

/* AES-CTR PRNG State Structure holds the context for AES-256 encryption in CTR mode.
   - ivec: Initialization vector (counter for the CTR mode).
   - ecount: A counter to store the number of processed blocks.
   - num: Number of processed bytes in the current block.
   - key: The AES encryption key (256 bits). */
typedef struct {
    uint8_t ivec[AES_BLOCK_SIZE];  // Initialization vector
    uint8_t ecount[AES_BLOCK_SIZE];  // Encryption counter
    uint32_t num;  // Number of processed bytes in current block
    uint8_t key[AES_KEY_SIZE];  // AES encryption key
} aes_ctr_state_t;

// AES S-box
static const uint8_t sbox[256] = { /* S-box values */ };

// Rcon array for AES key expansion
static const uint8_t Rcon[11] = {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36};

/* RotWord function rotates the 4-byte word left by one byte.
   For example, [A0, A1, A2, A3] becomes [A1, A2, A3, A0].
   This function is used in the AES key expansion process. */
void RotWord(uint8_t* word) {
    uint8_t tmp = word[0];
    word[0] = word[1];
    word[1] = word[2];
    word[2] = word[3];
    word[3] = tmp;
}

/* SubWord function applies the AES S-box substitution to each byte in a 4-byte word.
   This is part of the AES key expansion process where each byte in the word is replaced
   by the corresponding value in the S-box. */
void SubWord(uint8_t* word) {
    for (int i = 0; i < 4; i++) {
        word[i] = sbox[word[i]];
    }
}

/* KeyExpansion generates the round keys for AES encryption. 
   It starts with the original 256-bit key and expands it to generate round keys for each encryption round.
   - key: The original 256-bit key (32 bytes).
   - roundKeys: The expanded round keys used in the AES encryption process. */
void KeyExpansion(uint8_t* key, uint8_t* roundKeys) {
    int i, j;
    uint8_t temp[4];

    /* Copy the original key into the first 32 bytes of the round keys. */
    for (i = 0; i < AES_KEY_SIZE; i++) {
        roundKeys[i] = key[i];
    }

    /* Generate the remaining round keys using the AES key expansion algorithm. */
    for (i = AES_KEY_SIZE / 4; i < 4 * (14 + 1); i++) {
        /* Copy the previous 4-byte word. */
        for (j = 0; j < 4; j++) {
            temp[j] = roundKeys[(i - 1) * 4 + j];
        }

        /* Every 8th word, apply the RotWord and SubWord transformations, and XOR with Rcon. */
        if (i % (AES_KEY_SIZE / 4) == 0) {
            RotWord(temp);
            SubWord(temp);
            temp[0] ^= Rcon[i / (AES_KEY_SIZE / 4)];
        }

        /* XOR the word with the word 32 bytes before it to produce the next round key word. */
        for (j = 0; j < 4; j++) {
            roundKeys[i * 4 + j] = roundKeys[(i - (AES_KEY_SIZE / 4)) * 4 + j] ^ temp[j];
        }
    }
}

/* AddRoundKey XORs the current state of the AES block with the current round key.
   This is a core part of the AES encryption process, performed in every encryption round. */
void AddRoundKey(uint8_t* state, uint8_t* roundKey) {
    for (int i = 0; i < AES_BLOCK_SIZE; i++) {
        state[i] ^= roundKey[i];
    }
}

/* AES_Encrypt encrypts a single 16-byte block of input using the AES-256 algorithm.
   It takes the input block, the output buffer, and the expanded round keys.
   - input: The 16-byte plaintext block to encrypt.
   - output: The encrypted ciphertext block (also 16 bytes).
   - key: The 256-bit encryption key. */
void AES_Encrypt(uint8_t* input, uint8_t* output, uint8_t* key) {
    uint8_t state[AES_BLOCK_SIZE];  // State array holds the intermediate results
    uint8_t roundKeys[240];  // Space for 14 rounds plus the initial key
    int round;

    /* Initialize the state with the input block. */
    memcpy(state, input, AES_BLOCK_SIZE);

    /* Generate the round keys from the input key. */
    KeyExpansion(key, roundKeys);

    /* Initial AddRoundKey step. */
    AddRoundKey(state, roundKeys);

    /* Main encryption rounds (SubBytes, ShiftRows, MixColumns are omitted here for simplicity). */
    for (round = 1; round < 14; round++) {
        // SubBytes(state);
        // ShiftRows(state);
        // MixColumns(state);
        AddRoundKey(state, roundKeys + round * AES_BLOCK_SIZE);
    }

    /* Final round (without MixColumns). */
    AddRoundKey(state, roundKeys + 14 * AES_BLOCK_SIZE);

    /* Copy the state to the output buffer. */
    memcpy(output, state, AES_BLOCK_SIZE);
}

/* increment_counter increases the AES-CTR counter by 1.
   It treats the counter as a 128-bit big-endian integer and increments it. */
void increment_counter(uint8_t* counter) {
    for (int i = AES_BLOCK_SIZE - 1; i >= 0; i--) {
        counter[i]++;
        if (counter[i] != 0) break;  // Stop when there's no overflow
    }
}

/* AES_CTR_Encrypt encrypts a variable-length input using AES-256 in Counter (CTR) mode.
   It generates the keystream by encrypting an incrementing counter and XORs it with the plaintext.
   - input: The input data to encrypt.
   - output: The output buffer where encrypted data will be written.
   - length: Length of the input data (in bytes).
   - key: The 256-bit AES key.
   - iv: The initialization vector (acts as the counter in CTR mode). */
void AES_CTR_Encrypt(uint8_t* input, uint8_t* output, uint32_t length, uint8_t* key, uint8_t* iv) {
    uint8_t counter[AES_BLOCK_SIZE];  // Counter for CTR mode
    uint8_t keystream[AES_BLOCK_SIZE];  // Buffer for the generated keystream
    uint32_t num_blocks = length / AES_BLOCK_SIZE;  // Number of full blocks
    uint32_t remaining_bytes = length % AES_BLOCK_SIZE;  // Remaining bytes after full blocks

    /* Initialize the counter with the IV (initialization vector). */
    memcpy(counter, iv, AES_BLOCK_SIZE);

    /* Process full 16-byte blocks. */
    for (uint32_t i = 0; i < num_blocks; i++) {
        /* Encrypt the counter to generate the keystream. */
        AES_Encrypt(counter, keystream, key);

        /* XOR the plaintext with the keystream to produce ciphertext. */
        for (int j = 0; j < AES_BLOCK_SIZE; j++) {
            output[i * AES_BLOCK_SIZE + j] = input[i * AES_BLOCK_SIZE + j] ^ keystream[j];
        }

        /* Increment the counter for the next block. */
        increment_counter(counter);
    }

    /* Process any remaining bytes (less than 16 bytes). */
    if (remaining_bytes > 0) {
        AES_Encrypt(counter, keystream, key);
        for (uint32_t j = 0; j < remaining_bytes; j++) {
            output[num_blocks * AES_BLOCK_SIZE + j] = input[num_blocks * AES_BLOCK_SIZE + j] ^ keystream[j];
        }
    }
}

/* aes_ctr_prng_init initializes the AES-CTR pseudorandom number generator.
   It takes a seed (init_key), hashes it with SHA-256 to derive a 256-bit AES key,
   and initializes the AES state.
   - state: The AES-CTR PRNG state.
   - init_key: The seed for key generation.
   - key_length: The length of the seed (in bytes). */
int aes_ctr_prng_init(aes_ctr_state_t* state, unsigned long init_key[], unsigned long key_length) {
    assert(state != NULL && init_key != NULL && key_length > 0);

    uint8_t key[32];  // Buffer for the derived 256-bit key
    memset(state->ivec, 0, AES_BLOCK_SIZE);  // Clear the IV (initial counter)
    state->num = 0;  // Reset the number of processed bytes
    memset(state->ecount, 0, AES_BLOCK_SIZE);  // Clear the encryption counter

    /* Hash the seed (init_key) with SHA-256 to generate the AES key. */
    sha256((const uint8_t*) init_key, key_length * sizeof(unsigned long), key);

    /* Copy the derived key into the state. */
    memcpy(state->key, key, AES_KEY_SIZE);
    return 0;
}

/* aes_ctr_prng_genrand_uint256_to_buf generates pseudorandom numbers using AES-CTR mode
   and writes them to the provided buffer.
   - state: The initialized AES-CTR PRNG state.
   - bufpos: The buffer where pseudorandom numbers will be written. */
int aes_ctr_prng_genrand_uint256_to_buf(aes_ctr_state_t* state, unsigned char* bufpos) {
    assert(state != NULL && bufpos != NULL);

    uint8_t temp_buffer[32];  // Temporary buffer for generated pseudorandom bytes
    memset(temp_buffer, 0, sizeof(temp_buffer));  // Clear the temporary buffer

    /* Encrypt the counter and generate the pseudorandom bytes. */
    AES_CTR_Encrypt(state->ivec, temp_buffer, sizeof(temp_buffer), state->key, state->ivec);

    /* Copy the generated bytes to the output buffer. */
    memcpy(bufpos, temp_buffer, sizeof(temp_buffer));

    return 0;
}

/* aes_ctr_prng_general_cleanup clears and resets the AES-CTR PRNG state.
   It securely wipes the key, IV, and any sensitive information from memory. */
int aes_ctr_prng_general_cleanup(aes_ctr_state_t* state) {
    if (state != NULL) {
        /* Clear sensitive information from memory. */
        memset(state->ivec, 0, AES_BLOCK_SIZE);
        memset(state->ecount, 0, AES_BLOCK_SIZE);
        memset(state->key, 0, AES_KEY_SIZE);
        state->num = 0;
    }
    return 0;
}

PartialVolume commented 2 months ago

I have created also a version which does not rely at all on OpenSSL, but is capped at around 50MB/s unfortunately.

If I'd spent time writing my own AES function I would spent some extra time on optimising the code, after all you must have already spent a considerable amount of time on it, so some extra time optimising may pay off.

Have you run any speed analysis on it to see if there is a particular section that's slow? Also maybe GCC or linker optimisations might help. Can any multiplications or divisions be replaced by C shift instructions? etc. A lot depends on what GCC is doing. But I would first profile the code to figure out where you can optimise it. I might have to dig out some old books that go into how the gaming world optimise their code for maximum speed.

PartialVolume commented 2 months ago

One of the first things I would do is look at rotword and subword functions. Is it really necessary to put these in function calls considering how tiny they are. Consider how many millions of function calls that would save by placing the code in line in the key_expansion function. I don't know if GCC is clever enough to figure out that optimisation but that's certainly one place I would start.

PartialVolume commented 2 months ago

Is this code in a branch that can be tested?

Knogle commented 2 months ago

Hey! Unfortunately not yet. But you can simply take the aes-ctr branch and replace all the content of the aes_ctr_prng.c It's a kind of retrofit.

It still has some issues with segfaults unfortunately, so i think it requires a little debugging before working properly. I got it working somehow, but had segfault during verify.

Knogle commented 2 months ago

Does it run for you? For me it runs, but sometimes really slow. Screenshot from 2024-09-09 18-13-26

PartialVolume commented 2 months ago

Does it run for you? For me it runs, but sometimes really slow.

I've not tried it yet. As you were getting segfaults on verify and intermittent performance issues I would say you have a bug in there that's corrupting memory/stack that's maybe out of bounds errors on loops hence the segfaults and intermittent slow performance.

Knogle commented 2 months ago

As we found out, the submodule approach is not feasible, so i will close the PR :)

martijnvanbrummelen / nwipe

AES-CTR PRNG with external git submodule #600