martijnvanbrummelen / nwipe

nwipe secure disk eraser
GNU General Public License v2.0
804 stars 85 forks

Implement High-Quality Random Number Generation Using AES-CTR Mode with OpenSSL and AES-NI Support #559

Closed Knogle closed 1 month ago

Knogle commented 8 months ago

In this pull request, I present my implementation of a pseudo-random number generator (PRNG) utilizing AES-CTR (Advanced Encryption Standard, Counter mode) with a 128-bit key. This implementation is designed to produce high-quality random numbers, which are essential for a wide range of cryptographic applications. By integrating with the OpenSSL library and exploiting AES-NI (Advanced Encryption Standard New Instructions) hardware acceleration when available, it ensures both the security and efficiency of the random number generation process. It provides the highest-quality PRNG in nwipe to date, and is a CSPRNG.

Key Features:

AES-CTR Mode: I chose AES in Counter mode due to its renowned capability to generate secure and unpredictable pseudo-random sequences. This mode operates by encrypting incrementing counter values, with the encryption output serving as the stream of random bytes.

128-bit AES: Utilizing a 128-bit key size for AES encryption provides a strong security measure while maintaining efficient performance, adhering to current cryptographic standards for pseudo-random number generation.

Integration with OpenSSL: OpenSSL, being a well-established and rigorously tested cryptographic library, is used to manage AES operations. This integration ensures a high level of security and performance for the AES-CTR operations within our PRNG.

Leveraging AES-NI Support: My implementation automatically detects and utilizes AES-NI, a set of instructions that enhance AES operations on most modern processors. This feature significantly improves the speed of random number generation, reducing CPU usage and enhancing scalability.

Implementation Details:

Initialization: At the outset, the PRNG's state is initialized with a distinct 128-bit key and an initial counter value, using OpenSSL's AES_set_encrypt_key to prepare the AES key structure for subsequent operations.

Generating Random Numbers: For generating random numbers, the current counter value is encrypted under the configured AES key in CTR mode. The output of this encryption serves as the source of pseudo-random bytes, with the counter incremented after each operation to maintain the uniqueness of subsequent inputs.

State Management: The PRNG's internal state, including the AES key, counter (IV), and encryption buffer (ecount), is securely managed within an aes_ctr_state_t structure. This careful management is crucial for preserving the integrity and unpredictability of the random number stream.

Optimizing for Hardware: By optimizing for AES-NI, my implementation ensures enhanced performance through hardware acceleration, providing an efficient solution for generating random numbers across various applications.

This PRNG implementation stands as a robust and efficient tool for generating high-quality pseudo-random numbers, crucial for cryptographic operations, secure communications, and randomized algorithms. The combination of AES-CTR mode, OpenSSL's reliability, and the performance benefits of AES-NI hardware acceleration results in a superior random number generator.

I have ensured that the implementation is well-documented with clear comments, making it accessible for review, understanding, and maintenance, following best practices in both software development and cryptographic standards.

I look forward to receiving feedback on this pull request to further improve and ensure the effectiveness of the PRNG implementation.

Test of randomness:

NIST Test Suite:

A total of 188 tests (some of the 15 tests actually consist of multiple sub-tests)
were conducted to evaluate the randomness of 32 bitstreams of 1048576 bits from:

    /dev/loop0

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The numerous empirical results of these tests were then interpreted with
an examination of the proportion of sequences that pass a statistical test
(proportion analysis) and the distribution of p-values to check for uniformity
(uniformity analysis). The results were the following:

    188/188 tests passed successfully both the analyses.
    0/188 tests did not pass successfully both the analyses.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Here are the results of the single tests:

 - The "Frequency" test passed both the analyses.

 - The "Block Frequency" test passed both the analyses.

 - The "Cumulative Sums" (forward) test passed both the analyses.
   The "Cumulative Sums" (backward) test passed both the analyses.

 - The "Runs" test passed both the analyses.

 - The "Longest Run of Ones" test passed both the analyses.

 - The "Binary Matrix Rank" test passed both the analyses.

 - The "Discrete Fourier Transform" test passed both the analyses.

 - 148/148 of the "Non-overlapping Template Matching" tests passed both the analyses.

 - The "Overlapping Template Matching" test passed both the analyses.

 - The "Maurer's Universal Statistical" test passed both the analyses.

 - The "Approximate Entropy" test passed both the analyses.

 - 8/8 of the "Random Excursions" tests passed both the analyses.

 - 18/18 of the "Random Excursions Variant" tests passed both the analyses.

 - The "Serial" (first) test passed both the analyses.
   The "Serial" (second) test passed both the analyses.

 - The "Linear Complexity" test passed both the analyses.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The missing tests (if any) were either disabled manually by the user or disabled
at run time due to input size requirements not satisfied by this run.

SmallCrush Test:

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        ufile_CreateReadBin
 Number of statistics:  15
 Total CPU time:   00:00:06.46

 All tests were passed

Screenshot from 2024-03-24 03-35-36

PartialVolume commented 3 months ago

I think so, I've created a 32-bit VM and was able to build the code there. But isn't it i386 then? Not sure, tbh.

If you are just building nwipe then yes, I would have thought it built as 32-bit in the 32-bit VM. What does uname -a return? It should say i586 or i686, but not x86_64.

I don't remember whether you are building ShredOS from source, but this is how I build a 32-bit ShredOS, including all the applications, nwipe among them. For testing 32-bit I generally build a 32-bit ShredOS and also build nwipe on a 32-bit distro.

For ShredOS you just select the correct architecture and variant in menuconfig, and you can build 32-bit on an x86_64 host.

The way I change the architecture and architecture variant is shown below, from x86_64 variant nocona to i386 variant i586.

Screenshot_20240820_195710

Architecture options being:

ARC
ARC (big endian)
ARM (little endian)
ARM (big endian)  
AArch64 (little endian) 
AArch64 (big endian)
i386
m68k 
Microblaze AXI (little endian)
Microblaze non-AXI (big endian) 
MIPS (big endian)
MIPS (little endian)
MIPS64 (big endian)
MIPS64 (little endian)
OpenRISC
PowerPC 
PowerPC64 (big endian)
PowerPC64 (little endian)
RISCV 
s390x
SuperH
SPARC
SPARC64
x86_64 
Xtensa

x86_64 architecture variants are (I choose nocona for x86_64 so ShredOS runs on new and old processors, back to the 64-bit Pentium 4; there were both 32-bit and 64-bit Pentium 4 processors, and I think all Intel processors before the Pentium 4 were 32-bit):

x86-64
x86-64-v2
x86-64-v3
x86-64-v4
nocona
core2
corei7
nehalem
westmere
corei7-avx
sandybridge
ivybridge
haswell
broadwell
skylake
atom
bonnell
silvermont
goldmont
goldmont-plus
tremont
sierraforest
grandridge
knightslanding
knightsmill
skylake-avx512
cannonlake
icelake-client
icelake-server
cascadelake
cooperlake
tigerlake
sapphirerapids
alderlake
rocketlake
graniterapids
graniterapids-d
opteron
opteron w/ SSE3
barcelona
bobcat
jaguar
bulldozer
piledriver
steamroller
excavator
zen
zen 2
zen 3
zen 4

i386 (32-bit) variants (I select i586, which allows ShredOS to run on the first Pentium processors and everything after, but not earlier processors like the 486, 386, or 286):

i486
i586
x1000
i686 pentium pro
pentium MMX
pentium mobile
pentium2
pentium3
pentium4 
prescott
nocona
core2
corei7
nehalem
westmere
corei7-avx
sandybridge
ivybridge
core-avx2
haswell
broadwell
skylake
atom
bonnell
silvermont
goldmont
goldmont-plus
tremont
sierraforest
grandridge
knightslanding
knightsmill
skylake-avx512
cannonlake
icelake-client
cascadelake
cooperlake
tigerlake
sapphirerapids
alderlake
rocketlake
graniterapids
graniterapids-d
k6
k6-2
athlon
athlon-4
opteron
opteron w/ SSE3
barcelona
bobcat
jaguar
bulldozer
piledriver
steamroller
excavator
zen
zen 2
zen 3
zen 4
AMD Geode
Via/Cyrix C3 (Samuel/Ezra cores)
Via C3-2 (Nehemiah cores)
IDT Winchip C6
Knogle commented 3 months ago

Ahhhh alright, sounds great! Is there a way I can build the current PR (this one) together with ShredOS to test it?

I've built it with Linux debian-i386 6.1.0-23-686-pae #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) i686 GNU/Linux

At least seems to run OK on i686.

Screenshot from 2024-08-20 23-15-02

But on these old CPUs without AES-NI, Fibonacci is a lot faster.

Screenshot from 2024-08-20 23-15-54

PartialVolume commented 3 months ago

Yes, to build ShredOS with your modified version you need to do a release from your fork, making sure you set the aes-ctr branch as the target. Then, in your local copy of ShredOS, you need to edit a couple of files in packages/nwipe to change the sha1 hash to match your release and change the URL to point to your release. Then just rebuild ShredOS and it will pull in and compile your version of nwipe.
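For orientation, ShredOS packages follow the usual Buildroot layout, so the two edits would look roughly like this (file names follow Buildroot conventions; the tag, user, and hash below are placeholders, not real values):

```
# packages/nwipe/nwipe.mk -- point the package at your own release
NWIPE_VERSION = v0.0-aes-ctr-test        # your release tag (placeholder)
NWIPE_SITE = $(call github,<your-user>,nwipe,$(NWIPE_VERSION))

# packages/nwipe/nwipe.hash -- must match your release tarball
sha1  <hash-of-your-release-tarball>  nwipe-$(NWIPE_VERSION).tar.gz
```

After editing both files, a normal ShredOS rebuild fetches and compiles the fork instead of the upstream release.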

If you're unsure of anything let me know and I'll go into more detail.

Knogle commented 3 months ago

Yes, to build ShredOS with your modified version you need to do a release from your fork making sure you set the aes-ctr branch as the target. Then on your local copy of ShredOS You then need to edit a couple of files in packages/nwipe to change the sha1 hash to match your release and change the URL to point to your release. Then just rebuild ShredOS and it will pull in and compile your version of nwipe.

If you're unsure of anything let me know and I'll go into more detail.

Ahh, thanks! Done this way, it worked really well now. Thanks for that. (I had initially forgotten to edit the hash.)

Currently wiping 4x 16TB drives in order to put them on eBay, running really well.

Knogle commented 3 months ago

Had to double-check, but here we had the same issue: depending on the architecture, the seed length was different, because the seed was declared as unsigned long instead of uint64_t.

PartialVolume commented 3 months ago

@Knogle I've been thinking about whether to squash these 34 commits to a single commit, however I'm conflicted as to whether it's necessary or not in this case. Your commit comments are informative, however there are a few where you reverse a previous commit so the commit history would be tidier by squashing to a single commit.

I just wondered if you had a preference?

If I did squash the commits to a single commit, would you want to do it in git and write the new commit comment for this branch, or do you want me to do it in GitHub by doing a squash merge, with me writing the new commit comment?

Knogle commented 3 months ago

@Knogle I've been thinking about whether to squash these 34 commits to a single commit, however I'm conflicted as to whether it's necessary or not in this case. Your commit comments are informative, however there are a few where you reverse a previous commit so the commit history would be tidier by squashing to a single commit.

I just wondered if you had a preference?

If I did squash the commits to a single commit would you want to do it in git and write the new commit comment for this branch or do you want me to do it in github by doing a merge squash and I write the new commit comment.?

Hey, you could squash them by yourself if that's okay :)

PartialVolume commented 3 months ago

@Knogle I've been thinking about whether to squash these 34 commits to a single commit, however I'm conflicted as to whether it's necessary or not in this case. Your commit comments are informative, however there are a few where you reverse a previous commit so the commit history would be tidier by squashing to a single commit. I just wondered if you had a preference? If I did squash the commits to a single commit would you want to do it in git and write the new commit comment for this branch or do you want me to do it in github by doing a merge squash and I write the new commit comment.?

Hey, you could squash them by yourself if that's okay :)

yes, no problem.

Knogle commented 3 months ago

I think what's worth noting is 55472fb0e85ad5b9ea2ae9eed1f1b38f7508db8f and fe493cfbbd76c3c8e03a7237138690d02682fe1e, where AES-CTR is also set as the default option for AES-NI-enabled systems, otherwise falling back to Xoroshiro, and to Lagged Fibonacci on i686.

PartialVolume commented 3 months ago

I think what's worth noting is 55472fb, where AES-CTR is set as default option as well for AES-Ni enabled systems, otherwise falling back to Xoroshiro, and Lagged Fibonacci on i686.

Noted, I will read through all the current commit comments and include information I think is important. My comment will probably lean towards being more verbose rather than succinct.

PartialVolume commented 3 months ago

@Knogle Can you also hold fire on producing any new branches based on your existing branch? I'm concerned I'm going to have quite a few merge conflicts to resolve when I come to merging your subsequent branches. So I'd like to get your existing work merged, so you can then update your own fork before creating any new branches once all your existing PRs have been merged. Thanks.

Knogle commented 3 months ago

@Knogle Can you also hold fire on producing any new branches based on your existing branch. I'm concerned I'm going to have quite a few merge conflicts to resolve when I come to merging your subsequent branches. So I'd like to get you existing work merged so you can then update your own fork before creating any new branches after all your existing PRs have been merged. Thanks.

Sure, I will do so :)

Knogle commented 2 months ago

Currently conducting some tests on ARM :) Unfortunately my SD card is massively limiting.

PartialVolume commented 2 months ago

Yes, I've run 0.37 on ARM: Ubuntu with the Xfce desktop on an RPi 4 with 8 GB RAM, configured to boot via USB rather than microSD. It seemed to work OK; I've not done any speed tests to see how fast a drive attached via USB would be, subject to the drive's limitations.

Knogle commented 2 months ago

@PartialVolume Maybe a consideration here, I currently have some time left so I can squash the commits. Regarding external libraries there are a few things we could do.

We could instead copy the relevant code out of the OpenSSL library and include it in our tree without the external dependency; the OpenSSL license allows us to do so. Another, more elegant approach: we include a stable OpenSSL version that works fine for us as a submodule in the git project, specifying a specific release and tag/commit. This way we get version locking and always build with the same OpenSSL version, altering the version only if we wish to do so. The second approach I have already implemented in different projects, and for different libraries (libmariadb in my case), where I wanted the code to always function the same way.
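The submodule approach can be sketched as follows (the tag shown is only an example; the team would pick whatever vetted release is agreed on):

```shell
# Add OpenSSL as a submodule, pinned to a specific release tag
git submodule add https://github.com/openssl/openssl.git openssl
git -C openssl checkout openssl-3.0.13   # example tag, not a recommendation
git add .gitmodules openssl
git commit -m "Pin OpenSSL to a fixed release via submodule"

# A fresh clone then reproduces exactly that version:
git clone --recurse-submodules <fork-url>
```

Because the pinned commit is recorded in the superproject, every build uses the same OpenSSL source until someone deliberately moves the pin.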

Looks like this. Screenshot_20240908-001837_Firefox

PartialVolume commented 2 months ago

Another approach which is more elegant. We include the stable OpenSSL version which works fine for as, as a submodule in the git project by specifying a specific release and tag/commit. This way we can have version locking, and always build with the same OpenSSL version, or altering the version only if we wish to do so. Second approach I have implemented in different projects already, and for different libraries, libmariadb in my case. Where I wanted the code to always function the same way.

Yes the second approach, that could work for us.

Knogle commented 2 months ago

Another approach which is more elegant. We include the stable OpenSSL version which works fine for as, as a submodule in the git project by specifying a specific release and tag/commit. This way we can have version locking, and always build with the same OpenSSL version, or altering the version only if we wish to do so. Second approach I have implemented in different projects already, and for different libraries, libmariadb in my case. Where I wanted the code to always function the same way.

Yes the second approach, that could work for us.

Ahoy, squashed the commits, and also created a second approach here, using the submodules: https://github.com/martijnvanbrummelen/nwipe/pull/600

Firminator commented 2 months ago

Would like to bring up the point of including the LTS version of OpenSSL, which has support until September 2025. See https://openssl-library.org/policies/general/versioning-policy/#long-term-stable-release and https://github.com/openssl/openssl/discussions/23674#discussioncomment-8581050. That way we can be sure no regressions are introduced to the encryption-related functions provided by OpenSSL and used by nwipe the way Knogle proposed ("mapping it as a submodule in the git project by specifying a specific release and tag/commit"). I would weigh this as more important than the speed increase of newer OpenSSL versions, which might be very specific to certain hardware (like the one used in your tests). The speed gain on certain systems seems to me less desirable than the encryption reliably functioning at all times on all hardware, with no need to worry that jumping, for example, from OpenSSL 3.1 to 3.2 or 3.4 will introduce bugs or regressions.

PartialVolume commented 2 months ago

Would like to bring up the point of including the LTS version of OpenSSL which has support until September 2025. See https://openssl-library.org/policies/general/versioning-policy/#long-term-stable-release and openssl/openssl#23674 (comment) That way we can be sure no regressions are introduced to the encryption-related functions provided by OpenSSL and used by nwipe the way Knogle proposed ("mapping it as a submodule in the git project by specifying a specific release and tag/commit"). I would weigh this more important than the speed increase of newer OpenSSL versions which might be very specific to certain hardware (like the one which was used in your tests). The speed gain on certain system seems to me less desirable than the encryption maturely functioning at all times on all hardware with no need to worry that jumping for example from OpenSSL 3.1 to 3.2 or 3.4 will introduce bugs or regressions.

I still think we should be using our own built-in AES function, just like all our other PRNGs, and not using OpenSSL at all. It's already taken up too much of my time, so the only way I'm now going to merge this is if the function @Knogle presented recently (that had a few bugs) is fixed. To be honest the function was almost there; it just needs some debugging.

Knogle commented 2 months ago

Would like to bring up the point of including the LTS version of OpenSSL which has support until September 2025. See https://openssl-library.org/policies/general/versioning-policy/#long-term-stable-release and openssl/openssl#23674 (comment) That way we can be sure no regressions are introduced to the encryption-related functions provided by OpenSSL and used by nwipe the way Knogle proposed ("mapping it as a submodule in the git project by specifying a specific release and tag/commit"). I would weigh this more important than the speed increase of newer OpenSSL versions which might be very specific to certain hardware (like the one which was used in your tests). The speed gain on certain system seems to me less desirable than the encryption maturely functioning at all times on all hardware with no need to worry that jumping for example from OpenSSL 3.1 to 3.2 or 3.4 will introduce bugs or regressions.

I still think we should be using our own AES builtin function just like all our other prngs and not using openssl at all. It's already taken up too much of my time so the only way I'm now going to merge this is if the function @Knogle presented recently (that had a few bugs) is fixed. To be honest the function was almost there, just needs some debugging.

Thanks for your reply. Unfortunately I cannot follow up on this: I've shown my implementation to a security guy, and it has a lot of major flaws and is not a proper implementation. According to him it's not an easy task; it would require deep knowledge of security best practices, basically rebuilding parts of OpenSSL in our own way, which would take up too much time. It would also be a little too tough for my math skills.

So I think the only choice would be to pursue another PRNG.

PartialVolume commented 2 months ago

According to him it's not an easy task to do so.

Things are always hard when you don't truly understand the subject. Once you understand something it all becomes surprisingly easy.

I've looked at the overall procedure and it doesn't look that complicated to code. Maybe time-consuming to get right, but you can at least test it against the OpenSSL version if you code the shifts the same way, and I've written far more complex things. PRNGs are your area of expertise, so it's entirely up to you.

Since AES is a symmetric key cipher, it uses the same secret key for both encryption and decryption. This means that both the sender and receiver of the data in question need a copy of the secret key. Symmetric keys are better suited for internal transfers, unlike asymmetric keys, which are best for external transfers. Symmetric key ciphers, such as AES, are faster and more efficient to run since they require less computational power than asymmetric key algorithms.

Additionally, AES uses block ciphers, where the plaintext is divided into sections called blocks. AES uses a 128-bit block size, whereby data is divided into 4-by-4 arrays that contain 16 bytes. Each byte contains 8 bits, with the total bits in every block being 128. In AES, the size of encrypted data remains the same. This means that 128 bits of plaintext yield 128 bits of ciphertext.

In all encryption, each unit of data is replaced by a different unit according to the security key used. AES is a substitution-permutation network that uses a key expansion process where the initial key is used to come up with new keys called round keys. The round keys are generated over multiple rounds of modification. Each round makes it harder to break the encryption. The AES-256 encryption uses 14 such rounds.

AES works by having the initial key added to a block using an exclusive or (XOR) cipher. This is an operation that is built into processor hardware. In the block, each byte of data is substituted with another, following a predetermined table. The rows of the 4-by-4 array are shifted, with the bytes in the second row being moved one space to the left. Bytes in the third row are moved two spaces, and the ones in the fourth row moved three spaces. The columns are then mixed, combining the four bytes in each column, and the round key is added to the block. The process is repeated for each round, yielding a ciphertext that is completely different from the plaintext.

This encryption algorithm features the following advantages:

 - Using a different key for every round yields a much more complex result.

 - Byte substitution modifies the data in a nonlinear way, thus hiding the relationship between plaintext and ciphertext.

 - Shifting rows and mixing columns diffuses the data, thus transposing bytes. This further complicates the encryption.

Knogle commented 2 months ago

What I could offer is the previously discussed submodule approach in the recent PR (referenced here). This involves cloning a specific version of the OpenSSL repo, doing a minimal build during the nwipe build process (instead of a full build as done before), and statically linking it. The result is almost the same, with only a slight increase in build time.

I know this has taken a lot of time for all of us, so maybe we can decide whether it makes sense to go forward with this or to explore other PRNGs that need less effort to implement. The custom approach might work, but I won't be pursuing it further. I have taken a look, and manually optimizing the assembly interaction with the EVP API and the prefetcher/branch prediction is too complex for me to invest more time in. Basically, the reason I chose AES here is hardware acceleration and speed. To accomplish that the same way ourselves, we would have to optimize the register handling across platforms manually and conduct testing, which requires deep assembler and architectural knowledge of ARM, MIPS, x86, etc.

PartialVolume commented 2 months ago

This involves cloning a specific version of the OpenSSL repo, doing a minimal build during the nwipe build process (instead of full-build as done before)

That may work as long as it doesn't increase the size of nwipe's binary too much.

Knogle commented 2 months ago

This involves cloning a specific version of the OpenSSL repo, doing a minimal build during the nwipe build process (instead of full-build as done before)

That may work as long as it doesn't increase the size of nwipe's binary too much.

I think I am on it! Currently I was able to get the binary down to 5 MB total for the full build of OpenSSL statically linked; I will now try the minimal approach, including compression and stripping.

Knogle commented 2 months ago

Check this out :) https://github.com/martijnvanbrummelen/nwipe/pull/609

Firminator commented 2 months ago

If there is a way to not include OpenSSL at all (like Knogle suggested) I'm all for it. Less code in general and fewer dependencies on third-party code is always a win in my mind. However, if you guys were to decide to include OpenSSL for whatever reason, then that's where I hoped for getting the LTS version included instead of the General Availability releases v3.1.x, 3.2.x, 3.3.x. Just thought I'd clarify this. @Knogle, thanks for having a third pair of eyes check out the encryption implementation.

Knogle commented 2 months ago

@Firminator That's an important decision to be made. If we go for implementing OpenSSL, there are several use cases for it in nwipe beyond the PRNG: seed verification and enrichment (Intel RDSEED, non-blocking, in case of urandom failure), better ways of stream verification (like checking PRNG entropy after a wipe), signing PDFs, etc. I think OpenSSL is almost everywhere today (even in ShredOS), and one of the top 3 most widely deployed libraries. We could also implement a flag like ./configure --no-aes to build without AES and OpenSSL at all. So statically building it in would kind of remove the dependency, as we basically include only the code we need in a monolithic binary. But if that's not the scope for nwipe, that's also okay.

PartialVolume commented 1 month ago

@Knogle I've discussed the inclusion of OpenSSL with @martijnvanbrummelen, and for ease of security updates we would like to go with dynamic linking and not static. I don't know if this means you need to revert any commits in this or other PRs?

As we are relying upon an external library, it would be a good idea to validate some of the first blocks generated by the AES PRNG for randomness, to be sure we abort if some future bug causes the OpenSSL functions we are using to break and end up writing something that isn't random data. To make this easier we can just check the randomness of a single block. It's really just to check that OpenSSL hasn't failed to generate any random data, so if you are generating a number to represent randomness it doesn't need to be too tight; simply that random data has been generated.

Knogle commented 1 month ago

@Knogle I've discussed the inclusion of openssl with @martijnvanbrummelen and for ease of security updates we would like to go with dynamic updates and not static. I don't know if this means you need to revert any commits in this or other PRs?

As we are relying upon a external library, it would be a good idea to validate some of the first blocks generated by the AES prng for randomness to be sure we abort if some future bug causes the openssl functions we are using to break and end up writing something that isn't random data. To make this easier we can just check randomness of a single block. It's really just to check that openssl hasn't failed to generate any random data so if you are generating a number to represent randomness it doesn't need to be too tight, simply that random data has been generated.

I will do so, I'll create a new PR for that :) It's getting a bit messy in this one already.