Closed Knogle closed 7 months ago
Still having build issues on Ubuntu/Debian, though it works on Fedora 38 and Fedora 39. Is it in a proper state now?
And as a reminder for myself: configure needs to check for the OpenSSL header files and libcrypto?
EDIT: I have posted regarding this issue on stackoverflow. https://stackoverflow.com/questions/78138011/compilation-fails-on-ubuntu-but-succeeds-on-fedora-for-project-using-openssl-fun
Adding the following to configure.ac solves the issue (commit 7d33e97e3ca59e1d0a0b323263f8405163d29a8d). It now compiles fine even on Ubuntu:
PKG_CHECK_MODULES(
    [OPENSSL],
    [openssl],
    [
        CFLAGS="${CFLAGS} ${OPENSSL_CFLAGS}"
        LIBS="${LIBS} ${OPENSSL_LIBS}"
    ],
    [AC_CHECK_LIB(
        [ssl],
        [SSL_library_init],
        [
            LIBS="-lssl -lcrypto $LIBS"
            AC_CHECK_HEADERS(openssl/ssl.h,, [
                AC_CHECK_HEADERS(openssl/crypto.h, [
                    AC_DEFINE([OPENSSL_IN_SUBDIR], [openssl/], [Look for openssl headers in subdir])
                ], [AC_MSG_ERROR([openssl headers not found])])
            ])
        ],
        [AC_MSG_ERROR([OpenSSL development library not found])]
    )]
)
I downloaded and compiled Knogle:master. All good, no compilation errors on Ubuntu 22.04.
Before I run the 2 workflows I thought I'd check the formatting.
I did a make check-format
and it showed formatting errors. Checking the files, they were correctly formatted, but the changes had not been staged for commit. Did you run make format and then push without staging and committing the changed files?
git status
On branch master
Your branch is up to date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: src/gui.c
modified: src/options.c
modified: src/prng.c
modified: src/prng.h
Untracked files:
(use "git add <file>..." to include in what will be committed)
config.guess
config.sub
no changes added to commit (use "git add" and/or "git commit -a")
Looks like you need to stage the changes and do a final commit to correct formatting.
git add src/gui.c
git add src/options.c
git add src/prng.c
git add src/prng.h
git commit -m "Correct formatting"
git push
Can you take a look at that, thanks. The diff of unstaged changes is below; from what I can see it looks like they are only formatting changes.
git diff
diff --git a/src/gui.c b/src/gui.c
index e2316d2..22a3418 100644
--- a/src/gui.c
+++ b/src/gui.c
@@ -1739,18 +1739,13 @@ void nwipe_gui_prng( void )
tab1,
"Performs best on a 64-bit CPU. Use ISAAC if this system has a 32-bit CPU. " );
break;
- case 3:
+ case 3:
- mvwprintw( main_window,
- yy++,
- tab1,
- "AES-CTR Ni Prototype " );
+ mvwprintw( main_window, yy++, tab1, "AES-CTR Ni Prototype " );
break;
+ }
- }
-
-
- /* switch */
+ /* switch */
/* Add a border. */
box( main_window, 0, 0 );
@@ -1810,7 +1805,7 @@ void nwipe_gui_prng( void )
{
nwipe_options.prng = &nwipe_isaac64;
}
- if( focus == 3 )
+ if( focus == 3 )
{
nwipe_options.prng = &nwipe_aes_ctr_prng;
}
diff --git a/src/options.c b/src/options.c
index 8d556f7..a0c3d51 100644
--- a/src/options.c
+++ b/src/options.c
@@ -547,7 +547,6 @@ void nwipe_options_log( void )
extern nwipe_prng_t nwipe_isaac64;
extern nwipe_prng_t nwipe_aes_ctr_prng;
-
/**
* Prints a manifest of options to the log.
*/
diff --git a/src/prng.c b/src/prng.c
index 17e3ab2..79cde01 100644
--- a/src/prng.c
+++ b/src/prng.c
@@ -25,7 +25,7 @@
#include "mt19937ar-cok/mt19937ar-cok.h"
#include "isaac_rand/isaac_rand.h"
#include "isaac_rand/isaac64.h"
-#include "aes/aes_ctr_prng.h" // AES-NI prototype
+#include "aes/aes_ctr_prng.h" // AES-NI prototype
nwipe_prng_t nwipe_twister = { "Mersenne Twister (mt19937ar-cok)", nwipe_twister_init, nwipe_twister_read };
@@ -35,7 +35,6 @@ nwipe_prng_t nwipe_isaac64 = { "ISAAC-64 (isaac64.c)", nwipe_isaac64_init, nwipe
/* AES-CTR-NI PRNG Structure */
nwipe_prng_t nwipe_aes_ctr_prng = { "AES-CTR-PRNG", nwipe_aes_ctr_prng_init, nwipe_aes_ctr_prng_read };
-
/* Print given number of bytes from unsigned integer number to a byte stream buffer starting with low-endian. */
static inline void u32_to_buffer( u8* restrict buffer, u32 val, const int len )
{
@@ -256,7 +255,6 @@ int nwipe_isaac64_read( NWIPE_PRNG_READ_SIGNATURE )
return 0;
}
-
/* EXPERIMENTAL implementation of AES-128 in counter mode to provide high-quality random numbers */
int nwipe_aes_ctr_prng_init( NWIPE_PRNG_INIT_SIGNATURE )
@@ -266,9 +264,10 @@ int nwipe_aes_ctr_prng_init( NWIPE_PRNG_INIT_SIGNATURE )
if( *state == NULL )
{
/* This is the first time that we have been called. */
- *state = malloc( sizeof( aes_ctr_state_t ) );
+ *state = malloc( sizeof( aes_ctr_state_t ) );
}
- aes_ctr_prng_init( (aes_ctr_state_t*) *state, (unsigned long*) ( seed->s ), seed->length / sizeof( unsigned long ) );
+ aes_ctr_prng_init(
+ (aes_ctr_state_t*) *state, (unsigned long*) ( seed->s ), seed->length / sizeof( unsigned long ) );
return 0;
}
@@ -294,5 +293,3 @@ int nwipe_aes_ctr_prng_read( NWIPE_PRNG_READ_SIGNATURE )
return 0;
}
-
-
diff --git a/src/prng.h b/src/prng.h
index 7063aec..aa4ff12 100644
--- a/src/prng.h
+++ b/src/prng.h
@@ -14,30 +14,30 @@ typedef struct
#define NWIPE_PRNG_READ_SIGNATURE void **state, void *buffer, size_t count
/* Function pointers for PRNG actions. */
-typedef int (*nwipe_prng_init_t)(NWIPE_PRNG_INIT_SIGNATURE);
-typedef int (*nwipe_prng_read_t)(NWIPE_PRNG_READ_SIGNATURE);
+typedef int ( *nwipe_prng_init_t )( NWIPE_PRNG_INIT_SIGNATURE );
+typedef int ( *nwipe_prng_read_t )( NWIPE_PRNG_READ_SIGNATURE );
/* The generic PRNG definition. */
typedef struct
{
- const char* label; // The name of the pseudo random number generator.
- nwipe_prng_init_t init; // Initialize the prng state with the seed.
- nwipe_prng_read_t read; // Read data from the prng.
+ const char* label; // The name of the pseudo random number generator.
+ nwipe_prng_init_t init; // Initialize the prng state with the seed.
+ nwipe_prng_read_t read; // Read data from the prng.
} nwipe_prng_t;
/* Mersenne Twister prototypes. */
-int nwipe_twister_init(NWIPE_PRNG_INIT_SIGNATURE);
-int nwipe_twister_read(NWIPE_PRNG_READ_SIGNATURE);
+int nwipe_twister_init( NWIPE_PRNG_INIT_SIGNATURE );
+int nwipe_twister_read( NWIPE_PRNG_READ_SIGNATURE );
/* ISAAC prototypes. */
-int nwipe_isaac_init(NWIPE_PRNG_INIT_SIGNATURE);
-int nwipe_isaac_read(NWIPE_PRNG_READ_SIGNATURE);
-int nwipe_isaac64_init(NWIPE_PRNG_INIT_SIGNATURE);
-int nwipe_isaac64_read(NWIPE_PRNG_READ_SIGNATURE);
+int nwipe_isaac_init( NWIPE_PRNG_INIT_SIGNATURE );
+int nwipe_isaac_read( NWIPE_PRNG_READ_SIGNATURE );
+int nwipe_isaac64_init( NWIPE_PRNG_INIT_SIGNATURE );
+int nwipe_isaac64_read( NWIPE_PRNG_READ_SIGNATURE );
/* AES-CTR-NI prototypes. */
-int nwipe_aes_ctr_prng_init(NWIPE_PRNG_INIT_SIGNATURE);
-int nwipe_aes_ctr_prng_read(NWIPE_PRNG_READ_SIGNATURE);
+int nwipe_aes_ctr_prng_init( NWIPE_PRNG_INIT_SIGNATURE );
+int nwipe_aes_ctr_prng_read( NWIPE_PRNG_READ_SIGNATURE );
/* Size of the twister is not derived from the architecture, but it is strictly 4 bytes */
#define SIZE_OF_TWISTER 4
I will do so :)
Formatting should be fixed now, I've pushed the changes. I will also run the NIST suite, but the generated data should be NIST compliant.
EDIT: Passed the NIST STS Test suite
- The "Frequency" test passed both the analyses.
- The "Block Frequency" test passed both the analyses.
- The "Cumulative Sums" (forward) test passed both the analyses.
The "Cumulative Sums" (backward) test passed both the analyses.
- The "Runs" test passed both the analyses.
- The "Longest Run of Ones" test passed both the analyses.
- The "Binary Matrix Rank" test passed both the analyses.
- The "Discrete Fourier Transform" test passed both the analyses.
- 148/148 of the "Non-overlapping Template Matching" tests passed both the analyses.
- The "Overlapping Template Matching" test passed both the analyses.
- The "Maurer's Universal Statistical" test passed both the analyses.
- The "Approximate Entropy" test passed both the analyses.
- 8/8 of the "Random Excursions" tests passed both the analyses.
- 18/18 of the "Random Excursions Variant" tests passed both the analyses.
- The "Serial" (first) test passed both the analyses.
The "Serial" (second) test passed both the analyses.
- The "Linear Complexity" test passed both the analyses.
That's great, thanks. The workflow tests pass. I'm having a little trouble running the NIST test. I downloaded and compiled the suite, but when I try to run assess I get various questions and then it segfaults. I guess I'm doing something wrong. I'm trying to test the randomness of a loop device /dev/loop39 filled with AES prng output, and also a file, both produced by nwipe with the AES prng. How do you run the NIST test?
sudo ./assess /dev/loop39
[sudo] password ....
G E N E R A T O R S E L E C T I O N
______________________________________
[0] Input File [1] Linear Congruential
[2] Quadratic Congruential I [3] Quadratic Congruential II
[4] Cubic Congruential [5] XOR
[6] Modular Exponentiation [7] Blum-Blum-Shub
[8] Micali-Schnorr [9] G Using SHA-1
Enter Choice: 0
User Prescribed Input File: /dev/loop39
S T A T I S T I C A L T E S T S
_________________________________
[01] Frequency [02] Block Frequency
[03] Cumulative Sums [04] Runs
[05] Longest Run of Ones [06] Rank
[07] Discrete Fourier Transform [08] Nonperiodic Template Matchings
[09] Overlapping Template Matchings [10] Universal Statistical
[11] Approximate Entropy [12] Random Excursions
[13] Random Excursions Variant [14] Serial
[15] Linear Complexity
INSTRUCTIONS
Enter 0 if you DO NOT want to apply all of the
statistical tests to each sequence and 1 if you DO.
Enter Choice: 1
P a r a m e t e r A d j u s t m e n t s
-----------------------------------------
[1] Block Frequency Test - block length(M): 128
[2] NonOverlapping Template Test - block length(m): 9
[3] Overlapping Template Test - block length(m): 9
[4] Approximate Entropy Test - block length(m): 10
[5] Serial Test - block length(m): 16
[6] Linear Complexity Test - block length(M): 500
Select Test (0 to continue): 0
How many bitstreams? 8
Input File Format:
[0] ASCII - A sequence of ASCII 0's and 1's
[1] Binary - Each byte in data file contains 8 bits of data
Select input mode: 1
Statistical Testing In Progress.........
Segmentation fault
I would like to leave this about a week before merging to give others a chance to comment if they wish and then I'll merge it if that's ok.
Have you done any speed tests against ISAAC64? It would be interesting to see a comparison using nwipe. If not I'll probably run a couple of wipes on 500GB drives using the different prngs as a comparison.
Thanks a lot! Currently the code is considerably slower, but provides higher-quality random numbers. My implementation still requires optimization, as those instructions should be able to do 400-500MB/s on a single core; currently it's around 100MB/s on a Broadwell CPU. Over the next weeks I will try to look for a way to improve the code. Most of the code was taken from the Mersenne Twister implementation, using it as a "structure" and filling in the AES parts instead of Mersenne. I am also taking first steps with pthreads; I think using all cores would be a great benefit.
So I'd say it's still in a kind of experimental or testing state. Results may differ depending on CPU architecture and AES-NI implementation. I can implement things, but I'm not that strong at code optimization.
Ahhhhh alright, I had the same issues using the test suite, so I have used the "improved" NIST test suite. https://github.com/arcetri/sts?tab=readme-ov-file
EDIT: How do you set up a loop device for testing? Can you tell me how you do that? I didn't get it working, so I always had to use physical devices for testing.
Thanks, I'll give that a try.
EDIT: How do you set up a loop device for testing? Can you tell me how you do that? I didn't get it working so i always had to use physical devices for testing.
# create a 1 MByte file that is our pseudo drive
truncate -s 1M loopbackfile.img
# assign the file loopbackfile.img as the next available loop device
losetup -fP loopbackfile.img
You then use losetup to list which file is assigned to which loop device. I don't remember the syntax offhand, but I wrote a bash script that automates creation of a load of loop devices.
Nwipe won't normally show a loop device unless you explicitly list it on the command line
nwipe /dev/loop1 /dev/loop2
Tomorrow, when I'm back on my laptop, I'll post one of the bash scripts I use to automate creation of the loop devices. It's very simple but will give you an idea for creating your own.
Can you maybe help me optimize the code? I think it might be useful prior to merging. As I didn't understand some parts of the existing code, my implementation is almost purely modelled on the existing Mersenne Twister implementation as far as the function structure goes.
As shown here, we are generating a 128-bit number but cropping it down to 32 bits, so we could already achieve 4 times the performance here:
unsigned long aes_ctr_prng_genrand_uint32( aes_ctr_state_t* state )
{
    unsigned long result = 0;

    CRYPTO_ctr128_encrypt( (unsigned char*) &result,
                           (unsigned char*) &result,
                           sizeof( result ),
                           &state->aes_key,
                           state->ivec,
                           state->ecount,
                           &state->num,
                           (block128_f) AES_encrypt );

    next_state( state );  // Ensure this function does not cause errors
    return result & 0xFFFFFFFF;
}
The problem is, I don't know how to adapt the existing u64_to_buffer and u32_to_buffer functions to cope with the 128-bit value. Or maybe I have to split the 128-bit value into 4 and use 4x u32_to_buffer in prng.c. Maybe you can give me some ideas.
I have made changes to the code now, improving performance massively already. There is more potential for optimization, but I think it's not that bad now.
I believe it would be more efficient to develop a function like u128_to_buffer rather than relying on splitting and using u64_to_buffer. This approach, while necessitating a bit more effort, should streamline the process and enhance overall performance.
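Something along these lines is what I have in mind for u128_to_buffer: an untested sketch that simply mirrors the existing u32_to_buffer / u64_to_buffer pattern in prng.c, assuming the project's u8/u64 typedefs and passing the 128-bit value as two 64-bit halves (since there is no portable 128-bit integer type):
/* Untested sketch: write a 128-bit value, passed as two 64-bit halves, to the byte
   stream low-endian first, mirroring the existing u32_to_buffer/u64_to_buffer helpers. */
static inline void u128_to_buffer( u8* restrict buffer, u64 lo, u64 hi )
{
    for( int i = 0; i < 8; i++ )
    {
        buffer[i] = (u8) ( lo >> ( i * 8 ) );
        buffer[i + 8] = (u8) ( hi >> ( i * 8 ) );
    }
}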
EDIT: Currently trying a variant using SSE intrinsics; it's easier to handle 128-bit values this way. EDIT2: My newly created loop-setup script, maybe it's useful for you as well: https://gist.github.com/Knogle/e690b8134637abe9fa473cc9eae58f98
/* Loop to fill the buffer with 128-bit blocks */
for(size_t ii = 0; ii < words; ++ii) {
aes_ctr_prng_genrand_uint128((aes_ctr_state_t*)*state, output);
// Using SSE to load and store the 128-bit output directly
__m128i data = _mm_loadu_si128((__m128i*)output);
_mm_storeu_si128((__m128i*)bufpos, data);
bufpos += 16; // Move to the next block
}
Instead of this:
/* Loop to fill the buffer with 128-bit blocks */
for( size_t ii = 0; ii < words; ++ii )
{
aes_ctr_prng_genrand_uint128( (aes_ctr_state_t*) *state, output );
// Extracting 64-bit parts from the 128-bit output
uint64_t part1 = ( (uint64_t) output[0] << 56 ) | ( (uint64_t) output[1] << 48 )
| ( (uint64_t) output[2] << 40 ) | ( (uint64_t) output[3] << 32 ) | ( (uint64_t) output[4] << 24 )
| ( (uint64_t) output[5] << 16 ) | ( (uint64_t) output[6] << 8 ) | (uint64_t) output[7];
uint64_t part2 = ( (uint64_t) output[8] << 56 ) | ( (uint64_t) output[9] << 48 )
| ( (uint64_t) output[10] << 40 ) | ( (uint64_t) output[11] << 32 ) | ( (uint64_t) output[12] << 24 )
| ( (uint64_t) output[13] << 16 ) | ( (uint64_t) output[14] << 8 ) | (uint64_t) output[15];
// Writing the 64-bit parts to the buffer
u64_to_buffer( bufpos, part1, 8 ); // Write the first 64-bit part
bufpos += SIZE_OF_AES_CTR_PRNG / 2; // Advance the buffer position
u64_to_buffer( bufpos, part2, 8 ); // Write the second 64-bit part
bufpos += SIZE_OF_AES_CTR_PRNG / 2; // Advance the buffer position
}
Apologies for the multiple commits – my thought process isn't always linear, and sometimes an idea strikes unexpectedly, compelling me to test it immediately. I've now significantly streamlined the code, reducing it to approximately 100 lines in total.
Notably, I've optimized the implementation by having CRYPTO_ctr128_encrypt write directly into the output buffer, eliminating the need for an intermediary output array. This approach not only simplifies the code but also allows us to omit three includes that were only necessary for the SSE2 implementation, making the overall solution more efficient.
Here's the essence of the change:
In prng.c, the function nwipe_aes_ctr_prng_read is structured to fill the buffer with 128-bit blocks directly, thus:
int nwipe_aes_ctr_prng_read( NWIPE_PRNG_READ_SIGNATURE ) {
u8* restrict bufpos = buffer;
size_t words = count / SIZE_OF_AES_CTR_PRNG;
// Fill the buffer with 128-bit blocks directly
for( size_t ii = 0; ii < words; ++ii ) {
aes_ctr_prng_genrand_uint128_to_buf( (aes_ctr_state_t*) *state, bufpos );
bufpos += 16; // Advance to the next block
}
// Handle any remaining bytes
const size_t remain = count % SIZE_OF_AES_CTR_PRNG;
if( remain > 0 ) {
unsigned char temp_output[16]; // Temporary storage for the last block
aes_ctr_prng_genrand_uint128_to_buf( (aes_ctr_state_t*) *state, temp_output );
memcpy( bufpos, temp_output, remain ); // Copy the remaining bytes
}
return 0; // Indicate success
}
And in src/aes/aes_ctr_prng.c, the function aes_ctr_prng_genrand_uint128_to_buf now directly initializes bufpos with random data, forgoing a separate output buffer:
void aes_ctr_prng_genrand_uint128_to_buf(aes_ctr_state_t* state, unsigned char* bufpos) {
// Directly populate bufpos with random data, avoiding an extra output buffer
CRYPTO_ctr128_encrypt(bufpos, bufpos, 16, &state->aes_key, state->ivec, state->ecount, &state->num, (block128_f) AES_encrypt);
// Ensure the state transitions smoothly without errors
next_state(state);
}
Adjusting the code further inside prng.c seems unnecessary at this point. I'll try to look for a way to implement pthreads.
EDIT2: My newly created loop-setup script, maybe it serves you as well. https://gist.github.com/Knogle/e690b8134637abe9fa473cc9eae58f98
Looks good. My script isn't as clean as that but it does create multiple loop devices. I like to run wipes on 20 or more drives simultaneously, so if your script had the feature that you could specify the number of loop devices you want it would fit my use case perfectly.
For reference, ShredOS will automatically create loop devices for developers by putting nwipe_options="--nousb /dev/loop0 /dev/loop1 /dev/loop2" on the kernel command line, which will automatically create alternating loop device sizes, 500K then 1M then 500K etc., so all wipes don't finish simultaneously. I think it allows up to 20 or 30 loop devices. On ShredOS those loop drives wipe incredibly fast as they are on the RAM disk, so to make the wipes last longer than a split second I set nwipe to do 500 rounds or more, which gives it a good test.
Ahhhh, regarding the NIST test: I had the same issues, so I used the "improved" NIST test suite.
Ahhh alright! Understood.
I've modified the script to do so.
https://gist.github.com/Knogle/64f57375ce7fe0e7efcd8a379a5ed493
You can use ./mount-loop.sh multimount 1M 20
That's great, thanks I'll give that a try.
Currently doing some attempts using pthreads, but the performance is poor. Anyone with pthread experience reading this? All cores are at 100%, and it's only making around 10MB/s, lol.
/* EXPERIMENTAL implementation of AES-128 in counter mode to provide high-quality random numbers */
// This structure is used for the thread arguments
typedef struct {
aes_ctr_state_t* state;
u8* buffer;
size_t start;
size_t end;
} prng_thread_arg_t;
void* nwipe_aes_ctr_prng_read_thread(void* arg) {
prng_thread_arg_t* thread_arg = (prng_thread_arg_t*)arg;
aes_ctr_state_t* state = thread_arg->state;
u8* buffer = thread_arg->buffer + thread_arg->start;
size_t words = (thread_arg->end - thread_arg->start) / SIZE_OF_AES_CTR_PRNG;
for (size_t ii = 0; ii < words; ++ii) {
aes_ctr_prng_genrand_uint128_to_buf(state, buffer);
buffer += SIZE_OF_AES_CTR_PRNG; // Move to the next block
}
return NULL;
}
int nwipe_aes_ctr_prng_init( NWIPE_PRNG_INIT_SIGNATURE )
{
nwipe_log( NWIPE_LOG_NOTICE, "Initialising AES CTR PRNG" );
if( *state == NULL )
{
/* This is the first time that we have been called. */
*state = malloc( sizeof( aes_ctr_state_t ) );
}
aes_ctr_prng_init(
(aes_ctr_state_t*) *state, (unsigned long*) ( seed->s ), seed->length / sizeof( unsigned long ) );
return 0;
}
// Adaptation of nwipe_aes_ctr_prng_read to make use of parallelisation
int nwipe_aes_ctr_prng_read(NWIPE_PRNG_READ_SIGNATURE) {
int num_threads = 8; // Number of threads can be adjusted based on requirements
pthread_t threads[num_threads];
prng_thread_arg_t thread_args[num_threads];
size_t total_words = count / SIZE_OF_AES_CTR_PRNG;
size_t words_per_thread = total_words / num_threads;
for (int i = 0; i < num_threads; i++) {
size_t start = i * words_per_thread * SIZE_OF_AES_CTR_PRNG;
size_t end = (i + 1) * words_per_thread * SIZE_OF_AES_CTR_PRNG;
if (i == num_threads - 1) {
end = total_words * SIZE_OF_AES_CTR_PRNG; // Make sure we handle the end correctly
}
thread_args[i].state = (aes_ctr_state_t*)*state;
thread_args[i].buffer = buffer;
thread_args[i].start = start;
thread_args[i].end = end;
pthread_create(&threads[i], NULL, nwipe_aes_ctr_prng_read_thread, &thread_args[i]);
}
// Wait for all threads to finish
for (int i = 0; i < num_threads; i++) {
pthread_join(threads[i], NULL);
}
// Handle the remaining bytes
size_t remaining_bytes = count % SIZE_OF_AES_CTR_PRNG;
if (remaining_bytes > 0) {
unsigned char temp_output[SIZE_OF_AES_CTR_PRNG];
aes_ctr_prng_genrand_uint128_to_buf((aes_ctr_state_t*)*state, temp_output);
memcpy(buffer + total_words * SIZE_OF_AES_CTR_PRNG, temp_output, remaining_bytes);
}
return 0; // Success
}
Correct me if I've misunderstood the code, but it looks like you are creating 8 threads (why 8 threads?) and generating the prng using eight threads?
I think you have to consider the sequence of events outside of prng generation, and where the delay is caused by a prng wipe vs a zero write.
So, I believe the delay caused by prng can be completely removed so that a prng wipe is no slower than a zeros wipe. The prng will always take x amount of time to generate a block of prng for writing to disc; currently that is sequential, so the wipe code has to wait for the prng to be generated. The way I imagined getting rid of this delay is to start one prng thread for each wipe, with only one pthread_create at the start of the wipe and one pthread_join on completion of the wipe. The prng thread then fills a buffer of say 10 blocks, or maybe 100, to be decided, and pauses when the buffer is full, waiting for a request from the wipe thread for a pointer; the prng thread provides an updated pointer to the wipe thread. From the wipe thread's point of view the prng is immediately available, so no delays, as the prng thread created the next block before it was needed. The prng thread must never let its buffer drop to zero blocks available, as that would reintroduce a delay, so as soon as its buffer has a free block it must fill it. There would need to be a sequence counter that tracks which is the next prng block and its address that is passed to the wipe thread. Communication between the wipe thread and the prng thread would be done via the wipe context structure for that particular drive. It's probably a good idea to check whether the openssl library is thread safe. It may be worth looking at this: https://www.openssl.org/docs/man1.0.2/man3/threads.html#:~:text=OpenSSL%20can%20generally%20be%20used,resources%20have%20the%20necessary%20locks.
Each wipe thread would have its own single prng thread creating prng blocks ahead of them actually being needed.
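To illustrate the shape of it, here's a rough, untested sketch of that producer/consumer arrangement. The names, the ring size and the memcpy hand-off are all illustrative, not nwipe's actual API; a real version would pass block pointers plus a sequence counter through the drive context and release each slot only after the block has been written to disk.
/* Illustrative sketch only. One PRNG thread per drive keeps a small ring of
 * pre-generated blocks topped up, so the wipe thread never waits for the PRNG. */
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define RING_BLOCKS 16
#define BLOCK_SIZE 4096

/* Stand-in for the per-drive PRNG read, e.g. c->prng->read(). */
extern void fill_block_with_prng( unsigned char* block, size_t len );

typedef struct
{
    unsigned char blocks[RING_BLOCKS][BLOCK_SIZE];
    size_t head;               /* next slot the PRNG thread will fill        */
    size_t tail;               /* next ready block the wipe thread will take */
    size_t filled;             /* number of blocks currently ready           */
    int finished;              /* set by the wipe thread when the wipe ends  */
    pthread_mutex_t lock;
    pthread_cond_t not_full;   /* signalled when a slot becomes free         */
    pthread_cond_t not_empty;  /* signalled when a block becomes ready       */
} prng_ring_t;

/* PRNG thread: created once at the start of the wipe, joined once at the end. */
void* prng_thread( void* arg )
{
    prng_ring_t* ring = arg;
    for( ;; )
    {
        pthread_mutex_lock( &ring->lock );
        while( ring->filled == RING_BLOCKS && !ring->finished )
            pthread_cond_wait( &ring->not_full, &ring->lock );
        if( ring->finished )
        {
            pthread_mutex_unlock( &ring->lock );
            return NULL;
        }
        unsigned char* slot = ring->blocks[ring->head];
        pthread_mutex_unlock( &ring->lock );

        /* Generate the next block outside the lock, before it is needed. */
        fill_block_with_prng( slot, BLOCK_SIZE );

        pthread_mutex_lock( &ring->lock );
        ring->head = ( ring->head + 1 ) % RING_BLOCKS;
        ring->filled++;
        pthread_cond_signal( &ring->not_empty );
        pthread_mutex_unlock( &ring->lock );
    }
}

/* Wipe thread side: fetch the next pre-generated block; normally returns immediately. */
void prng_ring_next( prng_ring_t* ring, unsigned char* out )
{
    pthread_mutex_lock( &ring->lock );
    while( ring->filled == 0 )
        pthread_cond_wait( &ring->not_empty, &ring->lock );
    memcpy( out, ring->blocks[ring->tail], BLOCK_SIZE );
    ring->tail = ( ring->tail + 1 ) % RING_BLOCKS;
    ring->filled--;
    pthread_cond_signal( &ring->not_full );
    pthread_mutex_unlock( &ring->lock );
}

/* At the end of the wipe: take the lock, set finished = 1, broadcast not_full,
 * release the lock, then pthread_join() the prng thread: one create, one join. */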
This would be a fairly complex bit of code and would make substantial changes to the wipe threads, which means plenty of testing, so I would prefer such changes to be a separate pull request from your existing pull request, i.e. create a branch to work on rather than continuing to make changes to your master branch.
I hope that makes sense, please let me know if that needs clarification.
Assuming I have understood your code correctly, it looks like you have tried to split the generation of a block of PRNG into 8 threads, but each time you do that you create the threads, [do some work], then join each thread waiting for completion. That's a lot of pthread I/O overhead. Joining a thread takes time and you are doing this 8 times for every block of prng.
Using the method I described earlier you would only need one pthread_create and one pthread_join. The prng thread would be created at the start of the wipe and terminated (pthread_join) once the wipe has completely finished. Again, assuming I've understood your code correctly, there may be over 8 million pthread_joins issued on a 500GB drive. That's a lot of unnecessary overhead, which might be why it's so slow.
Updated my original comment with details of openssl and thread safety. Openssl-thread-safety
Thanks a lot! Makes sense to me. I'll leave the pthread topic for a dedicated pull request then :)
I've performed a final analysis of the PRNG stream. The data seems to be OK. I am also able to achieve around 550MB/s on a Ryzen 5000 platform, which doesn't seem bad for a first attempt at an implementation. I guess we can get more out of it using pthreads or OpenMP in a dedicated PR.
Looks good. I think we need to do some 20-40 drive simultaneous wipes with the AES prng, method=prng, no blanking pass, as I'm concerned whether it's thread safe.
I need to double check the code in regards to thread safety, but I think for a 40 drive erase there will be 40 wipe threads, each of which will be calling the openssl code. I believe the openssl link I posted earlier discussed the openssl locking mechanism. At the moment I don't think you are using that, so it could trigger an issue if the openssl library is called from different threads simultaneously. That could show up as a crash/hang or false verification failures when wiping many drives simultaneously.
@ggruber has a server that would be very useful for testing the thread safety of the AES prng with a large number of drives of reasonably large capacity (>500GB). @ggruber would that be ok with you?
Sounds great!
Regarding the current single-threaded implementation I've encountered an issue; I think it might be important to sort this out before merging. When choosing 'Do not blank bla bla', the PRNG fills everything but shows an error after termination. Interestingly, the error count is always the same. I suspect the issue is in my prng.c implementation. I've tried zero-filling in aes_ctr_prng.c for debugging, and it leads to the same issue when choosing not to do a blanking pass.
When choosing 'Do not blank bla bla', the PRNG fills everything, but shows an error after termination. Interestingly, the error count is always the same.
That's because with verification set to last pass and blanking pass 'on' the verification is conducted on a zeros pass.
With verification set as last pass and blanking pass 'off', verification is performed on the PRNG stream.
If you switch blank pass on and verification to all passes you will probably find it also fails with 25k errors.
It looks like every verification pass is failing, so I'd look at the value of the original seed, which must be the same for verification as it was for the original prng wipe. If that seed value is not the same you will get every prng block failing verification, which seems to be what you are getting.
Hmmm okay, that makes sense. I'm thinking about a way to fix that. Have you got an approach in mind?
So I'd have to store the generated encryption key somewhere. And as the key is only derived from unsigned long init_key[], we need some way to make sure the verifying instance gets the key it needs in order to produce the same result. My issue is also that I don't know where the verification process takes place inside the codebase.
The way it's done for the other prngs is to store the initialisation seed in the drive context (context.h): c->prng_seed and c->prng_state
nwipe_entropy_t prng_seed; // The random data that is used to seed the PRNG.
void* prng_state; // The private internal state of the PRNG.
So the seed is initialised for each drive context in pass.c at line 279
/* Seed the PRNG. */
c->prng->init( &c->prng_state, &c->prng_seed );
So it would be nice to retain that same code, however I guess that depends on the size of the seed.
Also, these prng seeds and variables that are stored in each drive's context (structure) are initialised when nwipe first starts, lines 439-448 in nwipe.c:
/* Initialise some of the variables in the drive contexts
*/
for( i = 0; i < nwipe_enumerated; i++ )
{
/* Set the PRNG implementation, which must always come after the function nwipe_gui_select ! */
c1[i]->prng = nwipe_options.prng;
c1[i]->prng_seed.length = 0;
c1[i]->prng_seed.s = 0;
c1[i]->prng_state = 0;
...
...
...
If you retain the existing method of storing the seed, seed length etc in the existing context then verification should start working without needing to change any of the verification code.
You will also want to look at the random verification pass at line 35 in pass.c, in particular the initial seed generation prior to the while loop
/* Reseed the PRNG. */
c->prng->init( &c->prng_state, &c->prng_seed );
and the generation of the next prng block..
/* Fill the output buffer with the random pattern. */
c->prng->read( &c->prng_state, d, blocksize );
By using the existing prng context hopefully you shouldn't need to change anything in this code.
int nwipe_random_verify( nwipe_context_t* c )
{
/**
* Verifies that a random pass was correctly written to the device.
*
*/
/* The result holder. */
int r;
/* The IO size. */
size_t blocksize;
/* The result buffer for calls to lseek. */
off64_t offset;
/* The input buffer. */
char* b;
/* The pattern buffer that is used to check the input buffer. */
char* d;
/* The number of bytes remaining in the pass. */
u64 z = c->device_size;
if( c->prng_seed.s == NULL )
{
nwipe_log( NWIPE_LOG_SANITY, "Null seed pointer." );
return -1;
}
if( c->prng_seed.length <= 0 )
{
nwipe_log( NWIPE_LOG_SANITY, "The entropy length member is %i.", c->prng_seed.length );
return -1;
}
/* Create the input buffer. */
b = malloc( c->device_stat.st_blksize );
/* Check the memory allocation. */
if( !b )
{
nwipe_perror( errno, __FUNCTION__, "malloc" );
nwipe_log( NWIPE_LOG_FATAL, "Unable to allocate memory for the input buffer." );
return -1;
}
/* Create the pattern buffer */
d = malloc( c->device_stat.st_blksize );
/* Check the memory allocation. */
if( !d )
{
nwipe_perror( errno, __FUNCTION__, "malloc" );
nwipe_log( NWIPE_LOG_FATAL, "Unable to allocate memory for the pattern buffer." );
free( b );
return -1;
}
/* Reset the file pointer. */
offset = lseek( c->device_fd, 0, SEEK_SET );
/* Reset the pass byte counter. */
c->pass_done = 0;
if( offset == (off64_t) -1 )
{
nwipe_perror( errno, __FUNCTION__, "lseek" );
nwipe_log( NWIPE_LOG_FATAL, "Unable to reset the '%s' file offset.", c->device_name );
free( b );
free( d );
return -1;
}
if( offset != 0 )
{
/* This is system insanity. */
nwipe_log( NWIPE_LOG_SANITY, "lseek() returned a bogus offset on '%s'.", c->device_name );
free( b );
free( d );
return -1;
}
/* Tell our parent that we are syncing the device. */
c->sync_status = 1;
/* Sync the device. */
r = fdatasync( c->device_fd );
/* Tell our parent that we have finished syncing the device. */
c->sync_status = 0;
if( r != 0 )
{
nwipe_perror( errno, __FUNCTION__, "fdatasync" );
nwipe_log( NWIPE_LOG_WARNING, "Buffer flush failure on '%s'.", c->device_name );
c->fsyncdata_errors++;
}
/* Reseed the PRNG. */
c->prng->init( &c->prng_state, &c->prng_seed );
while( z > 0 )
{
if( c->device_stat.st_blksize <= z )
{
blocksize = c->device_stat.st_blksize;
}
else
{
/* This is a seatbelt for buggy drivers and programming errors because */
/* the device size should always be an even multiple of its blocksize. */
blocksize = z;
nwipe_log( NWIPE_LOG_WARNING,
"%s: The size of '%s' is not a multiple of its block size %i.",
__FUNCTION__,
c->device_name,
c->device_stat.st_blksize );
}
/* Fill the output buffer with the random pattern. */
c->prng->read( &c->prng_state, d, blocksize );
/* Read the buffer in from the device. */
r = read( c->device_fd, b, blocksize );
/* Check the result. */
if( r < 0 )
{
nwipe_perror( errno, __FUNCTION__, "read" );
nwipe_log( NWIPE_LOG_ERROR, "Unable to read from '%s'.", c->device_name );
return -1;
}
/* Check for a partial read. */
if( r != blocksize )
{
/* TODO: Handle a partial read. */
/* The number of bytes that were not read. */
int s = blocksize - r;
nwipe_log(
NWIPE_LOG_WARNING, "%s: Partial read from '%s', %i bytes short.", __FUNCTION__, c->device_name, s );
/* Increment the error count. */
c->verify_errors += 1;
/* Bump the file pointer to the next block. */
offset = lseek( c->device_fd, s, SEEK_CUR );
if( offset == (off64_t) -1 )
{
nwipe_perror( errno, __FUNCTION__, "lseek" );
nwipe_log(
NWIPE_LOG_ERROR, "Unable to bump the '%s' file offset after a partial read.", c->device_name );
return -1;
}
} /* partial read */
/* Compare buffer contents. */
if( memcmp( b, d, blocksize ) != 0 )
{
c->verify_errors += 1;
}
/* Decrement the bytes remaining in this pass. */
z -= r;
/* Increment the total progress counters. */
c->pass_done += r;
c->round_done += r;
pthread_testcancel();
} /* while bytes remaining */
/* Release the buffers. */
free( b );
free( d );
/* We're done. */
return 0;
} /* nwipe_random_verify */
@ggruber would that be ok with you?
Sure. You could test on the box you can access directly, or I could perform tests on the other system with 53 disks with a JBOD expansion box.
If you retain the existing method of storing the seed, seed length etc in the existing context then verification should start working without needing to change any of the verification code.
Ahhhh, understood. I think that's the issue we have here. I'm using the default seeding method, but I have to extend the seed to a given length of 256 bits, so I can't store it in the provided type.
void aes_ctr_prng_init( aes_ctr_state_t* state, unsigned long init_key[], unsigned long key_length )
{
unsigned char temp_key[32]; // Temporary storage for the random key
unsigned char key[32]; // Size for 256 bits
// Generate a strong, random key using RAND_bytes
if( RAND_bytes( temp_key, sizeof( temp_key ) ) != 1 )
{
// Error handling: RAND_bytes failed to generate a key
// Insert abort or error handling here
}
// Use SHA256 to generate the final key from the random key and init_key
SHA256_CTX sha256;
SHA256_Init( &sha256 );
SHA256_Update( &sha256, temp_key, sizeof( temp_key ) ); // Add the random key
SHA256_Update( &sha256, (unsigned char*) init_key, key_length * sizeof( unsigned long ) ); // Add the init_key
SHA256_Final( key, &sha256 ); // Generate the final key
AES_set_encrypt_key( key, 256, &state->aes_key ); // Use the 256-bit key
memset( state->ivec, 0, AES_BLOCK_SIZE ); // Initialize the IV with zeros (or use RAND_bytes for the IV)
state->num = 0;
memset( state->ecount, 0, AES_BLOCK_SIZE );
}
and in prng.c
/* EXPERIMENTAL implementation of AES-128 in counter mode to provide high-quality random numbers */
int nwipe_aes_ctr_prng_init( NWIPE_PRNG_INIT_SIGNATURE )
{
nwipe_log( NWIPE_LOG_NOTICE, "Initialising AES CTR PRNG" );
if( *state == NULL )
{
/* This is the first time that we have been called. */
*state = malloc( sizeof( aes_ctr_state_t ) );
}
aes_ctr_prng_init(
(aes_ctr_state_t*) *state, (unsigned long*) ( seed->s ), seed->length / sizeof( unsigned long ) );
return 0;
}
Especially this line:
aes_ctr_prng_init((aes_ctr_state_t*) *state, (unsigned long*) ( seed->s ), seed->length / sizeof( unsigned long ) );
I think this will be a major challenge, somehow saving the complete 'state' after init. Or I could remove the RAND_bytes part from OpenSSL (which is there to improve the strength of the key), although that would lower the entropy. But that way the same 'seed' would lead to the same key being generated, retaining the default structure of the verification process. I guess I'd need some help here.
EDIT: I'm trying a variant where the same seed/init_key always leads to the same encryption key, plus adding a salt. Let's see how it works.
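Roughly, the shape I'm trying is to make the key a pure function of the seed that nwipe already stores in c->prng_seed, i.e. dropping the RAND_bytes call from the init shown above (untested sketch, salt handling omitted; the same OpenSSL calls as before):
/* Sketch of a deterministic init: no RAND_bytes, so re-initialising with the same
   c->prng_seed during verification reproduces the identical key and keystream. */
void aes_ctr_prng_init( aes_ctr_state_t* state, unsigned long init_key[], unsigned long key_length )
{
    unsigned char key[32];  // 256-bit key derived purely from the seed
    SHA256_CTX sha256;
    SHA256_Init( &sha256 );
    SHA256_Update( &sha256, (unsigned char*) init_key, key_length * sizeof( unsigned long ) );
    SHA256_Final( key, &sha256 );

    AES_set_encrypt_key( key, 256, &state->aes_key );
    memset( state->ivec, 0, AES_BLOCK_SIZE );  // IV must also be deterministic (zero here)
    state->num = 0;
    memset( state->ecount, 0, AES_BLOCK_SIZE );
}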
Now it works, I will remove the debug printf.
Evaluation Report on the AES-CTR Pseudorandom Number Generator (PRNG) Compliance with NIST Standards
Executive Summary: This report presents the findings from a comprehensive evaluation of the Advanced Encryption Standard Counter Mode (AES-CTR) Pseudorandom Number Generator (PRNG) to determine its compliance with the National Institute of Standards and Technology (NIST) standards for randomness. The AES-CTR PRNG underwent a series of statistical tests as outlined in the NIST Special Publication 800-22 suite, aimed at assessing its effectiveness in generating cryptographically secure random sequences.
Test Methodology: The AES-CTR PRNG was subjected to the full suite of statistical tests specified in NIST SP 800-22, which includes tests for frequency, block frequency, runs, longest run of ones, binary matrix rank, discrete Fourier transform, non-overlapping template matching, and others. A total of 188 individual tests were conducted, focusing on two key aspects: the proportion of sequences that pass each statistical test (proportion analysis) and the distribution of p-values to check for uniformity (uniformity analysis).
Results: The evaluation yielded the following significant results:
Out of 188 tests conducted, 187 successfully passed both the proportion analysis and the uniformity analysis. This indicates a very high level of compliance with the expected randomness criteria set forth by NIST.
A singular test within the "Non-overlapping Template Matching" category did not pass the proportion analysis, marking a minor deviation from the expected outcomes.
The observed failure rate is well within the acceptable thresholds defined by NIST, considering the extensive number of tests performed.
Analysis: The AES-CTR PRNG demonstrated exceptional performance across nearly all statistical tests in the NIST SP 800-22 suite, showcasing its capability to produce sequences with a high degree of randomness. The nearly unanimous pass rate underscores the PRNG's robustness and its suitability for cryptographic applications requiring secure random number generation. The singular instance of non-compliance in the "Non-overlapping Template Matching" test is deemed statistically insignificant against the backdrop of the otherwise flawless performance and does not detract from the overall efficacy of the AES-CTR PRNG.
SP800-22rev1a-improved.pdf
Based on the comprehensive testing and analysis conducted, it is concluded that the AES-CTR Pseudorandom Number Generator exhibits a high level of compliance with NIST standards for randomness. The results affirm the AES-CTR PRNG's appropriateness for use in cryptographic applications, reinforcing its reliability and security. The testing methodology and outcomes provide stakeholders with the assurance that the AES-CTR PRNG meets the rigorous requirements set by NIST, making it a trustworthy component in the design and implementation of secure systems.
This evaluation underscores the commitment to maintaining high standards of cryptographic security and contributes to the ongoing effort to enhance the trustworthiness of cryptographic modules used in information technology systems.
Test Results:
A total of 188 tests (some of the 15 tests actually consist of multiple sub-tests)
were conducted to evaluate the randomness of 32 bitstreams of 1048576 bits from:
/dev/loop0
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
The numerous empirical results of these tests were then interpreted with
an examination of the proportion of sequences that pass a statistical test
(proportion analysis) and the distribution of p-values to check for uniformity
(uniformity analysis). The results were the following:
187/188 tests passed successfully both the analyses.
1/188 tests did not pass successfully both the analyses.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Here are the results of the single tests:
- The "Frequency" test passed both the analyses.
- The "Block Frequency" test passed both the analyses.
- The "Cumulative Sums" (forward) test passed both the analyses.
The "Cumulative Sums" (backward) test passed both the analyses.
- The "Runs" test passed both the analyses.
- The "Longest Run of Ones" test passed both the analyses.
- The "Binary Matrix Rank" test passed both the analyses.
- The "Discrete Fourier Transform" test passed both the analyses.
- 147/148 of the "Non-overlapping Template Matching" tests passed both the analyses.
1/148 of the "Non-overlapping Template Matching" tests FAILED the proportion analysis.
- The "Overlapping Template Matching" test passed both the analyses.
- The "Maurer's Universal Statistical" test passed both the analyses.
- The "Approximate Entropy" test passed both the analyses.
- 8/8 of the "Random Excursions" tests passed both the analyses.
- 18/18 of the "Random Excursions Variant" tests passed both the analyses.
- The "Serial" (first) test passed both the analyses.
The "Serial" (second) test passed both the analyses.
- The "Linear Complexity" test passed both the analyses.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
The missing tests (if any) were whether disabled manually by the user or disabled
at run time due to input size requirements not satisfied by this run.
So regarding the quality of the generated numbers, AES-CTR-128 seems to be far superior, at least on this point, in comparison to ISAAC and Mersenne Twister.
I've now also implemented a Xoroshiro256++ PRNG, which performs at around 1.8GB/s, delivering medium-quality random numbers of similar statistical quality to AES-CTR. What's the proper way to push such changes? Create a separate branch? I haven't done things this way with git yet, and I don't want to mess up the current PR.
Yes, I would probably create a branch from the existing work you have done.
# create the branch
git checkout -b xoroshiro256
# check you are on the xoroshro256 branch
git branch
Make the changes to the code then..
git push origin xoroshiro256
Check your fork on GitHub that xoroshiro256 is there and that the
CI workflows complete without error.
Then create a pull request from GitHub on that branch.
Once I've merged the branch ..
On your PC
# switch back to the master branch
git checkout master
# update your local master on your PC
git pull upstream master
# update your nwipe fork on GitHub
git push
# Only do the next bit once you are sure your branch has been
# merged by me into the master nwipe branch.
# make sure you are on your master branch
git branch
# if not on the master branch then switch to master
git checkout master
# Delete your local xoroshiro256 branch on your PC
git branch -D xoroshiro256
Delete the xoroshiro256 branch on your GitHub fork online.
Hopefully there won't be any conflicts as you are working on code in the same area but I'll deal with that when it comes to merging.
@Knogle I added a few extra commands in the comment above for updating your local master copy and deleting a branch once I have successfully merged into the nwipe master code.
@Knogle I'm currently running a 22 disc wipe consisting of drives of various sizes, using AES-CTR, method=prng, no blanking, last pass verification. The first small drive that reached 50% and is now doing verification is failing verification on all blocks. I'll know more in the morning once all the other 21 drives are doing or have completed verification.
Are you able to perform this same test with multiple drives? I have my suspicions that it may be because we are not using openssl's thread locking mechanism. However, I'll know more in the morning.
@Knogle Looks like we have a problem wiping multiple drives, 4 drives are now failing verification on every block, I expect by morning to see all 22 drives have failed.
You may want to try creating say 10 loop drives, make them a bit larger, say 100MB each, and see if you can reproduce the verification errors. Don't forget to switch off the blanking pass. If it is thread locking then a single drive wipe & verification should work fine, but as you start wiping multiple drives you will get verification errors on every block.
Thanks a lot! Let me know if you find something out. I didn't know that separate threads are being used; I thought the same data was being distributed to all drives. I will give it a try using loop devices.
Do you think you could try out the Xoroshiro variant on this machine you have after it's done?
I have run a test just now; for AES-CTR it looks like there are no issues for me. But the performance is very, very disappointing when wiping 2 drives.
I will be out on vacation until Tuesday, so I won't make changes to the code during those days, but maybe you can check if you find something! Meanwhile, tbh, I think there is more hope for XORoshiro to get merged earlier. I think this will require a bit of work to get working.
Just to confirm, all 22 drives are failing on verification.
Regarding threads, each drive being wiped is operating in its own thread, so external calls to libraries or functions within nwipe need to be thread safe or managed with a thread locking mechanism. In the case of nwipe_log, because it can be called from multiple different threads, we use pthread_mutex_lock to avoid different threads writing to the log at the same time.
Looking at the openssl documentation they appear to have their own functions that need to be called to lock the library while it's being used by another thread.
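For reference, the locking arrangement that the 1.0.x-era openssl documentation describes looks roughly like this (a sketch with hypothetical names, only relevant when building against OpenSSL 1.0.x; 1.1.0 and later handle locking internally):
/* Sketch of the OpenSSL 1.0.x locking-callback setup. */
#include <openssl/crypto.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t* openssl_locks;

/* Called by libcrypto whenever it needs to take or release one of its internal locks. */
static void openssl_locking_callback( int mode, int n, const char* file, int line )
{
    (void) file;
    (void) line;
    if( mode & CRYPTO_LOCK )
        pthread_mutex_lock( &openssl_locks[n] );
    else
        pthread_mutex_unlock( &openssl_locks[n] );
}

/* Identify the calling thread to libcrypto (classic pattern from the old docs). */
static unsigned long openssl_thread_id( void )
{
    return (unsigned long) pthread_self();
}

/* Call once at startup, before any wipe threads use libcrypto. */
static void openssl_thread_setup( void )
{
    openssl_locks = malloc( CRYPTO_num_locks() * sizeof( pthread_mutex_t ) );
    for( int i = 0; i < CRYPTO_num_locks(); i++ )
        pthread_mutex_init( &openssl_locks[i], NULL );
    CRYPTO_set_id_callback( openssl_thread_id );
    CRYPTO_set_locking_callback( openssl_locking_callback );
}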
I think your test on two drives may possibly not be enough to cause a collision, maybe ten or 20 loop drives will trigger a failure.
@Knogle / @PartialVolume How does the currently used ISAAC-64 implementation pass the NIST test? Can you post the output here for comparison.
Also seeing verification failure on 2 disk (1TB & 1.2TB) wipes.
In this pull request, I present my implementation of a pseudo-random number generator (PRNG) utilizing the AES-CTR (Advanced Encryption Standard - Counter mode) in 128-bit mode. This implementation is designed to produce high-quality random numbers, which are essential for a wide range of cryptographic applications. By integrating with the OpenSSL library and exploiting AES-NI (Advanced Encryption Standard New Instructions) hardware acceleration when available, I ensure both the security and efficiency of the random number generation process.
Key Features:
Implementation Details:
This PRNG implementation stands as a robust and efficient tool for generating high-quality pseudo-random numbers, crucial for cryptographic operations, secure communications, and randomized algorithms. The combination of AES-CTR mode, OpenSSL's reliability, and the performance benefits of AES-NI hardware acceleration results in a superior random number generator.
I have ensured that the implementation is well-documented with clear comments, making it accessible for review, understanding, and maintenance, following best practices in both software development and cryptographic standards.
I look forward to receiving feedback on this pull request to further improve and ensure the effectiveness of the PRNG implementation.
Test of randomness: