martijnvanbrummelen / nwipe

nwipe secure disk eraser
GNU General Public License v2.0

Integrating SHA-512 HMAC DRBG and Exploring Crypto-Secure Superior Random Number Generation #557

Closed: Knogle closed this pull request 3 months ago

Knogle commented 3 months ago

This pull request implements the SHA-512 HMAC Deterministic Random Bit Generator (DRBG) for cryptographically secure random number generation within our project. By building on the security of HMAC with SHA-512, it lets us produce secure, unpredictable random numbers, which are crucial for a wide array of cryptographic functions and secure data processes.

Key Benefits of SHA-512 HMAC DRBG:

1. Superior Security: The SHA-512 HMAC DRBG combines the cryptographic strength of SHA-512 with the HMAC construction, providing resistance against both brute-force attacks and more sophisticated cryptographic attacks. This matters for applications requiring the highest degree of randomness and security, such as cryptographic key generation, secure communications, and high-security token generation.

2. Compliance with Cryptographic Standards: HMAC_DRBG is one of the deterministic random bit generators specified by NIST (National Institute of Standards and Technology) in Special Publication 800-90A, so SHA-512 HMAC DRBG rests on a vetted and reliable foundation for generating cryptographically secure random numbers.

3. Robustness Against Predictability: Every output block is computed with HMAC under a secret internal key, so an observer who sees earlier outputs cannot feasibly predict later ones without knowing the generator's internal state. This makes the generated numbers suitable for cryptographic use.

4. High Performance and Efficiency: Despite its cryptographic robustness, SHA-512 HMAC DRBG is intended to maintain good performance and efficiency, making it suitable for environments where both security and speed matter, so that applications can generate secure random numbers without a significant performance penalty.

Integration Details:

The integration of SHA-512 HMAC DRBG into our project has been planned to ensure both ease of use and compatibility with existing systems. The pull request includes the core SHA-512 HMAC DRBG implementation, documentation for developers, and utility functions for key management and random number generation. This gives developers immediate access to secure random number generation and strengthens the overall security and integrity of the project.

The adoption of SHA-512 HMAC DRBG is a strategic enhancement to our project's cryptographic infrastructure, emphasizing our commitment to security, reliability, and adherence to high cryptographic standards.

Testing (TestU01 SmallCrush on the generator output, read from a binary file):

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        ufile_CreateReadBin
 Number of statistics:  15
 Total CPU time:   00:00:06.81

 All tests were passed


Knogle commented 3 months ago

Regarding performance, it is slower than AES. Also, the SHA New Instructions (SHA-NI) were introduced only recently, starting with first-generation Ryzen, whereas AES-NI has existed for around 15 years. Even so, it is the first cryptographically secure PRNG (CSPRNG) for nwipe.

PartialVolume commented 3 months ago

@Knogle If you are thinking of adding any more PRNGs, can you update your master first, before creating the branch? Otherwise it causes me a load of work resolving conflicts. Thanks 👍

Knogle commented 3 months ago

Yeah, you are right, it's a lot of work, heh. It should match now after some rebasing etc. If this one is OK, I will continue troubleshooting the AES issue.

PartialVolume commented 3 months ago

The xoro wipes are still running, but it looks like there won't be any issues. I'm busy tomorrow morning, but I'll start the tests on sha-hmac tomorrow afternoon (Friday), so I'll merge tomorrow evening if there are no problems.

PartialVolume commented 3 months ago

Unfortunately, in its current state this would be unusable for nwipe. It's incredibly slow: with any of the other PRNGs it would take approximately six hours to wipe the 16 drives shown below, but with the SHA-512 DRBG it would take over 282 hours, about 50 times longer. A typical drive's I/O speed has dropped from 100 MB/s to 0.23 MB/s. The cores in use on this 40-core processor are maxed out, with nwipe's CPU usage at 1600%, i.e. sixteen cores fully busy, one per drive being wiped.

Is it possible to implement our own SHA-512 DRBG without using OpenSSL? Maybe that would be faster, or is it a bug in the implementation?

[Screenshots: top showing nwipe at 1600% CPU usage with the OpenSSL-based SHA-512 DRBG; wipe speed with the OpenSSL-based SHA-512 DRBG, very poor and possibly not suitable for nwipe]

Knogle commented 3 months ago

Hey! Oh, that's bad. May I ask which CPU architecture your system is using? One possible factor: the SHA New Instructions only accelerate SHA-256 (and SHA-1), not SHA-512. Maybe I can try migrating the code to SHA-256 in order to benefit from hardware acceleration.

Do you know how to modify configure.ac or Makefile.am to produce .S or .asm files as output for debugging? I'd like to check whether the SHA New Instructions appear in the generated code.

I tried this, without effect:

./configure --enable-asm=yes

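# Define the --enable-asm option
AC_ARG_ENABLE([asm],
  [AS_HELP_STRING([--enable-asm],
    [Enable generating assembler code for debugging @<:@default=no@:>@])],
  [case "${enableval}" in
     yes) asm_enabled=true ;;
     no)  asm_enabled=false ;;
     *)   AC_MSG_ERROR([bad value ${enableval} for --enable-asm]) ;;
   esac],
  [asm_enabled=false])

# Conditionally set CFLAGS (place this after AC_PROG_CC so the default -g -O2 is kept).
# The test must compare against "true", the value assigned above; comparing
# against "yes" never matched, so the flag was never added.
# -save-temps=obj keeps the intermediate .i and .s files next to the objects while
# still producing a linkable build; -S in CFLAGS would make every compile step emit
# assembly instead of object files, so linking would fail.
# Without editing configure.ac: make CFLAGS="-g -O2 -save-temps=obj"
if test x"$asm_enabled" = x"true"; then
    CFLAGS="$CFLAGS -save-temps=obj"
fi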

EDIT:

We are not using any hardware acceleration yet, which explains those bad results. In an OpenSSL benchmark I get over 1 GB/s with SHA-256 for blocks of 256 bytes and larger, so the problem still lies in our implementation.

    16 bytes: 156.47 MB/s
    64 bytes: 486.10 MB/s
    256 bytes: 1,194.80 MB/s
    1024 bytes: 1,866.59 MB/s
    8192 bytes: 2,238.20 MB/s
    16384 bytes: 2,270.45 MB/s

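These OpenSSL figures measure raw hashing. A rough cost model, assuming each HMAC_DRBG output block is produced as V = HMAC(K, V) as in NIST SP 800-90A, shows why DRBG output is much slower than raw hashing. Let L be the digest size and B the hash block size (L = 32, B = 64 for SHA-256; L = 64, B = 128 for SHA-512). One HMAC call on an L-byte V needs two compression-function calls for the inner hash (the key XOR ipad block, then V plus padding) and two for the outer hash, i.e. four compressions per L output bytes, whereas hashing a long message needs one compression per B bytes. Taking the 16384-byte SHA-256 figure (about 2.27 GB/s) as the raw rate:

```latex
R_{\text{DRBG}} \approx R_{\text{hash}} \cdot \frac{L}{4B} = \frac{R_{\text{hash}}}{8} \quad (B = 2L),
\qquad
R_{\text{DRBG}} \lesssim \frac{2270\ \text{MB/s}}{8} \approx 284\ \text{MB/s per thread}
```

This is an upper bound: it ignores per-call overhead, which the 16-byte and 64-byte rows show is large for short inputs, and the extra update step after each request. Reusing precomputed ipad/opad states between calls would halve the compression count per block.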
PartialVolume commented 3 months ago

May I ask which CPU architecture your system is using?

40-core Xeon (two 10-core Xeon E5-2680 v2 sockets with Hyper-Threading, i.e. 40 logical CPUs)

processor       : 39
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
stepping        : 4
microcode       : 0x42e
cpu MHz         : 1200.000
cache size      : 25600 KB
physical id     : 1
siblings        : 20
core id         : 12
cpu cores       : 10
apicid          : 57
initial apicid  : 57
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips        : 5600.33
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

Do you know how to modify the ./configure.ac or Makefile.am in order to produce .S or .asm files as output for debugging? I'd like to check if SHA New Instructions is present.

Sorry, no.

PartialVolume commented 3 months ago

SHA-256 is twice as slow as SHA-512: now over 500 hours to completion.

https://github.com/martijnvanbrummelen/nwipe/assets/22084881/23a9a523-d74b-43b0-818d-9c6f9adc2869

Knogle commented 3 months ago

May I ask which CPU architecture your system is using?

40 core Xeon

Do you know how to modify the ./configure.ac or Makefile.am in order to produce .S or .asm files as output for debugging? I'd like to check if SHA New Instructions is present.

Sorry, no.

Ahhh, okay. Unfortunately it doesn't have the Intel SHA extensions (there is no sha_ni flag in the list). What do you mean? In my opinion we can drop this for now and resume work and troubleshooting on AES-CTR. I think it delivers great performance; SHA-512 without SHA-NI is just unusable.

Knogle commented 3 months ago

Hey, I hope you are doing fine. I think you'll agree with closing this PR. Here is what I found: on systems with the SHA New Instructions it does around 100 MB/s per thread, and without these instructions it's completely unusable. I will focus on AES-CTR instead!