libtom / libtomcrypt

LibTomCrypt is a fairly comprehensive, modular and portable cryptographic toolkit that provides developers with a vast array of well known published block ciphers, one-way hash functions, chaining modes, pseudo-random number generators, public key cryptography and a plethora of other routines.
https://www.libtom.net
Other
1.51k stars 449 forks source link

Improve SSE4.1/AES-NI support #644

Open tbvdm opened 3 months ago

tbvdm commented 3 months ago

The CryptX Perl module contains a vendored copy of libtomcrypt. It uses -msse4.1 -maes to enable AES-NI support. The problem is that these flags are used for all source files. This results in SIGILL crashes on CPUs without SSE4.1.

I think the solution is for CryptX to use -msse4.1 -maes only for aesni.c. But this is not possible without several changes to libtomcrypt. This pull request is a proposal for those changes.

The primary change is in how SSE4.1 is detected. Currently libtomcrypt checks __SSE4_1__. But this requires that -msse4.1 is used for multiple source files. Moreover, it's a compile-time check, not a run-time one. I think it would be better to detect SSE4.1 with cpuid.

Then there is the build procedure. I came up with the following approach. By default, AES-NI support is disabled. To enable it, add -DLTC_AES_NI to CFLAGS and set the required compiler flags in CFLAGS_AES_NI (typically -maes -msse4.1). For example:

make -f makefile.unix CFLAGS=-DLTC_AES_NI CFLAGS_AES_NI="-maes -msse4.1"

Checklist

tbvdm commented 3 months ago

Maybe I should cc @karel-m

sjaeckel commented 3 months ago

Did you try the proposed change of #641 and whether this maybe fixes it already?

tbvdm commented 3 months ago

641 doesn't help, unfortunately. Here is a crash from the fix-perl-cryptx-99 branch on a machine without SSE4.1 (running OpenBSD):

$ git clone -b fix-perl-cryptx-99 https://github.com/libtom/libtomcrypt.git
$ cd libtomcrypt
$ make -f makefile.unix CFLAGS="-maes -msse4.1"
$ cd ..
$ cat test.c
#include <tomcrypt.h>

int main(void)
{
    register_cipher(&aes_desc);
    gcm_test();
}
$ cc -I libtomcrypt/src/headers libtomcrypt/libtomcrypt.a test.c
$ ./a.out
Illegal instruction (core dumped)
$ egdb -q a.out a.out.core
Reading symbols from a.out...
(No debugging symbols found in a.out)
[New process 186382]
Core was generated by `a.out'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00000a534708a0f0 in gcm_test ()
(gdb) x/i $pc
=> 0xa534708a0f0 <gcm_test+1696>:   ptest  %xmm0,%xmm0
HaikalAzaim commented 3 weeks ago

Did you try the proposed change of https://github.com/libtom/libtomcrypt/pull/641 and whether this maybe fixes it already?

sjaeckel commented 3 weeks ago

Sorry for the long delay.

641 doesn't help, unfortunately. Here is a crash from the fix-perl-cryptx-99 branch on a machine without SSE4.1 (running OpenBSD):

$ git clone -b fix-perl-cryptx-99 https://github.com/libtom/libtomcrypt.git
$ cd libtomcrypt
$ make -f makefile.unix CFLAGS="-maes -msse4.1"
$ cd ..
$ cat test.c
#include <tomcrypt.h>

int main(void)
{
  register_cipher(&aes_desc);
  gcm_test();
}
$ cc -I libtomcrypt/src/headers libtomcrypt/libtomcrypt.a test.c
$ ./a.out
Illegal instruction (core dumped)
$ egdb -q a.out a.out.core
Reading symbols from a.out...
(No debugging symbols found in a.out)
[New process 186382]
Core was generated by `a.out'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00000a534708a0f0 in gcm_test ()
(gdb) x/i $pc
=> 0xa534708a0f0 <gcm_test+1696>: ptest  %xmm0,%xmm0

FMU that's a different problem. You can also see that it fails in gcm_test() and not in AES related code.

A valid test-case for AES-NI would be to first configure&build the library as you did and then execute this:

#include <tomcrypt.h>

int main(void)
{
    aes_test();
}

Regarding what you're seeing: You're explicitly enabling SSE4.1, which allows the compiler to generate SSE-specific instructions.

WARNING: I checked the disassembly in the past and was too lazy to re-check now, so I hope the following statement is still correct :) Please correct me if I'm wrong!

The following code comes from an era when compilers were not yet that smart:

https://github.com/libtom/libtomcrypt/blob/f7e6519fae1e11ff5ff9d36c84101a673002133b/src/encauth/gcm/gcm_mult_h.c#L19-L41

By enabling -msse4.1 at compile time you did not enable that "assembly optimized path" in the source code (via the define etc.), but the compilers got so smart that they generate exactly those assembly instructions from the "default code path" (even when LTC_GCM_TABLES_SSE2 is not defined).

You should also be able to reproduce this failure with the latest release (master branch) that has no AES-NI support yet.

Maybe also try to add debug information when building the library to see where exactly it breaks? make -f makefile.unix CFLAGS="-maes -msse4.1 -g3"

Yes, these details are only slightly related to the error you see, but should show you the problem.

sjaeckel commented 3 weeks ago

And also you're right with this PR... now I understand the purpose :)

With this we have now a "proper" runtime check for AES-NI. It allows to build the rest of the library for all amd64 (also non SSE) and only requires the AES-NI related parts to be built with SSE4.1 support.

tbvdm commented 1 week ago

Sorry for the long delay.

No worries. Thanks for looking at this.

FMU that's a different problem. You can also see that it fails in gcm_test() and not in AES related code.

[...]

Regarding what you're seeing: You're explicitly enabling SSE4.1, which allows the compiler to generate SSE-specific instructions.

That's precisely the point: currently, if you want to enable AES-NI support, you need to enable SSE4.1 unconditionally for the whole source tree.

This is especially relevant when building binary packages. The same binary package should (ideally) be able to run different CPUs with different capabilities.

And also you're right with this PR... now I understand the purpose :)

With this we have now a "proper" runtime check for AES-NI. It allows to build the rest of the library for all amd64 (also non SSE) and only requires the AES-NI related parts to be built with SSE4.1 support.

Yes, exactly. :)

tbvdm commented 1 week ago

I've force-pushed a new commit that includes the change you requested.

tbvdm commented 6 days ago

Why don't we use target attributes? See the new commit. This avoids the need for flags altogether. You only have to make sure that LTC_AES_NI is defined. For example:

make -f makefile.unix CFLAGS=-DLTC_AES_NI

I defined __attribute__ away for MSVC. Also, MSVC does not need flags or attributes to enable SSE4.1 and AES intrinsics.

(However, I don't think MSVC will work anyway, because apparently it does not support inline assembly on x64. So the cpuid call will fail. I think the solution is to call cpuid through an intrinsic.)