haskell-crypto / cryptonite

lowlevel set of cryptographic primitives for haskell
Other
226 stars 139 forks source link

On AVX2 systems testsuite fails on unaligned stores in cbits/decaf/include/constant_time.h:159 #347

Closed trofi closed 1 year ago

trofi commented 3 years ago

Noticed unaligned stores in cryptonite-0.29 testsuite:

./setup configure --ghc --prefix=/usr --with-compiler=/usr/bin/ghc --with-hc-pkg=/usr/bin/ghc-pkg --prefix=/usr --libdir=/usr/lib64 --libsubdir=cryptonite-0.29/ghc-8.10.4 --datadir=/usr/share/ --datasubdir=cryptonite-0.29/ghc-8.10.4 --enable-tests --ghc-options=-j16 +RTS -A256M -qb0 -RTS --with-ar=x86_64-pc-linux-gnu-ar --ghc-option=-optc-march=znver3 --ghc-option=-optc-mshstk --ghc-option=-optc-O2 --ghc-option=-optc-pipe --ghc-option=-optc-fdiagnostics-show-option --ghc-option=-optc-frecord-gcc-switches --ghc-option=-optc-Wall --ghc-option=-optc-Wextra --ghc-option=-optc-Wstack-protector --ghc-option=-optc-frecord-gcc-switches --ghc-option=-optc-frecord-gcc-switches --ghc-option=-optc-ggdb3 --ghc-option=-optl-Wl,-O1 --ghc-option=-optl-Wl,--as-needed --ghc-option=-optl-Wl,--hash-style=gnu --ghc-option=-optl-Wl,--defsym=__gentoo_check_ldflags__=0 --disable-executable-stripping --docdir=/usr/share/doc/cryptonite-0.29 --verbose --enable-shared --enable-executable-dynamic --sysconfdir=/etc --disable-library-stripping --flag=-check_alignment --flags=integer-gmp --flag=-old_toolchain_inliner --flags=support_aesni --flag=support_deepseq --flags=support_pclmuldq --flags=support_rdrand --flags=support_sse --flags=use_target_attributes
...
$ LD_LIBRARY_PATH=${PWD}/dist/built gdb --args dist/build/test-cryptonite/test-cryptonite
...
  Curve448
    KATs
      0:                                     OK
      1:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff653b504 in constant_time_lookup (idx=<optimized out>, n_table=16, elem_bytes=192, table_=0x7ffff6571aa0 <cryptonite_decaf_448_precomputed_base_as_fe>, out_=0x7fffffff9250)
    at cbits/decaf/include/constant_time.h:159
159                     *(big_register_t *)(out+k) |= br_mask & *(const big_register_t*)(&table[k+j*elem_bytes]);

Looking at the details it's the store path that is unaligned (code seems to check only table load part):

(gdb) list constant_time_lookup
...
file: "cbits/decaf/include/constant_time.h", line number: 150, symbol: "constant_time_lookup"
145         unsigned char *out = (unsigned char *)out_;
146         const unsigned char *table = (const unsigned char *)table_;
147         word_t j,k;
148
149         memset(out, 0, elem_bytes);
150         for (j=0; j<n_table; j++, big_i-=big_one) {
151             big_register_t br_mask = br_is_zero(big_i);
152             for (k=0; k<=elem_bytes-sizeof(big_register_t); k+=sizeof(big_register_t)) {
153                 if (elem_bytes % sizeof(big_register_t)) {
154                     /* unaligned */
155                     ((unaligned_br_t *)(out+k))->unaligned
156                             |= br_mask & ((const unaligned_br_t*)(&table[k+j*elem_bytes]))->unaligned;
157                 } else {
158                     /* aligned */
159                     *(big_register_t *)(out+k) |= br_mask & *(const big_register_t*)(&table[k+j*elem_bytes]);
160                 }
161             }

(gdb) disassemble
Dump of assembler code for function cryptonite_decaf_448_precomputed_scalarmul:
...
   0x00007ffff653b4f0 <+352>:   vmovdqa (%rax),%ymm0
   0x00007ffff653b4f4 <+356>:   vpand  (%rdx),%ymm2,%ymm1
   0x00007ffff653b4f8 <+360>:   add    $0x20,%rax
   0x00007ffff653b4fc <+364>:   add    $0x20,%rdx
   0x00007ffff653b500 <+368>:   vpor   %ymm0,%ymm1,%ymm0
=> 0x00007ffff653b504 <+372>:   vmovdqa %ymm0,-0x20(%rax)
   0x00007ffff653b509 <+377>:   cmp    %rax,%r14
   0x00007ffff653b50c <+380>:   jne    0x7ffff653b4f0 <cryptonite_decaf_448_precomputed_scalarmul+352>
   0x00007ffff653b50e <+382>:   add    $0xc0,%rsi

(gdb) print (void*)$rax
$1 = (void *) 0x7fffffff9270
(gdb) print (void*)$rdx
$2 = (void *) 0x7ffff6571ac0 <cryptonite_decaf_448_precomputed_base_as_fe+32>
(gdb) print sizeof(big_register_t)
$4 = 32
trofi commented 3 years ago

The following hack makes tests pass on avx2 system:

--- a/cbits/decaf/include/constant_time.h
+++ b/cbits/decaf/include/constant_time.h
@@ -150,7 +150,7 @@ constant_time_lookup (
     for (j=0; j<n_table; j++, big_i-=big_one) {
         big_register_t br_mask = br_is_zero(big_i);
         for (k=0; k<=elem_bytes-sizeof(big_register_t); k+=sizeof(big_register_t)) {
-            if (elem_bytes % sizeof(big_register_t)) {
+            if (1) {
                 /* unaligned */
                 ((unaligned_br_t *)(out+k))->unaligned
                        |= br_mask & ((const unaligned_br_t*)(&table[k+j*elem_bytes]))->unaligned;
ocheron commented 3 years ago

Likely related to modifications I did that rely on the fact that AVX2 is not enabled. See here: https://github.com/haskell-crypto/cryptonite/blob/8698c9fd940403f39387291a377aefc4bb1f1d7d/cbits/decaf/tools/generate.sh#L25

Feel free to revisit the compatibility requirements.

trofi commented 3 years ago

Aha,

  • aligned(32) attributes used for stack alignment are replaced by aligned(16). This removes warnings on OpenBSD with GCC 4.2.1, and makes sure we get at least 16-byte alignment. 32-byte alignment is necessary only for AVX2 and arch_x86_64, which we don't have.

makes it effectively incompatible with clients who pass -m* options to C compiler to leverage ISA extensions if compiler manages to find the code patterns where those could be used.

In my case --ghc-option=-optc-march=znver3 allows gcc using AVX2 and relies on according alignment to be present.

My suggestion would be to never relax aligned(32) down to aligned(16) to preserve correctness.

SuperSandro2000 commented 1 year ago

Just stumbled upon this, too.

373