Closed smanders closed 2 years ago
the following code breaks on our cm system
#include <sodium.h>
int main()
{
unsigned char pk[crypto_box_PUBLICKEYBYTES];
unsigned char sk[crypto_box_SECRETKEYBYTES];
crypto_box_keypair(pk, sk);
return 0;
}
smanders@bpbld21-13:/bpvol/nabreak/_bld (master)
$ ./nabreak
Illegal instruction (core dumped)
smanders@bpbld21-13:/bpvol/nabreak/_bld (master)
$ cat /proc/cpuinfo
...
model name : Intel(R) Xeon(R) Gold 6238R CPU @ 2.20GHz
...
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx
fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable
nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp
fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid rdseed adx smap xsaveopt arat md_clear spec_ctrl
intel_stibp flush_l1d arch_capabilities
...
https://isrhub.usurf.usu.edu/Vantage/VantageSuper/pull/5199 from scottjr
It appears the assembly instruction in question belongs to a CPUID feature flag called avx512vl or avx512f according to this doc: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-2b-manual.pdf
The /proc/cpuinfo from the vanbld-lin7-02 does not include those in its flags list, while my own system (running the WSL2 Docker container) does include the avx512vl flag. So there's definitely a difference between the processors, but it sounds like libsodium's configure script isn't able to detect that difference or wasn't coded to account for that difference... ... I've also confirmed in the debugger on vanbld-lin7-02 that runtime.c correctly sets these cpu_features flags: has_avx = 1 has_avx2 = 1 has_avx512f = 0 So I believe the issue is just that their code did not respect the has_avx512f setting in choosing which code to run, where we ran into the SIGILL. ... In walking thru the ge25519_scalarmult_base() code, none of its code nor the code it invokes appears to be using any assembly code, unless they're doing something weird with macros.
This is my current theory of what's going on:
Their configure script detects that our version of gcc supports the -mavx512f flag, which it does (even on systems with processors that don't support avx512f). This results in them adding a define to enable avx512f use.
There is some libsodium code that does run-time decisions based on the processor's avx512f support (such as libsodium/crypto_pwhash/argon2/argon2-core.c), but the scalarmult code we're looking at does not.
I believe libsodium is adding to the CFLAGS for its builds this option: "-mavx512f", telling the compiler that we support building with AVX512F-based instructions.
In Release builds, the gcc compiler does some optimizations of the libsodium code, and makes use of the AVX512F instruction set (without any runtime checks). According to the GCC compiler docs:
Applications that perform run-time CPU detection must compile separate files for each supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options.
So then when the libsodium code is run on a system with avx512f support in its processors, everything is fine, but when run on a system without, then we run into this SIGILL.
If this is correct, I believe we could fix the issue by manually turning off the avx512f support at libsodium configure/cmake time.
similar issue with vcpkg libsodium build https://github.com/microsoft/vcpkg/issues/8956
their fix https://github.com/microsoft/vcpkg/pull/8974/files - use check_c_source_runs()
instead of check_c_source_compiles()
HEAD of master branch (2021.05.24 commit as of this writing) https://github.com/microsoft/vcpkg/blob/master/ports/libsodium/CMakeLists.txt
the majority of the unit tests were failing on the official externpro build system, although the code that breaks the cm system is successful on the official externpro build system... so I verified that the updates to the configure.cmake script fix the unit tests
the sodium.patch for cmake build support is now https://github.com/jedisct1/libsodium/compare/95673e5b51e750c5eee1aecd935cbfc5791d741b...smanders:xp21.06.17
updating to the HEAD of stable branch, 2021.11.18 commit, there was a new unit test error - (utils2 unit test)
diff between 2021.06.17 commit and 2021.11.18 commit https://github.com/jedisct1/libsodium/compare/95673e5b51e750c5eee1aecd935cbfc5791d741b...aa099f5e82ae78175f9c1c48372a123cb634dd92
the sodium.patch for cmake build support is now https://github.com/jedisct1/libsodium/compare/aa099f5e82ae78175f9c1c48372a123cb634dd92...smanders:xp21.11.18
completed with commits referenced above, verified that unit tests all pass and that code runs on cm system https://github.com/smanders/externpro/issues/330#issuecomment-997277102
libsodium was added summer 2021 https://github.com/smanders/externpro/issues/317
we currently build what was the head of the stable branch https://github.com/smanders/externpro/blob/21.05/projects/sodium.cmake#L3-L4
the last release was 1.0.18 on 2019.05.30 https://github.com/jedisct1/libsodium/tags
it appears that using the head of the stable branch is what we will want