Closed vt-alt closed 2 years ago
The key line in http://git.altlinux.org/beehive/logs/Sisyphus/x86_64/archive/2022/0702/error/liboqs-0.7.1-alt2 is:
. > /usr/src/RPM/BUILD/liboqs-0.7.1/build/tests/test_kem_mem HQC-128 2 ERROR: OQS_KEM_decaps failed
And it is very surprising: It seems the final (decaps) operation of this single algorithm (HQC-128) is failing. If nothing in the code base was changed (?) this should not happen. Can you manually reproduce this error by running these commands (and sharing your output):
/usr/src/RPM/BUILD/liboqs-0.7.1/build/tests/test_kem_mem HQC-128 0
/usr/src/RPM/BUILD/liboqs-0.7.1/build/tests/test_kem_mem HQC-128 1
/usr/src/RPM/BUILD/liboqs-0.7.1/build/tests/test_kem_mem HQC-128 2
If this also fails, please also run git status
in the main project directory (/usr/src/RPM/BUILD/liboqs-0.7.1
in your case) as well as uname -a
, cat /etc/os-release
and share the output of these commands, too: Is this ALT Linux?
Finally, does the current liboqs
main branch fail in the same way as the release (0.7.1) you're running?
Yes this is ALT Linux and it's daily rebuild of the package which previously built without errors. Package is build from the tag 0.7.1
.
builder@x86_64:~/RPM/BUILD/liboqs-0.7.1$ /usr/src/RPM/BUILD/liboqs-0.7.1/build/tests/test_kem_mem HQC-128 0
Configuration info
==================
Target platform: x86_64-Linux-5.15.6-un-def-alt1
Compiler: gcc (12.1.1)
Compile options: [-Werror;-Wall;-Wextra;-Wpedantic;-Wstrict-prototypes;-Wshadow;-Wformat=2;-Wfloat-equal;-Wwrite-strings;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version: 0.7.1
Git commit: unknown
OpenSSL enabled: Yes (OpenSSL 1.1.1p 21 Jun 2022)
AES: OpenSSL
SHA-2: OpenSSL
SHA-3: C
OQS build flags: BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=RelWithDebInfo
CPU exts active: AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
================================================================================
Executing keygen for KEM HQC-128
================================================================================
builder@x86_64:~/RPM/BUILD/liboqs-0.7.1$
builder@x86_64:~/RPM/BUILD/liboqs-0.7.1$ echo $?
0
builder@x86_64:~/RPM/BUILD/liboqs-0.7.1$ /usr/src/RPM/BUILD/liboqs-0.7.1/build/tests/test_kem_mem HQC-128 1
Configuration info
==================
Target platform: x86_64-Linux-5.15.6-un-def-alt1
Compiler: gcc (12.1.1)
Compile options: [-Werror;-Wall;-Wextra;-Wpedantic;-Wstrict-prototypes;-Wshadow;-Wformat=2;-Wfloat-equal;-Wwrite-strings;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version: 0.7.1
Git commit: unknown
OpenSSL enabled: Yes (OpenSSL 1.1.1p 21 Jun 2022)
AES: OpenSSL
SHA-2: OpenSSL
SHA-3: C
OQS build flags: BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=RelWithDebInfo
CPU exts active: AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
================================================================================
Executing encaps for KEM HQC-128
================================================================================
builder@x86_64:~/RPM/BUILD/liboqs-0.7.1$ echo $?
0
builder@x86_64:~/RPM/BUILD/liboqs-0.7.1$ /usr/src/RPM/BUILD/liboqs-0.7.1/build/tests/test_kem_mem HQC-128 2
Configuration info
==================
Target platform: x86_64-Linux-5.15.6-un-def-alt1
Compiler: gcc (12.1.1)
Compile options: [-Werror;-Wall;-Wextra;-Wpedantic;-Wstrict-prototypes;-Wshadow;-Wformat=2;-Wfloat-equal;-Wwrite-strings;-Wstrict-overflow;-ggdb3;-Wbad-function-cast]
OQS version: 0.7.1
Git commit: unknown
OpenSSL enabled: Yes (OpenSSL 1.1.1p 21 Jun 2022)
AES: OpenSSL
SHA-2: OpenSSL
SHA-3: C
OQS build flags: BUILD_SHARED_LIBS OQS_DIST_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=RelWithDebInfo
CPU exts active: AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
================================================================================
Executing decaps for KEM HQC-128
================================================================================
ERROR: OQS_KEM_decaps failed
builder@x86_64:~/RPM/BUILD/liboqs-0.7.1$ echo $?
1
This is not raw upstream repository, but a repo with RPM spec which builds this package, but you can see it's merged from 0.7.1
.
altair:~/src/liboqs (sisyphus)$ git status
On branch sisyphus
Your branch is up to date with 'gears/sisyphus'.
nothing to commit, working tree clean
altair:~/src/liboqs (sisyphus)$ git l -11
> b368a179 2022-03-26 0.7.1-alt2 (Vitaly Chikunov) (HEAD -> sisyphus, tag: gb-sisyphus-task297315.100, tag: 0.7.1-alt2, gitery/sisyphus, gears/sisyphus)
> b59a92a1 2022-03-22 skip yamllint test for good (#1196) (Michael Baentsch)
> e4c4a7ea 2021-12-20 0.7.1-alt1 (Vitaly Chikunov) (tag: gb-sisyphus-task292326.200, tag: 0.7.1-alt1)
> 60abd6e5 2021-12-20 Merge tag '0.7.1' into sisyphus (Vitaly Chikunov)
> a39d08e0 2021-12-16 liboqs 0.7.1 (Douglas Stebila) (tag: 0.7.1)
> ed190848 2021-12-15 add trigger for oqs-provider CI (#1157) (Michael Baentsch)
> 7b549971 2021-12-14 docs: set license MIT for Falcon Signature (#1156) (ax1)
> ae6c7b47 2021-12-13 Output documented build options (#1155) (Michael Baentsch)
> abf1080d 2021-12-12 Added spdx headers to noregress.[py|sh] (#1154) (Jason Goertzen)
> ca0cd60a 2021-12-11 release performance regression test (#1152) (Michael Baentsch)
> 0a61d5d0 2021-12-09 0.7.1-rc1 [skip ci] (Douglas Stebila) (tag: 0.7.1-rc1)
altair:~/src/liboqs (sisyphus)$ git diff --stat 0.7.1
.gear/liboqs.spec | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
.gear/rules | 2 ++
src/CMakeLists.txt | 2 +-
tests/test_code_conventions.py | 11 -----------
4 files changed, 133 insertions(+), 12 deletions(-)
altair:~/src/liboqs (sisyphus)$ git diff 0.7.1 src/CMakeLists.txt
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index a68c5619..01cf1e73 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -84,7 +84,7 @@ set_target_properties(oqs
ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib"
LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/lib"
VERSION ${OQS_VERSION_TEXT}
- SOVERSION 0
+ SOVERSION 1
# For Windows DLLs
RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin")
$ cat /etc/os-release
NAME=Sisyphus
VERSION=20201124
ID=altlinux
VERSION_ID=20201124
PRETTY_NAME="ALT Sisyphus Sisyphus"
ANSI_COLOR="1;33"
CPE_NAME="cpe:/o:alt:sisyphus:20201124"
HOME_URL="https://www.altlinux.org"
BUG_REPORT_URL="https://bugzilla.altlinux.org"
builder@x86_64:/.in$ uname -a
Linux localhost.localdomain 5.15.6-un-def-alt1 #1 SMP PREEMPT Wed Dec 1 13:22:58 UTC 2021 x86_64 GNU/Linux
Finally, does the current liboqs main branch fail in the same way as the release (0.7.1) you're running?
Will report that a bit later.
Ok merged origin/main
and build & run tests (failures are reproduced):
altair:~/src/liboqs (sisyphus)$ git fetch origin
altair:~/src/liboqs (sisyphus)$ git merge --no-edit origin/main
altair:~/src/liboqs (sisyphus)$ git l -1
> 26a5d878 2022-07-03 Merge remote-tracking branch 'origin/main' into sisyphus (Vitaly Chikunov) (HEAD -> sisyphus)
altair:~/src/liboqs (sisyphus)$ git describe origin/main --tags
0.7.1-44-g1a1e9184
altair:~/src/liboqs (sisyphus)$ git l -2 origin/main
> 1a1e9184 2022-07-01 disable msys2 testing (#1243) (Michael Baentsch) (origin/main, origin/HEAD)
> a8dad8d3 2022-07-01 adding memory leak testing (#1234) (Michael Baentsch)
altair:~/src/liboqs (sisyphus)$ gear-hsh --commit 2>&1 | tee log2 # this is our builder tool run in clean env
...
+ /usr/bin/ninja -j32 -C build run_tests
...
=========================== short test summary info ============================
FAILED tests/test_cmdline.py::test_kem[HQC-192] - AssertionError: Got unexpec...
FAILED tests/test_cmdline.py::test_kem[HQC-128] - AssertionError: Got unexpec...
FAILED tests/test_cmdline.py::test_kem[HQC-256] - AssertionError: Got unexpec...
FAILED tests/test_kat.py::test_kem[HQC-128] - AssertionError: Got unexpected ...
FAILED tests/test_kat.py::test_kem[HQC-192] - AssertionError: Got unexpected ...
FAILED tests/test_kat.py::test_kem[HQC-256] - AssertionError: Got unexpected ...
FAILED tests/test_mem.py::test_mem_kem[HQC-256] - AssertionError: Got unexpec...
FAILED tests/test_mem.py::test_mem_kem[HQC-128] - AssertionError: Got unexpec...
FAILED tests/test_mem.py::test_mem_kem[HQC-192] - AssertionError: Got unexpec...
================= 9 failed, 629 passed, 383 skipped in 33.99s ==================
FAILED: tests/CMakeFiles/run_tests /usr/src/RPM/BUILD/liboqs-0.7.1/build/tests/CMakeFiles/run_tests
cd /usr/src/RPM/BUILD/liboqs-0.7.1 && /usr/bin/cmake -E env OQS_BUILD_DIR=/usr/src/RPM/BUILD/liboqs-0.7.1/build python3 -m pytest --verbose --numprocesses=auto --ignore=scripts/copy_from_upstream/repos
ninja: build stopped: subcommand failed.
error: Bad exit status from /usr/src/tmp/rpm-tmp.28155 (%check)
Rerun of three test_kem_mem
in origin/main
tree produces the same result as before — first 2 runs are ok and third fails with ERROR: OQS_KEM_decaps failed
.
Quick gdb
ing it seems that /usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/kem.c
PQCLEAN_HQCRMRS128_AVX2_crypto_kem_dec
producing the error. grep
ing log I don't find any error or build warning regarding that.
$ grep pqclean_hqc-rmrs-128_avx2/kem log2
Thanks for these logs and reconfirmations. So in summary: you do a daily build with 0.7.1 (unchanged for quite some time) and suddenly HQC started to fail?! The most prominent thing that changed recently was the OpenSSL update. Would HQC work OK if you built liboqs
without it ("cmake -DOQS_USE_OPENSSL=OFF")?
I just checked the logs, and we started getting re-build failures on June 2, the day we switched to gcc-12 as default compiler. I just tried to rebuild the package (with main
still merged) using CC=gcc-11
and there is no test failures: 638 passed, 383 skipped in 28.37s
$ grep pqclean_hqc-rmrs-128_avx2/kem log3
[553/2363] /usr/bin/gcc-11 -I/usr/src/RPM/BUILD/liboqs-0.7.1/build/include -I/usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2 -I/usr/src/RPM/BUILD/liboqs-0.7.1/src/common/pqclean_shims -pipe -frecord-gcc-switches -Wall -fdiagnostics-color=always -fdiagnostics-color=always -fdiagnostics-color=always -g -O2 -flto=auto -Wa,--noexecstack -O2 -g -DNDEBUG -fPIC -fvisibility=hidden -Werror -Wall -Wextra -Wpedantic -Wstrict-prototypes -Wshadow -Wformat=2 -Wfloat-equal -Wwrite-strings -Wstrict-overflow -ggdb3 -mavx2 -mbmi -mpclmul -Wno-missing-braces -std=gnu11 -MD -MT src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/kem.c.o -MF src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/kem.c.o.d -o src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/kem.c.o -c /usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/kem.c
Rebuild with -DOQS_USE_OPENSSL=OFF
, as you requested, on gcc-12 still produces the same test failures: 9 failed, 629 passed, 383 skipped in 28.18s
.
I thought this may be related to LTO (-flto=auto
) and turned it off. After that there is build error appeared:
[1/966] /usr/bin/cc -I/usr/src/RPM/BUILD/liboqs-0.7.1/build/include -I/usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2 -I/usr/src/RPM/BUILD/liboqs-0.7.1/src/common/pqclean_shims -pipe -frecord-gcc-switches -Wall -fdiagnostics-color=always -fdiagnostics-color=always -fdiagnostics-color=always -g -O2 -Wa,--noexecstack -O2 -g -DNDEBUG -fPIC -fvisibility=hidden -Werror -Wall -Wextra -Wpedantic -Wstrict-prototypes -Wshadow -Wformat=2 -Wfloat-equal -Wwrite-strings -Wstrict-overflow -ggdb3 -mavx2 -mbmi -mpclmul -Wno-missing-braces -std=gnu11 -MD -MT src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/reed_muller.c.o -MF src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/reed_muller.c.o.d -o src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/reed_muller.c.o -c /usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c
FAILED: src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/reed_muller.c.o
/usr/bin/cc -I/usr/src/RPM/BUILD/liboqs-0.7.1/build/include -I/usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2 -I/usr/src/RPM/BUILD/liboqs-0.7.1/src/common/pqclean_shims -pipe -frecord-gcc-switches -Wall -fdiagnostics-color=always -fdiagnostics-color=always -fdiagnostics-color=always -g -O2 -Wa,--noexecstack -O2 -g -DNDEBUG -fPIC -fvisibility=hidden -Werror -Wall -Wextra -Wpedantic -Wstrict-prototypes -Wshadow -Wformat=2 -Wfloat-equal -Wwrite-strings -Wstrict-overflow -ggdb3 -mavx2 -mbmi -mpclmul -Wno-missing-braces -std=gnu11 -MD -MT src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/reed_muller.c.o -MF src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/reed_muller.c.o.d -o src/kem/hqc/CMakeFiles/hqc_128_avx2.dir/pqclean_hqc-rmrs-128_avx2/reed_muller.c.o -c /usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c
In function 'find_peaks',
inlined from 'PQCLEAN_HQCRMRS128_AVX2_reed_muller_decode' at /usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c:387:18:
/usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c:336:44: error: 'tmp' is used uninitialized [-Werror=uninitialized]
336 | result |= mask & ((uint16_t *)&tmp)[i];
| ~~~~~~~~~~~~~~~~~~^~~
/usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c: In function 'PQCLEAN_HQCRMRS128_AVX2_reed_muller_decode':
/usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c:234:13: note: 'tmp' was declared here
234 | __m256i tmp = _mm256_setzero_si256();
| ^~~
cc1: all warnings being treated as errors
ninja: build stopped: subcommand failed.
There is 3 errors like this in
/usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c
/usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-192_avx2/reed_muller.c
/usr/src/RPM/BUILD/liboqs-0.7.1/src/kem/hqc/pqclean_hqc-rmrs-256_avx2/reed_muller.c
This is perhaps the cause of test problems.
Thanks very much for these additional checks. Yes, it seems to be related to that file and that compiler (version).
Given there's not a lot of algorithm maintenance (fixing such bugs) going on until NIST decides which algorithm(s) are eliminated from consideration for standardization, here's my workaround suggestions until this "low maintenance mode" is lifted: 1) Disable HQC if you need to use gcc-12 (which we don't test with, i.e., -DOQS_ENABLE_KEM_HQC=OFF). Or 2) Upgrade to clang-14 (which we try to standardize our testing on)
Edit/Add: Looking at https://github.com/open-quantum-safe/liboqs/issues/995 HQC seems to be in an exceptionally "deep" "low maintenance" mode .... If I were you, I'd go for option 1, thus.
Yes let's wait July 5 for the NST announcement. I also produced my workaround:
diff --git a/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c b/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c
index 85afd331..77339b33 100644
--- a/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c
+++ b/src/kem/hqc/pqclean_hqc-rmrs-128_avx2/reed_muller.c
@@ -331,6 +331,7 @@ inline uint32_t find_peaks(__m256i *transform) {
tmp = _mm256_or_si256(tmp, _mm256_and_si256(vect_mask, transform[i]));
}
result = 0;
+#pragma GCC unroll 16
for (size_t i = 0; i < 16; i++) {
mask = ~(uint32_t) ((-(int64_t)(i ^ message % 16)) >> 63);
result |= mask & ((uint16_t *)&tmp)[i];
diff --git a/src/kem/hqc/pqclean_hqc-rmrs-192_avx2/reed_muller.c b/src/kem/hqc/pqclean_hqc-rmrs-192_avx2/reed_muller.c
index dbfd6a29..8fb506cb 100644
--- a/src/kem/hqc/pqclean_hqc-rmrs-192_avx2/reed_muller.c
+++ b/src/kem/hqc/pqclean_hqc-rmrs-192_avx2/reed_muller.c
@@ -331,6 +331,7 @@ inline uint32_t find_peaks(__m256i *transform) {
tmp = _mm256_or_si256(tmp, _mm256_and_si256(vect_mask, transform[i]));
}
result = 0;
+#pragma GCC unroll 16
for (size_t i = 0; i < 16; i++) {
mask = ~(uint32_t) ((-(int64_t)(i ^ message % 16)) >> 63);
result |= mask & ((uint16_t *)&tmp)[i];
diff --git a/src/kem/hqc/pqclean_hqc-rmrs-256_avx2/reed_muller.c b/src/kem/hqc/pqclean_hqc-rmrs-256_avx2/reed_muller.c
index 22527b8a..48c2db4c 100644
--- a/src/kem/hqc/pqclean_hqc-rmrs-256_avx2/reed_muller.c
+++ b/src/kem/hqc/pqclean_hqc-rmrs-256_avx2/reed_muller.c
@@ -331,6 +331,7 @@ inline uint32_t find_peaks(__m256i *transform) {
tmp = _mm256_or_si256(tmp, _mm256_and_si256(vect_mask, transform[i]));
}
result = 0;
+#pragma GCC unroll 16
for (size_t i = 0; i < 16; i++) {
mask = ~(uint32_t) ((-(int64_t)(i ^ message % 16)) >> 63);
result |= mask & ((uint16_t *)&tmp)[i];
This GCC error is obscure but while experimenting with an idea of replacing this array indexed access with _mm256_extract_epi16
which requires index to be immediate value I tried to achieve the same property of the index just for the casted array index i
by unrolling the loops and it's worked.
With this patch (and LTO disabled) it builds 0.7.1
without errors and the tests do not fail.
Yes let's wait July 5 for the NST announcement. I also produced my workaround:
So HQC remains "in the running" for NIST. Do you prefer to
liboqs
patches? There, a PR by the author (that would be you :) would be preferred, too.I would rather submit PR to liboqs since I'm not user of raw PQClean.
I submitted the PR.
ps. BTW, our tests run on ARM32 (armv7hf
) architecture fail due to 1-hour timeout expiring for the FALCON (so I temporary disabled it[1]). But because it's selected by NIST, perhaps, I should debug it more and submit proper bug report.
[1]
%ifarch armh
-DOQS_ENABLE_SIG_FALCON=OFF \
%endif
I submitted a bug to GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106470
With the time there is multiple test errors appeared (for 0.7.1 on x86_64):
Full build log and test details http://git.altlinux.org/beehive/logs/Sisyphus/x86_64/archive/2022/0702/error/liboqs-0.7.1-alt2
For example, excerpt for one test:
I can say this output is quite obscure and hard to understand what exactly is failed.