This issue suggests extending the constant-time tests to cover more build configurations.
Background: modern compilers can introduce regressions by 'optimizing' code that is meant to be constant-time, for instance by inserting conditional branches. A recent example is a regression in the Kyber/ML-KEM reference code when it is compiled with clang >= 15 on x86_64 using -Os, -O1, -O2 -fno-vectorize, or -O3 -fno-vectorize.
The liboqs constant-time tooling would be able to detect this issue, but CI currently runs these tests on a single platform, with one compiler (gcc) and one build configuration (CMake Debug).
To cover such cases, I suggest running the tests across several variables (e.g., as part of the weekly tests), for example:
- Architectures: x86_64 and ARM (as suggested by @SWilson4 in the last developers' call)
- Generic and assembly-optimized code (as CI already does for x86_64)
- Compilers: gcc and clang, in different versions (e.g., the system default version and the latest available version)
- Optimization levels: -O0, -Os, -O1, -O2, -O3 (or, alternatively, the CMake build types Debug, MinSizeRel, RelWithDebInfo, and Release)
- Additional compiler flags (e.g., -fno-vectorize)
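As a rough illustration of how these dimensions multiply, the sketch below enumerates the matrix above and builds a CMake configure command for each entry. It is a hypothetical helper, not part of liboqs. The CMake options `OQS_ENABLE_TEST_CONSTANT_TIME` and `OQS_OPT_TARGET=generic` are assumed to be the relevant liboqs switches and should be checked against the liboqs configuration docs; compiler versions are left to the CI runner.

```python
import itertools

# Matrix dimensions, taken from the list in this issue.
ARCHES = ["x86_64", "aarch64"]
IMPLEMENTATIONS = ["generic", "optimized"]
COMPILERS = ["gcc", "clang"]  # versions (default vs. latest) are runner-specific
# CMake default build types; for gcc/clang these map to
# Debug: -g (no -O, i.e. -O0), MinSizeRel: -Os, RelWithDebInfo: -O2 -g, Release: -O3.
BUILD_TYPES = ["Debug", "MinSizeRel", "RelWithDebInfo", "Release"]
EXTRA_FLAGS = ["", "-fno-vectorize"]


def build_matrix():
    """Return one dict per configuration to test."""
    configs = []
    for arch, impl, cc, build_type, flags in itertools.product(
        ARCHES, IMPLEMENTATIONS, COMPILERS, BUILD_TYPES, EXTRA_FLAGS
    ):
        # -fno-vectorize is clang's spelling (gcc uses -fno-tree-vectorize),
        # so this sketch only pairs the extra flag with clang.
        if flags and cc != "clang":
            continue
        configs.append(
            {
                "arch": arch,
                "implementation": impl,
                "compiler": cc,
                "build_type": build_type,
                "extra_flags": flags,
            }
        )
    return configs


def cmake_args(cfg, source_dir=".."):
    """Build a CMake configure command line for one configuration.

    ASSUMPTION: OQS_ENABLE_TEST_CONSTANT_TIME and OQS_OPT_TARGET are the
    liboqs options that enable the constant-time test and select generic code.
    The "arch" key is not a CMake option here; it selects which CI runner
    executes the build.
    """
    args = [
        "cmake",
        "-GNinja",
        f"-DCMAKE_C_COMPILER={cfg['compiler']}",
        f"-DCMAKE_BUILD_TYPE={cfg['build_type']}",
        "-DOQS_ENABLE_TEST_CONSTANT_TIME=ON",
    ]
    if cfg["implementation"] == "generic":
        args.append("-DOQS_OPT_TARGET=generic")
    if cfg["extra_flags"]:
        args.append(f"-DCMAKE_C_FLAGS={cfg['extra_flags']}")
    args.append(source_dir)
    return args
```

Even this reduced matrix yields 48 configurations per compiler version, which is one reason a weekly job seems a better fit than per-PR CI. Note that the CMake build types do not include -O1, one of the levels at which the Kyber regression appeared, so covering it would need an explicit flag.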
I think it's unrealistic to be fully exhaustive, but we should document which configurations were tested and what the limitations of the tests are.
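One lightweight way to do that documentation is to have the test job emit a summary table. The hypothetical helper below (not part of liboqs) renders a list of `(config, passed)` pairs as a Markdown table, followed by free-text notes on known limitations. It assumes each config is a dict with `arch`, `implementation`, `compiler`, `build_type`, and `extra_flags` keys; the pass/fail values would come from whatever runner executes the tests.

```python
def render_report(results, limitations=()):
    """Render tested configurations as a Markdown table.

    `results` is a list of (config, passed) pairs; `config` is a dict with
    arch/implementation/compiler/build_type/extra_flags keys (an assumed
    layout). `limitations` is an optional list of free-text notes.
    """
    lines = [
        "| Arch | Implementation | Compiler | Build type | Extra flags | Result |",
        "|---|---|---|---|---|---|",
    ]
    for cfg, passed in results:
        lines.append(
            "| {} | {} | {} | {} | {} | {} |".format(
                cfg["arch"],
                cfg["implementation"],
                cfg["compiler"],
                cfg["build_type"],
                cfg["extra_flags"] or "(none)",  # make "no extra flags" explicit
                "pass" if passed else "FAIL",
            )
        )
    # Append known gaps so readers see what was NOT covered.
    if limitations:
        lines.append("")
        lines.append("Limitations:")
        lines.extend(f"- {note}" for note in limitations)
    return "\n".join(lines)
```

The output could be attached to the weekly job summary, so the documented set of configurations always matches what actually ran.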