Closed xvzcf closed 7 months ago
@xvzcf, Thank you for reporting this! It seems that for the Cortex-M4 neither clang nor gcc emits divison instructions at any optimization level (I tested the recent versions with -O{0,1,2,3}). That also makes sense to me because the umull/smull instructions take only 1 cycle, while udiv/sdiv take up to 12 cycles.
Yet, people may take this code and run it on other cores implementing (a superset of) Armv7E-M which may behave differenty. Also other compiler versions may behave differently. So I'd say we still implement the fix to be safe. I'll get to that next week.
Interesting. We don't have a Cortex-M4 to test on, but there seem to be some compiler versions that may be worth looking at. E.g. GCC 11.2 with -Os appears to call a division subroutine on ARM32: https://godbolt.org/z/dhYMTTGf8
You need to pass -mcpu=cortex-m4
, but it indeed compiles to udiv r0, r0, r3
when using -Os
which is certainly variable time. Scary. I'll fix it right now.
The m4 implementations in crypto_kem/kyberXXX
have been fixed in #320.
However, the clean
implementations in mupq/pqclean/crypto_kem/kyberxxx
are coming from https://github.com/PQClean/PQClean. We have to wait until that gets updated upstream.
There is an issue for that in PQClean: https://github.com/PQClean/PQClean/issues/533
Keeping this issue open for now to track progress.
We have to wait until that gets updated upstream.
@mkannwischer Upstream fixed this (except for the ARM64 optimized implementation) on January 25th.
Resolved in #320 and #332
This bit of code or similar is used in your various implementations of Kyber to compress a polynomial ring element into a (secret) message:
https://github.com/mupq/pqm4/blob/dc26f540ad2d40b70c86d59e9d54eaaab2a2eeed/crypto_kem/kyber768/m4fspeed/poly.c#L550
To do so, it performs a division by Q that might not necessarily compile to a multiplication instruction: looking at the output of some C compilers using https://godbolt.org/z/sKn3TKKGq and https://godbolt.org/z/8GqKoTfYh for example, a division instruction is emitted even when -O3 is specified. Should a division instruction be emitted, its execution time would likely be variable and leak information about its secret input.
We reported a similar issue in the CRYSTALS-Kyber reference implementation; you may want to use their fix: https://github.com/pq-crystals/kyber/commit/dda29cc63af721981ee2c831cf00822e69be3220