mupq / pqm4

Post-quantum crypto library for the ARM Cortex-M4
296 stars 75 forks source link

Possible variable-time division when decapsulating in Kyber #319

Closed xvzcf closed 7 months ago

xvzcf commented 11 months ago

This bit of code or similar is used in your various implementations of Kyber to compress a polynomial ring element into a (secret) message:

https://github.com/mupq/pqm4/blob/dc26f540ad2d40b70c86d59e9d54eaaab2a2eeed/crypto_kem/kyber768/m4fspeed/poly.c#L550

To do so, it performs a division by Q that might not necessarily compile to a multiplication instruction: looking at the output of some C compilers using https://godbolt.org/z/sKn3TKKGq and https://godbolt.org/z/8GqKoTfYh for example, a division instruction is emitted even when -O3 is specified. Should a division instruction be emitted, its execution time would likely be variable and leak information about its secret input.

We reported a similar issue in the CRYSTALS-Kyber reference implementation; you may want to use their fix: https://github.com/pq-crystals/kyber/commit/dda29cc63af721981ee2c831cf00822e69be3220

mkannwischer commented 11 months ago

@xvzcf, Thank you for reporting this! It seems that for the Cortex-M4 neither clang nor gcc emits divison instructions at any optimization level (I tested the recent versions with -O{0,1,2,3}). That also makes sense to me because the umull/smull instructions take only 1 cycle, while udiv/sdiv take up to 12 cycles.

Yet, people may take this code and run it on other cores implementing (a superset of) Armv7E-M which may behave differenty. Also other compiler versions may behave differently. So I'd say we still implement the fix to be safe. I'll get to that next week.

karthikbhargavan commented 11 months ago

Interesting. We don't have a Cortex-M4 to test on, but there seem to be some compiler versions that may be worth looking at. E.g. GCC 11.2 with -Os appears to call a division subroutine on ARM32: https://godbolt.org/z/dhYMTTGf8

mkannwischer commented 11 months ago

You need to pass -mcpu=cortex-m4, but it indeed compiles to udiv r0, r0, r3 when using -Os which is certainly variable time. Scary. I'll fix it right now.

mkannwischer commented 11 months ago

The m4 implementations in crypto_kem/kyberXXX have been fixed in #320. However, the clean implementations in mupq/pqclean/crypto_kem/kyberxxx are coming from https://github.com/PQClean/PQClean. We have to wait until that gets updated upstream.

There is an issue for that in PQClean: https://github.com/PQClean/PQClean/issues/533

Keeping this issue open for now to track progress.

James-E-A commented 9 months ago

We have to wait until that gets updated upstream.

@mkannwischer Upstream fixed this (except for the ARM64 optimized implementation) on January 25th.

mkannwischer commented 7 months ago

Resolved in #320 and #332