mupq / pqm4

Post-quantum crypto library for the ARM Cortex-M4
280 stars 68 forks source link

Possible variable-time division when decapsulating in Kyber #319

Closed xvzcf closed 3 months ago

xvzcf commented 7 months ago

This bit of code or similar is used in your various implementations of Kyber to compress a polynomial ring element into a (secret) message:

https://github.com/mupq/pqm4/blob/dc26f540ad2d40b70c86d59e9d54eaaab2a2eeed/crypto_kem/kyber768/m4fspeed/poly.c#L550

To do so, it performs a division by Q that might not necessarily compile to a multiplication instruction: looking at the output of some C compilers using https://godbolt.org/z/sKn3TKKGq and https://godbolt.org/z/8GqKoTfYh for example, a division instruction is emitted even when -O3 is specified. Should a division instruction be emitted, its execution time would likely be variable and leak information about its secret input.

We reported a similar issue in the CRYSTALS-Kyber reference implementation; you may want to use their fix: https://github.com/pq-crystals/kyber/commit/dda29cc63af721981ee2c831cf00822e69be3220

mkannwischer commented 7 months ago

@xvzcf, Thank you for reporting this! It seems that for the Cortex-M4 neither clang nor gcc emits divison instructions at any optimization level (I tested the recent versions with -O{0,1,2,3}). That also makes sense to me because the umull/smull instructions take only 1 cycle, while udiv/sdiv take up to 12 cycles.

Yet, people may take this code and run it on other cores implementing (a superset of) Armv7E-M which may behave differenty. Also other compiler versions may behave differently. So I'd say we still implement the fix to be safe. I'll get to that next week.

karthikbhargavan commented 7 months ago

Interesting. We don't have a Cortex-M4 to test on, but there seem to be some compiler versions that may be worth looking at. E.g. GCC 11.2 with -Os appears to call a division subroutine on ARM32: https://godbolt.org/z/dhYMTTGf8

mkannwischer commented 7 months ago

You need to pass -mcpu=cortex-m4, but it indeed compiles to udiv r0, r0, r3 when using -Os which is certainly variable time. Scary. I'll fix it right now.

mkannwischer commented 7 months ago

The m4 implementations in crypto_kem/kyberXXX have been fixed in #320. However, the clean implementations in mupq/pqclean/crypto_kem/kyberxxx are coming from https://github.com/PQClean/PQClean. We have to wait until that gets updated upstream.

There is an issue for that in PQClean: https://github.com/PQClean/PQClean/issues/533

Keeping this issue open for now to track progress.

JamesTheAwesomeDude commented 5 months ago

We have to wait until that gets updated upstream.

@mkannwischer Upstream fixed this (except for the ARM64 optimized implementation) on January 25th.

mkannwischer commented 3 months ago

Resolved in #320 and #332