lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0
2.51k stars 745 forks source link

[test-triage] //sw/device/tests/crypto:aes_gcm_timing_test not constant time with icache enabled #15788

Open drewmacrae opened 1 year ago

drewmacrae commented 1 year ago

Hierarchy of regression failure

Chip Level

Failure Description

Looks like something broke this test in master while it is lacking coverage in CI (it was incorrectly marked as long to save resources in CI)

I00002 aes_gcm_testutils.c:86] aes_gcm_decrypt took 19221 cycles^M^M I00003 aes_gcm_testutils.c:86] aes_gcm_decrypt took 19034 cycles^M^M I00004 aes_gcm_testutils.c:86] aes_gcm_decrypt took 19041 cycles^M^M E00005 aes_gcm_timing_test.c:44] CHECK-fail: AES-GCM decryption was not constant-time for different invalid tags^M^M I00006 status.c:34] FAIL!^M^M

Steps to Reproduce

bazel test //sw/device/tests/crypto:aes_gcm_timing_test

Tests with similar or related failures

No response

drewmacrae commented 1 year ago

This was working when it was a test_rom test by default, but didn't get marked as a test_rom test when aff3a6d0f164dab4f41aad35a233a1701b6b1ff8 enabled rom tests more generally.

drewmacrae commented 1 year ago

This test should be marked as a default "short" test, and should target only the test_rom

drewmacrae commented 1 year ago

~Bisecting finds commit 68d16cf7c32b9edbc19828b90aa2ea9ad4833506 (HEAD) Author: Eitan Shapira eitanshapira89@gmail.com Date: Thu Oct 20 17:30:55 2022 +0300~

~Small updates to flash_ctrl_rand_ops_base_vseq~

~But that seems wrong. Checking out the parent will not make it pass, so it's breaking, and not as hermetic as I'd like. I'm going to try rebooting to see if that helps, but we may be seeing some bitstream cache dependency here.~ (This is a red herring AFAIK)

drewmacrae commented 1 year ago

This test is definitely flaky on the cw310 I've seen

commit bd9f6d0199af9e3358289c0b3a3900921afe8d02 (HEAD)
Author: Timothy Chen <timothytim@google.com>           
Date:   Fri Oct 21 11:10:12 2022 -0700

both pass and fail. I wonder if there's an interaction with the sam3 in this test, though that it fails in verilator indicates that's not the only problem.

Is it flaky in verilator?

drewmacrae commented 1 year ago

Hmm, bisecting with verilator is showing some problems with:

[e5de5869cfd24ec6d23d19b2f75e89f7d9a61b7d] [test_rom] enable icache / Ibex security features

Makes sense that this would affect the number of cycles to time the operation. (the aes block is probably still accomplishing it in constant time, but the software that times it is probably accelerated by cache hits)

drewmacrae commented 1 year ago

@alphan @jadephilipoom If the icache is causing things not to be constant time, should this test disable the icache? Should it prime the icache by running in a loop? What's OpenTitan's stance on the icache and constant time?

alphan commented 1 year ago

@alphan @jadephilipoom If the icache is causing things not to be constant time, should this test disable the icache? Should it prime the icache by running in a loop? What's OpenTitan's stance on the icache and constant time?

I think this is the first time we have encountered this issue in tests since we enabled icache only recently. @jadephilipoom you've probably thought about this and was expecting this but I'm not sure if the cryptolib has any policy on this yet. Obviously, it feels like disabling icache is the safest option. cryptolib should probably do this in a push/pop fashion, i.e. store current config, disable, restore orig config, in order not to affect the overall performance of the system.

drewmacrae commented 1 year ago

I would expect the icache doesn't affect the block and that the constant-time property is supposed to be of-the-block, and that the test should have the icache off, so it can assess the timing of the block behavior better. So all our constant time tests should have it off.

#15832 asks the separate question about cryptolib.

jadephilipoom commented 1 year ago

My feeling is that the cryptolib should disable icache to protect against cache-timing attacks, and the push/pop method that @alphan mentioned seems appropriate. It's fine with me to disable the icache in this test for now; once I've properly connected the AES-GCM implementation to the top-level API, I'll disable it in the top-level code and won't need to specifically disable it for the test.

cindychip commented 1 year ago

@msfschaffner Hey michael, is this task marked to M3 or TEST Triage?

tjaychen commented 1 year ago

this should probably be M3. This isn't a functional issue per se, nor does there need to be a new test. More of a test / driver update (where appropriate) to disable the icache prior to running crypto commands on ibex.

drewmacrae commented 1 year ago

Sorry, looks like, I didn't update the tags after performing the triage.

johannheyszl commented 1 year ago

Since it was mentioned in SW WG today, summarizing from above: This is not a HW issue and will be fixed in cryptolib SW by disabling icache during the ibex-based part of AES-GCM (GHASH) processing. Can thus be postponed to M2.5.3 or M3 likely.

johannheyszl commented 1 year ago

moved to M2.5.5 which is the “cryptolib” Milestone