h2o / picotls

TLS 1.3 implementation in C (master supports RFC8446 as well as draft-26, -27, -28)
536 stars 140 forks source link

Conditional jump or move depends on uninitialised value(s) #450

Open huitema opened 1 year ago

huitema commented 1 year ago

Got this warning when running valgrind to check picoquic code. The issue happens inside openssl, accessed through picotls

This is part of the valgrind report:

==4255== 1526 errors in context 1 of 1: ==4255== Conditional jump or move depends on uninitialised value(s) ==4255== at 0x4AF62A4: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3) ==4255== by 0x4AF6581: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3) ==4255== by 0x49F70F4: EVP_DecryptFinal_ex (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3) ==4255== by 0x1C6724: aead_do_decrypt (openssl.c:962) ==4255== by 0x182F26: picoquic_remove_packet_protection (packet.c:677) ==4255== by 0x18381D: picoquic_parse_header_and_decrypt (packet.c:783) ==4255== by 0x186149: picoquic_incoming_segment (packet.c:2157)

Have we seen that already?

huitema commented 1 year ago

This is new. Previously, Picoquic tests were run using the Picotls version of Thu Sep 15 13:29:56 2022 +0900, after commit 2bb3e454a7aae0f0542374973c9daf83bfef8ffe. The PR that found the issue tests execution with the Picotls version of Wed Nov 30 08:49:47 2022 +0900, after commit 7e97d3e48e237b9a01227cf7f4b116325b356fc6. Something changed between these two commits.

huitema commented 1 year ago

I tried synching at multiple commits, and it seems that the picoquic tests keep failing. One of the issues was found in fusion.c, line 1025...:

    ptls_fusion_aesgcm_context_t *newp;
    if ((newp = aligned_alloc(32, new_ctx_size)) == NULL)
        return NULL;
    memcpy(newp, ctx, old_ctx_size);
    ptls_clear_memory(ctx, old_ctx_size);
    free(ctx);
    ctx = newp;

In theory, the combination of aligned_alloc() and free() should work -- that's what the CPP reference says. I know there was an issue in the past, and that the sanitizer insisted on aligned_alloc() and aligned_free(). I believe that the sanitizer has been fixed, but there may be an issue.

huitema commented 1 year ago

I was able to work around the problem in picoquic by forcing use of "fusion" for AES GCM as explained in this picoquic issue.

Turns out that the issue is already reported in OpenSSL.

huitema commented 1 year ago

I ran the openssl tests through "valgrind". The issue mentioned above reproes, as well as a few others:

valgrind -v --track-origins=yes ./test-openssl.t 
...
==5117== HEAP SUMMARY:
==5117==     in use at exit: 147,948 bytes in 392 blocks
==5117==   total heap usage: 5,494,446 allocs, 5,494,054 frees, 10,132,994,211 bytes allocated
==5117== 
==5117== Searching for pointers to 392 not-freed blocks
==5117== Checked 135,216 bytes
==5117== 
==5117== LEAK SUMMARY:
==5117==    definitely lost: 138,229 bytes in 284 blocks
==5117==    indirectly lost: 5,602 bytes in 54 blocks
==5117==      possibly lost: 0 bytes in 0 blocks
==5117==    still reachable: 4,117 bytes in 54 blocks
==5117==         suppressed: 0 bytes in 0 blocks
==5117== Rerun with --leak-check=full to see details of leaked memory
==5117== 
==5117== ERROR SUMMARY: 2606 errors from 1 contexts (suppressed: 0 from 0)
==5117== 
==5117== 2606 errors in context 1 of 1:
==5117== Conditional jump or move depends on uninitialised value(s)
==5117==    at 0x4AFB234: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==5117==    by 0x4AFB511: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==5117==    by 0x49FC0F4: EVP_DecryptFinal_ex (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==5117==    by 0x142AA4: aead_do_decrypt (openssl.c:1072)
==5117==    by 0x12ECF3: ptls_aead_decrypt (picotls.h:1873)
==5117==    by 0x12ECF3: test_ciphersuite.isra.0 (picotls.c:196)
==5117==    by 0x12EF35: test_aes128gcm (picotls.c:374)
==5117==    by 0x121194: subtest (picotest.c:96)
==5117==    by 0x1412AF: test_picotls (picotls.c:1970)
==5117==    by 0x121194: subtest (picotest.c:96)
==5117==    by 0x112B69: main (openssl.c:580)
==5117==  Uninitialised value was created by a stack allocation
==5117==    at 0x4B73063: ??? (in /usr/lib/x86_64-linux-gnu/libcrypto.so.3)
==5117== 
==5117== ERROR SUMMARY: 2606 errors from 1 contexts (suppressed: 0 from 0)
huitema commented 1 year ago

I think we should consider adding a valgrind CI test. Maybe not the full run of test_openssl.t. Valgrind is slow, and the test above took a long time to execute. But it did find leaks. Most of the time was spent in the many-handshake tests: 2300 seconds for the non-async tests, 2000 seconds for the async tests. Is there a simple way to exclude them?

kazuho commented 1 year ago

Re aligned_alloc in comment https://github.com/h2o/picotls/issues/450#issuecomment-1334853247: please check #454.

kazuho commented 1 year ago

Generally speaking, I'm not sure how much we can trust Valgrind here.

In optimized crypto code, we often load out-of-bounds intentionally. loadn128 and loadn256 functions in fusion.c are good examples. You would notice the NO_SANITIZE_ADDRESS option being set there. They are set because ASAN complains about out-of-bounds access.

I would assume AES-GCM code of libcryto doing something similar.

huitema commented 1 year ago

I have found some real issues using valgrind, issues that were not flagged by ASAN/UBSAN. I like having it as one of the CI checks in picoquic.

The code in fusion does not trip valgrind. This allowed me to work around the openssl issues. I just run valgrind in a configuration that uses fusion instead of OpenSSL's version of AES GCM.

I am not sure if there are macros similar to NO_SANITIZE_ADDRESS for valgrind. If there was, that would be nice.

kazuho commented 1 year ago

I share your irritation. This (possibly bogus) warning could be specific to a particular combination of valgrind and openssl.

I did not see such a warning when I tried to repro on Ubuntu 22.04 (valgrind 3.18.1 / openssl 3.0.2).