janmojzis / tinyssh

TinySSH is small server (less than 100000 words of code)
Creative Commons Zero v1.0 Universal
1.44k stars 79 forks source link

cleanup macro is removed by optimizing compilers #10

Closed pascal-cuoq closed 8 years ago

pascal-cuoq commented 8 years ago

This report is for an instance of CWE-14 “Compiler Removal of Code to Clear Buffers” https://cwe.mitre.org/data/definitions/14.html

A cleanup macro is defined here and is used in several places to clear local variables of secrets: https://github.com/janmojzis/tinyssh/blob/97dd9e05f52482e46d660af81547c7c02669c1a2/crypto/cleanup.h

For instance on my computer, gcc invokes clang:

~/tinyssh $ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.4.0

And the result of the compilation of the function crypto_scalarmult_nistp256_tinynacl, that invokes the cleanup macro, is:

~/tinyssh $ otool -tV build/lib/libtinynacl.a | grep -A50 crypto_scalarmult_nistp256_tinynacl
_crypto_scalarmult_nistp256_tinynacl:
0000000000000000    pushq   %r15
0000000000000002    pushq   %r14
0000000000000004    pushq   %r12
0000000000000006    pushq   %rbx
0000000000000007    subq    $0xc8, %rsp
000000000000000e    movq    %rsi, %r14
0000000000000011    movq    %rdi, %rbx
0000000000000014    movq    ___stack_chk_guard(%rip), %r12
000000000000001b    movq    _crypto_scalarmult_nistp256_tinynacl(%r12), %rax
000000000000001f    movq    %rax, 0xc0(%rsp)
0000000000000027    leaq    0x60(%rsp), %rdi
000000000000002c    movq    %rdx, %rsi
000000000000002f    callq   _gep256_frombytes
0000000000000034    testl   %eax, %eax
0000000000000036    jne 0x5f
0000000000000038    leaq    _crypto_scalarmult_nistp256_tinynacl(%rsp), %r15
000000000000003c    leaq    0x60(%rsp), %rsi
0000000000000041    movq    %r15, %rdi
0000000000000044    movq    %r14, %rdx
0000000000000047    callq   _gep256_scalarmult
000000000000004c    movq    %rbx, %rdi
000000000000004f    movq    %r15, %rsi
0000000000000052    callq   _gep256_tobytes
0000000000000057    movl    %eax, %ecx
0000000000000059    xorl    %eax, %eax
000000000000005b    testl   %ecx, %ecx
000000000000005d    je  0xa3
000000000000005f    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x38(%rbx)
0000000000000067    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x30(%rbx)
000000000000006f    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x28(%rbx)
0000000000000077    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x20(%rbx)
000000000000007f    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x18(%rbx)
0000000000000087    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x10(%rbx)
000000000000008f    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x8(%rbx)
0000000000000097    movq    $_crypto_scalarmult_nistp256_tinynacl, _crypto_scalarmult_nistp256_tinynacl(%rbx)
000000000000009e    movl    $0xffffffff, %eax       ## imm = 0xFFFFFFFF
00000000000000a3    movq    _crypto_scalarmult_nistp256_tinynacl(%r12), %rcx
00000000000000a7    cmpq    0xc0(%rsp), %rcx
00000000000000af    jne 0xc0
00000000000000b1    addq    $0xc8, %rsp
00000000000000b8    popq    %rbx
00000000000000b9    popq    %r12
00000000000000bb    popq    %r14
00000000000000bd    popq    %r15
00000000000000bf    ret
00000000000000c0    callq   ___stack_chk_fail
00000000000000c5    nopw    %cs:_crypto_scalarmult_nistp256_tinynacl(%rax,%rax)
_crypto_scalarmult_nistp256_tinynacl_base:
…

The above is the translation of the source code below, from crypto/crypto_scalarmult_nistp256.c:

int crypto_scalarmult_nistp256_tinynacl(unsigned char *q, const unsigned char *n, const unsigned char *p) {

    gep256 P, Q;
    long long i;
    int ret = -1;

    if (gep256_frombytes(P, p) != 0) goto fail;
    gep256_scalarmult(Q, P, n);
    if (gep256_tobytes(q, Q) != 0) goto fail;
    ret = 0;
    goto cleanup;

fail:
    for (i = 0; i < 64; ++i) q[i] = 0;

cleanup:
    cleanup(P); cleanup(Q);
    return ret;
}

The for (i = 0; i < 64; ++i) q[i] = 0; was translated to the series of 8 movq instructions. The code that follows is the canary check. The code cleanup(P); cleanup(Q); was translated to nothing.

The real GCC, or any modern optimizing C compiler, will do the same and translate the invocations of cleanup() on all local variables at the end of their scopes to nothing.

There exists no perfect solution for this problem. The C11 standard introduced memset_s but it is the still the wrong idiom http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html and GCC can still translate that to nothing: http://goo.gl/LDfPHG (gcc.godbolt.org link by Samuel Neves). Also not everyone is using a C11 compiler yet.

I found that if I simply convert the array passed as argument to the cleanup() macro to a volatile pointer, my compiler does generate the code to clean up the local arrays:

~/tinyssh $ git diff
diff --git a/crypto/cleanup.h b/crypto/cleanup.h
index 0566c59..efb95d3 100644
--- a/crypto/cleanup.h
+++ b/crypto/cleanup.h
@@ -1,6 +1,6 @@
 #ifndef _CLEANUP_H____
 #define _CLEANUP_H____

-#define cleanup(x) for (i = 0; i < sizeof(x); ++i) ((char *)x)[i] = 0;
+#define cleanup(x) for (i = 0; i < sizeof(x); ++i) ((volatile char *)x)[i] = 0;

 #endif
~/tinyssh $ otool -tV build/lib/libtinynacl.a | grep -A50 crypto_scalarmult_nistp256_tinynacl
_crypto_scalarmult_nistp256_tinynacl:
0000000000000000    pushq   %r15
0000000000000002    pushq   %r14
0000000000000004    pushq   %r12
0000000000000006    pushq   %rbx
0000000000000007    subq    $0xc8, %rsp
000000000000000e    movq    %rsi, %r14
0000000000000011    movq    %rdi, %rbx
0000000000000014    movq    ___stack_chk_guard(%rip), %r12
000000000000001b    movq    _crypto_scalarmult_nistp256_tinynacl(%r12), %rax
000000000000001f    movq    %rax, 0xc0(%rsp)
0000000000000027    leaq    0x60(%rsp), %rdi
000000000000002c    movq    %rdx, %rsi
000000000000002f    callq   _gep256_frombytes
0000000000000034    testl   %eax, %eax
0000000000000036    jne 0x5f
0000000000000038    leaq    _crypto_scalarmult_nistp256_tinynacl(%rsp), %r15
000000000000003c    leaq    0x60(%rsp), %rsi
0000000000000041    movq    %r15, %rdi
0000000000000044    movq    %r14, %rdx
0000000000000047    callq   _gep256_scalarmult
000000000000004c    movq    %rbx, %rdi
000000000000004f    movq    %r15, %rsi
0000000000000052    callq   _gep256_tobytes
0000000000000057    movl    %eax, %ecx
0000000000000059    xorl    %eax, %eax
000000000000005b    testl   %ecx, %ecx
000000000000005d    je  0xa3
000000000000005f    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x38(%rbx)
0000000000000067    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x30(%rbx)
000000000000006f    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x28(%rbx)
0000000000000077    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x20(%rbx)
000000000000007f    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x18(%rbx)
0000000000000087    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x10(%rbx)
000000000000008f    movq    $_crypto_scalarmult_nistp256_tinynacl, 0x8(%rbx)
0000000000000097    movq    $_crypto_scalarmult_nistp256_tinynacl, _crypto_scalarmult_nistp256_tinynacl(%rbx)
000000000000009e    movl    $0xffffffff, %eax       ## imm = 0xFFFFFFFF
00000000000000a3    xorl    %ecx, %ecx
00000000000000a5    xorl    %edx, %edx
00000000000000a7    nopw    _crypto_scalarmult_nistp256_tinynacl(%rax,%rax)
00000000000000b0    movb    $_crypto_scalarmult_nistp256_tinynacl, 0x60(%rsp,%rdx)
00000000000000b5    incq    %rdx
00000000000000b8    cmpq    $0x60, %rdx
00000000000000bc    jne 0xb0
00000000000000be    nop
00000000000000c0    movb    $_crypto_scalarmult_nistp256_tinynacl, _crypto_scalarmult_nistp256_tinynacl(%rsp,%rcx)
00000000000000c4    incq    %rcx
00000000000000c7    cmpq    $0x60, %rcx
00000000000000cb    jne 0xc0
00000000000000cd    movq    _crypto_scalarmult_nistp256_tinynacl(%r12), %rcx
00000000000000d1    cmpq    0xc0(%rsp), %rcx
00000000000000d9    jne 0xea
00000000000000db    addq    $0xc8, %rsp
00000000000000e2    popq    %rbx
00000000000000e3    popq    %r12
00000000000000e5    popq    %r14
00000000000000e7    popq    %r15
00000000000000e9    ret
00000000000000ea    callq   ___stack_chk_fail
00000000000000ef    nop
_crypto_scalarmult_nistp256_tinynacl_base:
…

Using the volatile qualifier this way is not perfect either, but it at least tricks some current compilers into doing the right thing, which is somewhat better than the reliable elimination of dead stores that happens without it.

thestinger commented 8 years ago

Declaring the function in the header and having the implementation provided by a separate translation unit would be enough to make it work without LTO but that's not at all reassuring. I don't think volatile is guaranteed to result in zeroing of the buffer itself here but it's better than nothing.

This approach is the best you can do for Clang and GCC but it's not portable so it would require an ifdef with an inferior alternative elsewhere:

void *explicit_memset(void *s, int c, size_t n) {
    void *ptr = memset(s, c, n);
    __asm__ __volatile__("" : : "r"(ptr) : "memory");
    return ptr;
}
briansmith commented 8 years ago

On Windows, you can use SecureZeroMemory. If the compiler provides it, you can use memset_s. On Mac OS X you can use memset_s (IIUC).

The C11 standard introduced memset_s but it is the still the wrong idiom http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html and GCC can still translate that to nothing: http://goo.gl/LDfPHG (gcc.godbolt.org link by Samuel Neves).

The main problem with that code is that the provided memset_s isn't compliant with the standard. I believe you can't write memset_s in ISO C.

See https://github.com/briansmith/ring/issues/14 for some background info. Your explicit_memset is exactly what BoringSSL's OPENSSL_cleanse does except BoringSSL uses SecureZeroMemory on Windows.

janmojzis commented 8 years ago

Thank You very much!

here is the fix: https://github.com/janmojzis/tinyssh/commit/7d9527f72d82dfe8bc3ec40a609dc87e3422d242