llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.19k stars 11.13k forks source link

dead store pass ignores memory clobbering asm statement #15867

Open llvmbot opened 11 years ago

llvmbot commented 11 years ago
Bugzilla Link 15495
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor
CC @echristo,@noloader,@pageexec,@pogo59,@socketpair

Extended Description

Consider this function:

include

void foo(int x) { char buf[10]; int i; for (i=0; i<sizeof(buf); ++i) buf[i]=x++; memset(buf,0,sizeof(buf)); }

llvm removes all the write accesses to buf. OK so far. Now let's add an asm statement to tell the optimizer that it cannot remove writes to buf:

void foo(int x) { char buf[10]; int i; for (i=0; i<sizeof(buf); ++i) buf[i]=x++; memset(buf,0,sizeof(buf)); asm("" : : : "memory"); }

llvm ignores the asm statement and still removes all stores to the local buffer. I think this is a bug. At least gcc and icc honor the asm statement and don't remove the stores if it is present.

I was using:

clang version 3.3 (trunk 176552) Target: x86_64-unknown-linux-gnu Thread model: posix

This bug is quite important because it may introduce security problems in crypto code attempting to cleanse keys from memory.

d9f35d94-5ded-4402-8ddf-8e60f27bb643 commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#26997

d9f35d94-5ded-4402-8ddf-8e60f27bb643 commented 8 years ago

Bug llvm/llvm-bugzilla-archive#26997 has been marked as a duplicate of this bug.

d9f35d94-5ded-4402-8ddf-8e60f27bb643 commented 8 years ago

Exactly the same problem: https://llvm.org/bugs/show_bug.cgi?id=26997

Additionally, I want to say:

asm("" : : "g"(&buf) : "memory");

is better than

asm("" : : "r"(&buf) : "memory");

because it does not require to allocate register.

70311826-e6d8-4d29-8604-c4ec4bfa7f0d commented 8 years ago

I think this is not a bug. LLVM analyzes uses of 'buf' and finds that the whole thing is dead and eliminates it.

This is an interesting topic (in morbid sort of way). I wear Paul's hat, so I have security and compliance pressures in addition to all the C and C++ goodness.

From the developers point of view, they got a requirement and translated it into C and C++. He/she likely did not check the generated code.

From an auditor's point of view, he/she wants to ensure a zeroizer is present. They rarely check generated code matches what was written in C and C++.

From the compiler writers point of view, its working as expected. The developer asked for optimizations, the compiler found one, and the compiler took it.


From 10,000 feet, folks working with this problem need a way to express it. The language lacks it, so its an engineering defect in the language specification. For example, something like the following does not exist. Below, assume 'pin' means the statement must be present in the generated code, and reads and writes prior must complete (kind of like a sequence point and barrier and code guard that can't be removed):

pin memset (buff, 0, size);

Or:

pin {
    for (size_t i = 0; i < size; i++)
        buf[i] = 0;
}

A good first step might be to document what developers and auditors can expect. A good place to add it might be http://clang.llvm.org/compatibility.html#inline-asm. Its the only relevant search that returns LLVM specific results (the rest is Google search fodder).

llvmbot commented 11 years ago

actually before designing something like that, i have a question here: what kind of threat is this local variable clearing supposed to address? any attack i can think of (forced coredump, arbitrary process memory reading, exploitable kernel bug, etc) would be powerful enough to not care about such clearing. so what am i missing here? (if it's off topic here, let's do this on, say, oss-security and cc me please ;)

I'm mostly worried about leaving crypto keys behind for people to find with memory or swap file forensics after they take my laptop away from me at the airport or steal it or whatever.

For your scenarios zeroing crypto keys is at most a defense in depth thing, narrowing the window of attack opportunity down. If you assume that somebody can read your memory at will, you have already lost.

pogo59 commented 11 years ago

actually before designing something like that, i have a question here: what kind of threat is this local variable clearing supposed to address? any attack i can think of (forced coredump, arbitrary process memory reading, exploitable kernel bug, etc) would be powerful enough to not care about such clearing. so what am i missing here? (if it's off topic here, let's do this on, say, oss-security and cc me please ;)

The usual reason is to minimize the time window where those sorts of attacks can find interesting bits lying around. The window clearly can't be reduced to zero but you don't want to lock your front door and then leave the key on top of the welcome mat.

5d21713e-153b-4ee3-b86b-e7e1f9a492f9 commented 11 years ago

re comment 4: the asm statement without "memory" but with "r"(&buf) does not say the contents of buf is accessed, it just says it wants the address of buf in a register. So I think gcc has a point.

each input variable is assumed accessed (passing a pointer to an asm stmt is no different from passing it to an external function). the explicit memory clobber tells the compiler that any other memory it knows about should be assumed clobbered as well.

The "dosomethingwith" is there to make sure tmpkey is actually in memory. The asm("; bar\n") is there so I can see in clang -S where the memset should be. Now I do get a memset. In fact, I get a better memset with clang than with gcc (sse instead of movq) :-)

This appears to prove comment 5 right.

you get a memset using the asm input based example in my previous comment as well, regardless of the dosomethingwith call actually. also i don't know what i was doing yesterday but now i get the proper memset from gcc as well (on the previous example too) with using only the asm input variant so i guess you can go with it for each compiler.

I do think we should provide a way to tell the compiler to not use dead store elimination on something.

actually before designing something like that, i have a question here: what kind of threat is this local variable clearing supposed to address? any attack i can think of (forced coredump, arbitrary process memory reading, exploitable kernel bug, etc) would be powerful enough to not care about such clearing. so what am i missing here? (if it's off topic here, let's do this on, say, oss-security and cc me please ;)

llvmbot commented 11 years ago

re comment 4: the asm statement without "memory" but with "r"(&buf) does not say the contents of buf is accessed, it just says it wants the address of buf in a register. So I think gcc has a point.

re comment 5: Hmm, one could argue that this example is at fault, because the whole "key" buffer can be eliminated, not just the memset. To test this, I wrote this little test code:

extern void dosomethingwith(char* key);

void foo(char inkey[32], char* buf, size_t len) { char tmpkey[32]; size_t i;

// crappy "key schedule" for (i=0; i<32; ++i) tmpkey[i]=inkey[i] ^ inkey[(i+1) % 32];

dosomethingwith(tmpkey);

// use the "key" for (i=0; i<len; ++len) buf[i] = (buf[i] + tmpkey[i % 32]) ^ tmpkey[(i+32-1) % 32];

asm("; bar\n");

memset(tmpkey,0,32); asm("" : : : "memory"); }

The "dosomethingwith" is there to make sure tmpkey is actually in memory. The asm("; bar\n") is there so I can see in clang -S where the memset should be. Now I do get a memset. In fact, I get a better memset with clang than with gcc (sse instead of movq) :-)

This appears to prove comment 5 right.

I do think we should provide a way to tell the compiler to not use dead store elimination on something. Maybe an attribute that could be used on tmpkey. Or an attribute that could be used on a function definition, so we could define our own SecureZeroMemory like Windows has. The goal of the attribute would be to make calling an external memset-like function work even if full link time optimization inlines it and finds out it could be elimiated as dead stores. If we add the attribute to tmpkey, the memset could still be inlined by the compiler as it is now with the asm statement.

What do you think? The asm statement does look kind of unfriendly.

llvmbot commented 11 years ago

I think this is not a bug. LLVM analyzes uses of 'buf' and finds that the whole thing is dead and eliminates it.

Think of it another way: the inline asm may access any memory you want, but the compiler fit all of 'buf' into registers instead. Since the address of 'buf' has never escaped, the compiler has proof positive that your inline asm isn't touching it. Yes, we will do this even if you use more bytes in 'buf' than you have bytes of registers (whether it's really in registers or memory is immaterial, what matters is that the compiler has proven that the inline asm may not rely on 'buf' being in memory).

Ultimately, because you need this for security, you care about what data is actually in the machine and not the language-defined machine with the "as if" relationship to the real machine. C and C++ provide volatile as a way to break through the as-if rule, but unfortunately, memset doesn't take a volatile pointer. This will work:

void foo(int x) { char buf[10]; int i; for (i=0; i<sizeof(buf); ++i) buf[i]=x++; volatile *buf_volatile = buf; for (i=0; i<sizeof(buf); ++i) buf_volatile[i] = 0; }

but is inefficient (we generate zeros one byte at a time, since it's a char pointer). LLVM supports a volatile memcpy, but I don't know how to produce such a thing through Clang.

The approach in comment 4 works by telling the compiler that the inline asm absolutely may see the contents of the memory buffer. I would suggest using that.

5d21713e-153b-4ee3-b86b-e7e1f9a492f9 commented 11 years ago

I think this is a bug. At least gcc and icc honor the asm statement and don't remove the stores if it is present.

the problem is that your asm statement doesn't say that buf is (usefully) used afterwards or within the asm. what you're doing is a memory barrier (at least a gcc specific way to express it) but it doesn't itself make dead stores alive. the following however works as expected with clang:

  asm("" : : "r"(&buf));

but then it's gcc's turn to eliminate all stores (not sure if that's a bug in gcc then). so the winner seems to be:

  asm("" : : "r"(&buf) : "memory");
pogo59 commented 11 years ago

Surely you jest? I was in security for too long to jest about it.

The asm statement is well specified and portable to all major compilers on Linux and OS X, which together comprises practically 100% of the Unix environment. Ah, well, if you want to limit yourself to Unix, okay.

llvmbot commented 11 years ago

Surely you jest?

The asm statement is well specified and portable to all major compilers on Linux and OS X, which together comprises practically 100% of the Unix environment.

Declaring the array volatile disables much more of the optimizer than we want to disable. The usual place where you'd run into this problem is crypto code, which has usually been optimized for speed at the expense of months or even years of manual optimization work. Because it is one of the few places where you'd actually notice whether the optimizer was on or not.

But the fact that other workarounds may or may not exist and/or be practical is of no import to whether this is a bug and needs fixing.

pogo59 commented 11 years ago

This bug is quite important because it may introduce security problems in crypto code attempting to cleanse keys from memory.

A safer and more portable practice would be to mark the data 'volatile' because it is well specified and all compilers know what it means. Asm tricks are not portable.