Timely mem::drop execution?

burdges commented 7 years ago

I've found that mem::drop does not necessary run anyplace near where it gets called, which likely results in Mutex or RwLock guards being held during expensive computations.

As a simple example, I've made the following test for a zeroing drop of cryptographic material work by using unsafe { ::std::intrinsics::drop_in_place(&mut s); } instead of ::std::mem::drop(s).

#[derive(Debug, Default)]
pub struct Secret<T>(pub T) where T: Copy;

impl<T> Drop for Secret<T> where T: Copy {
    fn drop(&mut self) {
        unsafe { ::std::intrinsics::volatile_set_memory::<Secret<T>>(self, 0, 1); }
    }
}

#[derive(Debug, Default)]
pub struct AnotherSecret(pub [u8; 32]);

impl Drop for AnotherSecret {
    fn drop(&mut self) {
        unsafe { ::std::ptr::write_volatile::<AnotherSecret>(self, AnotherSecret([0u8; 32])); }
        assert_eq!(self.0,[0u8; 32]);
    }
}

#[cfg(test)]
mod tests {
    // use crypto::digest::Digest;
    // use crypto::sha3::Sha3;

    macro_rules! zeroing_drop_test {
        ($n:path) => {
            let p : *const $n;
            {
                let mut s = $n([3u8; 32]);  p = &s; 
                // ::std::mem::drop(s); 
                unsafe { ::std::intrinsics::drop_in_place(&mut s); }  
            }
            /*
            let mut sha = Sha3::sha3_512();
            let mut r = [0u8; 2*32];
            for i in 0..10000 {
                sha.input(&mut r);
                sha.result(&mut r);
                sha.reset();
            }
            */
            // ::std::thread::sleep(::std::time::Duration::from_secs(10));
            unsafe { assert_eq!((*p).0,[0u8; 32]); }
        }
    }
    #[test]
    fn zeroing_drops() {
        zeroing_drop_test!(super::Secret<[u8; 32]>);
        zeroing_drop_test!(super::AnotherSecret);
    }
}

This test fails if I use ::std::mem::drop(s) or even

#[inline(never)]
pub fn drop_now<T>(_x: T) { }

At first, I imagined this was due to CPU pipelining, except this test still fails if I uncomment the sleep line or the 10k invocations of SHA3, or use an atomic_fense, so the compiler looks like the guilty party.

There are two obvious scenarios where this seems problematic:

First, these zeroing drop calls should run eventually, but the longer the secret remains on the stack the greater the risk it gets compromised via side channel attacks, like being swapped to disk.

Second, we might run an expensive computation while holding a Mutex or RwLock guard that unlocks when dropped. We might even run code needing additional locks, thereby risking deadlocks.

In both case, one could fix the problem by calling drop_in_place, but I'd think that creates use after free risks, although maybe it works if we do something fancier like

#[inline(never)]
pub fn drop_copy_now<T: Copy>(t: mut T) {
    unsafe { ::std::intrinsics::volatile_set_memory::<Secret<T>>(&t, 0, 1); }
}

#[inline(never)]
pub fn drop_drop_now<T: Drop>(t: mut T) {
    unsafe { ::std::intrinsics::drop_in_place(&mut t); }
    unsafe { ::std::intrinsics::volatile_set_memory::<Secret<T>>(&t, 0, 1); }
}

Is there any safe way to ensure a drop happens before some expensive computation, system call, etc.? I suppose returning from fn usually does so, but my drop_now proves this sometimes fails.

eddyb commented 7 years ago

I'm not sure "time" has any say in this. Either you found a bug (in which case, we should investigate it as such) or your testing doesn't reveal whether the destructor was called but some other related property. To be clear, there is no mechanism in Rust which would intentionally "defer" running the destructor.

Manishearth commented 7 years ago

This isn't exactly an issue with drops being delayed:

Print statements on the drop fn show that the drop is being executed at the right time, but the value is still wrong
Filling the value with something other than zero works fine, it's zero that's broken (for ptr::write_volatile. write_memory is broken regardless of the fill value
ptr::write_volatile isn't necessary, self.0 = [0u8; 32]; exhibits the same issue

I would hazard a guess that this does not affect mutexes at all. Rather, it's something wrong with the write in particular.

IIRC it's undefined behavior to rely on the stack (inline) contents of a value after it is dropped, which may be optimizing the unsafe { assert_eq!((*p).0,[0u8; 32]); } to hoist in some cases (not sure why it distinguishes between zero and nonzero here).

For the context of https://github.com/isislovecruft/curve25519-dalek/issues/11, as long as the value is boxed you should be okay. It's not UB to rely on modifications to random non-inline (heap or stack reference) values that may be pointed to by the thing you're dropping. That's how Rc/Arc/Mutex work.

@ubsan @eddyb thoughts?

Manishearth commented 7 years ago

It works with a boxed value:

#![feature(core_intrinsics)]

#[derive(Debug)]
pub struct AnotherSecret(*mut [u8; 32]);

impl Drop for AnotherSecret {
    #[inline(never)]
    fn drop(&mut self) {
        println!("dropping");
        // unsafe { ::std::ptr::write_volatile::<AnotherSecret>(self, AnotherSecret([5u8; 32])); }
        unsafe { *self.0 = [0u8; 32] };
        unsafe { assert_eq!(*self.0,[0u8; 32]); }
    }
}

fn main() {
        let p : *const _;
        {
            let mut s = AnotherSecret(Box::into_raw(Box::new([3u8; 32]))); 
            p = &s; 
            ::std::mem::drop(s); 
            println!("dropped");
            //unsafe { ::std::intrinsics::drop_in_place(&mut s); }  
        }

        unsafe { assert_eq!(*(*p).0,[0u8; 32]); }
}

(you can remove the prints and it will still work)

Basically, don't expect writes to the inline stack part of a value during a destructor to stick around afterwards.

eddyb commented 7 years ago

The volatile thing is definitely weird - but there's of course a much much better to box secrets. That's the fact that moving around the wrapper only moves a pointer, and at any time there is exactly one copy in memory of that secret (at least from a value of that type). That's quite worthwhile to box for, IMO.

Manishearth commented 7 years ago

I was wrong, there's no value sensitivity wrt zero and nonzero. I forgot to remove the assert in the destructor 😄

write_volatile and regular self.0 = foo stack writes are both poofed regardless of the value.

Manishearth commented 7 years ago

Looking at the IR for debug mode https://is.gd/NsBeic (Replaced the assert with a new function to remove noise from the code, ): https://gist.github.com/Manishearth/f911b210a93c4031b927e3c0485f8174

slice_loop_next is where the "bug" comes from.

That code is the code called immediately after the array is initialized (the whole slice-loop stuff).

You'll notice that a pair each of AnotherSecret and [32 x i8] variables are declared on the stack here. Most are temporaries. This is what @eddyb was talking about, you can't guarantee that multiple copies of stack data won't exist. In release mode they should disappear, but the core issue here won't change.

I initially assumed that the read in the assert was being hoisted, because I thought that drop was being inlined. That's not the case, and what's happening is actually much more straightforward: A copy of the stack variable s is passed to drop (which isn't inlined).

You can see that happening here, the IR creates a copy before passing it to drop. Which it's allowed to do. Moving in Rust has no guarantees on what happens to the original stack space, which is what I was referring to earlier. Drop operates on the copy, while the original data stays around on the stack. With inlining, drop would not operate on the copy, however that stack space may be reused by other variables -- you see this in release mode with inline(never) removed.

In conclusion, I don't see a Rust bug here; it's operating on the guarantees it assumes in unsafe code, on code that exhibits undefined behavior. Nor do I see any way to improve on this; the way moves work in Rust is pretty fundamental; stuff on the stack will be copied around even if you don't ask it to.

The correct solution for secret-wiping for crypto is to box it.

burdges commented 7 years ago

Interesting. If one wants this on the stack, then one wants a method to limit copies that goes beyond anything that rust currently provides. Thanks!

There are obviously situations where one needs cryptography but dynamic allocation does not yet exist, like say HSMs. All that sounds like that's an issue for another day though. Worst case, you could spin up an temporary allocator or something.

Did you use pub struct AnotherSecret(*mut [u8; 32]); just for this example? Is there any security issue with say

#[derive(Debug)]
pub struct AnotherSecret(Box([u8; 32]));

impl Drop for AnotherSecret {
    #[inline(never)]
    fn drop(&mut self) {
        *self.0 = [0u8; 32];
    }
}

Manishearth commented 7 years ago

A reference would work too. But then it's harder to ensure that the original array you borrow from is cleared. But possible. You basically need to ensure that such an array is only used to provide clearing slices. You can write a macro that stack allocates a zeroed array, and provides you with a wrapped mutable slice to it. The wrapper has a clearing dtor. Done.

For temporary allocators it would use a reference, so that works too. Ideally you'd use an arena-like model where the arena itself is zero initialized, and you can borrow slices from it via a method, but this method returns wrapped slices that have a clearing dtor.

burdges commented 7 years ago

Yeah, I edited my original comment because I realized the issues with references. Ideally, one allocates cryptographic material in a heap protected by mlock, including ensuring no other mlocked pages intersect it.

Afaik, there is nothing besides mlock wrong with pub struct AnotherSecret(Box([u8; 32])); though, but I'm not 100% sure I understand these

impl<T: ?Sized> Deref for Box<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &**self
    }
}

impl<T: ?Sized> DerefMut for Box<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut **self
    }
}

I suppose this inner * comes because Deref for Unique<T> returns a reference to a pointer, while the outer &* must be a type conversion from the *mut T to a T and then returning its address. It's just a fancy way to return the address, but somehow it converts a *mut T to a &T or &mut T without needing unsafe.

Manishearth commented 7 years ago

This is because Box<T> is special. *b on a box does not call Deref::deref (else these impls would be recursing infinitely, **self is **(&Box<T>), which is *Box<T>). It instead does an operation defined in the compiler. You want Deref to be implemented so that folks can still use box with T: Deref.

The reason box is special is mostly a holdover from the days when Box was a builtin type ~T. The compiler had a separate type for it then and still does. I suspect that changing the struct definition of Box<T> won't actually affect anything. As far as the user is concerned, Box is still special in two ways, one is that it can be used with the unstable box (placement box / box pattern) syntax, and the second is that you can move out of the deref. let foo = *b will work, moving the contents out of the box and deallocating it (but not calling destructors on its contents). No Deref behavior on any other type works like this, because Deref returns a reference. We probably eventually want to create a DerefMove trait and remove this special casing, but for now it just works this way. Because of this, the deref operator on a box goes through the compiler.

There's no fancy type conversion involved, it's just funky internals. pub struct AnotherSecret(Box([u8; 32])); will work fine as long as you impl a clearing drop on AnotherSecret

burdges commented 7 years ago

Ok thank you! I also worried that Box is a wrapper on a Unique<T> but maybe secrets should be a Shared<T> or *mut T for some reason. After some thought, I suppose a Shared<T> is like a *const T with Copy, which sounds undesirable, while Unique<T> is a *mut T with Send/Sync, which sounds fine.

Ideally, one should probably update tars to the new allocator traits or something. I've post a quick and dirty crate to do this the cheap way discussed here however : https://github.com/burdges/zerodrop-rs

Manishearth commented 7 years ago

Shared<T> is for usage in Rc<T>-like types, Unique is for box-like types. The differences lie in the implementation of Send/Sync. These types also provide ownership relations for dropck soundness, and for calculating variance.

You can just use Box here.

The allocators stuff is still unstable, you probably don't want to rely on it.

Btw, for zerodrop-rs, just use Box::new instead of box and you won't need #![feature(box_syntax)] so it will work on stable.

Manishearth commented 7 years ago

Also, please don't call drop_in_place(self) to run destructors on the inner type. Your drop implementation should be something like

/// Zero a `ZeroDrop<T>` when dropped.
impl<T> Drop for ZeroDropDrop<T> where T: Drop+Default {
    #[inline(never)]
    fn drop(&mut self) {
         drop(mem::replace(&mut *self.0, Default::default()));
    }
}

burdges commented 7 years ago

Yes, I've too much to do to worry about allocators anytime soon. ;)

I used box because I did not want it copying the value to the stack, which Box::new seemingly requires. It seems tricky to create a Box without touching the stack actually. If T: Copy+Sized then I can create it from uninitialized data and call clone_from, but I'm surprised there is no Box::new_clone(&T) method.

In fact, I'll need to replace Default::default for the T: Copy variants with unsafe zeroing because Rust still lacks [u8; n] for n > 32 and users cannot add those impls themselves.

I'll look into mem::replace and try to figure out if it hits the stack, thanks!

Manishearth commented 7 years ago

You can do Box::new(mem::uninitialized()) and then manually copy it over if you want to avoid the stack copy. The thing that box gets you that can't be replicated is the ability to set the place in which the return value of the expression given to it is written avoiding an extra copy in case of a large struct.

drop_in_place(self) is completely wrong, btw, since you end up dropping the type twice. Destructors will be called on the box and its containing type right after your Drop::drop impl. drop_in_place(mem::replace(&mut self.0, Default::default())) is better, but probably unnecessary. It might be better to define a trait Clear with a clear(&mut self) function that clears its contents , and bound your types on that, and use it.

burdges commented 7 years ago

Appears mem::replace might place data onto the stack, so that's probably out as written. In fact, I should probably write

drop(&mut self) {
    let s: &mut T = self.0.deref_mut();
    unsafe {
        ::std::ptr::drop_in_place::<T>(s);
        ::std::ptr::write::<T>(s,Default::default());
    }
}

so there is still a second inner drop, but only dropping the value returned by Default::default() and no third drop created by say *s = Default::default(). At present, all the Drop types I've considered need special implementations anyways, so maybe ZeroDropDrop makes no sense.

Manishearth commented 7 years ago

Yeah, drop_in_place used like that is fine.

rust-lang / rfcs

Timely mem::drop execution? #1850