
Using pattern match directly on LockResult causes deadlock #37612

Open lloydmeta opened 7 years ago

lloydmeta commented 7 years ago

Came across this as I was playing around with implementing memoisation via double-checked locking with std::sync::RwLock:

The following, which puts the .read() result into a temporary binding, works fine:

// ...
{
    let lock_result = self.cache.read(); // <-- temporary binding
    match lock_result {
        Ok(ref data) if data.len() > to => Some(data[to].clone()),
        _ => None,
    }
}
.unwrap_or_else(|| {
    // We need to write now, so get a write lock
    let mut data = self.cache.write().unwrap();
// ...

The following, which directly matches on the result of .read(), dies in a deadlock:

// ...
{
    match self.cache.read() { // <-- direct pattern matching
        Ok(ref data) if data.len() > to => Some(data[to].clone()),
        _ => None,
    }
}
.unwrap_or_else(|| {
    // We need to write now, so get a write lock
    let mut data = self.cache.write().unwrap();
// ...

I'm guessing the direct pattern match is compiled/desugared in a way that the RwLockReadGuard never goes out of scope and thus the read lock is never released, but I'm not sure if this is the expected behaviour. It certainly surprised me, especially since the entire "read" block is scoped by its own set of braces.

For extra context, here is the relevant part in my scratchpad project. I have a concurrency test for it that fails to finish if the pattern matching section is changed to match the latter case.

hanna-kruppe commented 7 years ago

Temporaries live for the entire statement, never shorter. There are cases where they are made to live longer (e.g. to make stuff like let x = &5; work) and obviously the surrounding expression can move the temporary into a function where it'll be dropped earlier, but "left to its own devices" a temporary will live until the end of the statement.

The second snippet shows part of one huge multi-line statement, and the extra set of braces around the match is as irrelevant as it is in, say, let x = {5};. In contrast, in the first snippet the self.cache.read() temporary is moved into a short-lived variable and thus dropped at the brace between the match and the unwrap_or_else call.
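
To make that concrete, here is a self-contained sketch of the two shapes (my own construction, not the code from the report: a plain std::sync::RwLock<Vec<i32>>, with try_write() in the second closure so the program reports the still-held guard instead of deadlocking). The drop points noted in the comments follow the temporary-lifetime rules described above:

use std::sync::RwLock;

fn main() {
    let cache: RwLock<Vec<i32>> = RwLock::new(vec![]);

    // Shape 1: the lock result is moved into a short-lived binding, so the
    // read guard is dropped at the inner `}`, before the closure runs.
    let v = {
        let lock_result = cache.read();
        match lock_result {
            Ok(ref data) => data.first().copied(),
            _ => None,
        }
    }
    .unwrap_or_else(|| {
        let mut data = cache.write().unwrap(); // fine: the read guard is gone
        data.push(1);
        data[0]
    });
    println!("shape 1: {}", v);

    // Shape 2: the lock result is a temporary of the whole statement, so the
    // read guard is still held when the closure runs; a blocking write() here
    // would deadlock, and try_write() makes that observable instead.
    let v = {
        match cache.read() {
            Ok(ref data) => data.get(1).copied(),
            _ => None,
        }
    }
    .unwrap_or_else(|| match cache.try_write() {
        Ok(mut data) => {
            data.push(2);
            data[1]
        }
        Err(_) => {
            println!("shape 2: the read guard is still held here");
            -1
        }
    });
    println!("shape 2: {}", v);
}

Swap the try_write() for a blocking write() and the second shape is exactly the deadlock from the report.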

So everything is working as intended IMHO. Still, what can we do to help people who run into this in the future? Maybe a lint could look for multiple locking API calls within the same expression. However, it would need to be written carefully to avoid false positives on code like the first snippet, where the second lock is only taken after the first guard has been dropped. Or maybe it's better to work on MIR more generally and try to detect overlapping code regions holding the same lock, even when they are spread across statements. Sounds like a nice idea for clippy (cc @Manishearth @llogiq).

lloydmeta commented 7 years ago

Interesting. I think, for me, the main point of confusion is this:

The second snippet shows part of one huge multi-line statement, and the extra set of braces around the match is as irrelevant as it is in, say, let x = {5};

From my reading of the Rust book, I was under the impression that braces {} define blocks and blocks define binding scope, so that any bindings defined inside them will be out of scope outside them. For example this section on bindings alludes to that.

In this case, it was primarily non-obvious to me that the braces would be irrelevant for the scoping of pattern match bindings (causing the lock to not be dropped), and I'm now wondering what the reasoning is behind that. Is it reasonable and/or possible to make the braces significant in this case?

Otherwise, yes, a warning would be nice :)

PS: for what it's worth, turning it into an if let to make the binding more explicit also deadlocks:

if let Ok(ref data) = self.cache.read() {
    if data.len() > to {
        Some(data[to].clone())
    } else {
        None
    }
} else {
    None
}

EDIT: a few more findings

Adding a (); (or anything, really: a println!, a let x = 1) to make the block multi-line doesn't help; the following still deadlocks:

{
    (); // block consists of an expression statement and an expression but still fails to release lock
    match self.cache.read() {
        Ok(ref data) if data.len() > to => Some(data[to].clone()),
        _ => None,
    }
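    // the match is the block's tail expression, so the read guard temporary survives
    // past this block's closing brace for the rest of the enclosing statement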
}

This, however, works fine:

{
    let t = match self.cache.read() {
        Ok(ref data) if data.len() > to => Some(data[to].clone()),
        _ => None,
    };
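    // the `;` ends the statement, so the read guard temporary is dropped here,
    // before `t` is returned from the block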
    t
}
arielb1 commented 7 years ago

The destructors for temporaries in tail expressions run when the statement containing the block ends. The intent is to make braces "transparent". We need good examples for that.
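
For instance, something along these lines (a rough sketch with made-up Noisy and probe helpers) makes the drop point visible under the rules in effect at the time of this issue:

struct Noisy(&'static str);

impl Drop for Noisy {
    fn drop(&mut self) {
        println!("dropping {}", self.0);
    }
}

fn probe() -> usize {
    println!("the block has ended, but the statement has not");
    0
}

fn main() {
    let _len = {
        Noisy("tail temporary").0.len() // temporary created in the block's tail expression
    }
    .wrapping_add(probe()); // runs after the inner `}`, while the temporary is still alive
    println!("statement finished");
}

This prints "the block has ended, but the statement has not", then "dropping tail temporary", then "statement finished": the temporary outlives the block's closing brace and is only dropped at the end of the let statement.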

lloydmeta commented 7 years ago

Writing sound multithreaded code is hard enough as it is, and so is testing it correctly to find bugs. In this case I got almost everything right, but at the end, got snagged by what looks like a hard-to-understand syntactical mistake.

I still don't really understand the full breadth of why some examples work and others don't, but I'd like to pitch in to make it harder for others to make the same mistake.

Mark-Simulacrum commented 7 years ago

This, as expected, still reproduces today. I'm uncertain whether we want it not to... it feels like this isn't actually a bug (cc @rust-lang/lang, can you confirm?).

lloydmeta commented 7 years ago

Reviewing this after half a year, I have to say I'm still uneasy about the current behaviour. I get that there may be reasons for it from a language-implementation point of view, but from an end-user/language-user standpoint, it doesn't seem desirable.

In this particular case, my expectations of referential transparency as well as brace-based scoping were broken, leading to a deadly silent runtime failure.

Having said that, I'd really like to see a reason why the current behaviour is desirable for the end user. Short of one, IMHO the implementation should be fixed, or at the very least the docs updated and a warning emitted at compile time.

EDIT: grammar fix.

Mark-Simulacrum commented 7 years ago

Oh no, I forgot to include my minimal example. Well, I can type it up again if anyone wants me to; otherwise I'll leave it as-is, since composing it was rather... difficult, IIRC.

nikomatsakis commented 7 years ago

I agree that these rules can be confusing, particularly around a match appearing in tail position. That said, I'm not sure what the best fix is -- and obviously any change here would most likely be backwards incompatible (i.e., some people may be relying on the dtor to run when it does).

Havvy commented 7 years ago

I believe this is a duplicate of #21114.

lloydmeta commented 7 years ago

@Havvy Definitely looks related given this comment and the following response.

Feels good to see that it is a bug that needs to be fixed after all.

arielb1 commented 7 years ago

The reason it works this way is that you are supposed to be able to write this:


{
    match self.cache.read() { // <-- direct pattern matching
        Ok(ref data) => Some(data),
        _ => None,
    }
}.map(|data| {
    // use `data` - the lock better be held
})

That pattern is quite common in some places, and it had better keep working.

arielb1 commented 7 years ago

Of course, because destructor locations aren't sensitive to lifetime inference, the lock must also be held in your example.

Also, #21114 is about the borrow checker being overly stupid, not about destructor ordering, so @pnkfelix was wrong in his diagnosis.

lloydmeta commented 7 years ago

@arielb1 Thanks for that example. I must say, it's one of the most convincing examples I've seen thus far. That said, I think:

  1. The braces surrounding the match and before map are superfluous; you can do without them (Playground link):

    match mutex.lock() {
      Ok(ref data) => Some(data.0),
      _ => None,
    }.map(|data| {
      data + 1
    })
  2. That pattern seems a bit unnecessarily verbose; why not just work with ref data inside the pattern match arm?

I have no doubt, though, that there is code like this out there; it does seem very convenient to do something like this, and perhaps sometimes it happens simply due to personal preference in style.

ryantheleach commented 7 years ago

I've never programmed in Rust, but just from the little I've read about it:

{ //start scoping.
    match self.cache.read() { //gain a lock and read.
        Ok(ref data) => Some(data),
        _ => None,
    } // no semicolon, so the value leaks out into the next scope, i.e. it's not ()
//end scope.
}.map(|data| { //data is available, but the lock is out of scope.
    // use `data` - the lock better be held, but it's not, because you ended the scope, error?
})

vs. some pseudocode-ish looking thing:

{ 
    let x = match self.cache.read() { 
        Ok(ref data) => Some(data) 
        _ => None,
    } 
   x.maintainLock? x.lend? //how do you punch the lock through this? guessing you don't or can't. so why does it make sense to have brackets in the first place?
}.map(|data| {
    // use `data` - the lock better be held
})
let x = { 
    match self.cache.read() { 
        Ok(ref data) => Some(data) 
        _ => None,
    } 
}
//this deadzone is worrying though.
x.lock().map(|data| {
    // use `data` - the lock better be held
})

I can't think of a way that makes syntactic sense, given the simple rules I've briefly heard about.

Havvy commented 7 years ago

@ryantheleach - Temporaries created in the final expression of a block live as long as temporaries in the statement containing the block.

ryantheleach commented 7 years ago

That makes a little more sense now.

PieterPenninckx commented 7 years ago

Not knowing the precise rules, my intuition was that temporaries would be dropped "as soon as possible". This is clearly wrong, as people explained in their comments and as explained in the reference.

Adjusting the language so that it matches my intuition is impossible in general, I think, because it would change the behavior of existing code. So instead it's the intuition -- or at least the knowledge -- that should be changed. So I opened two issues: one for The Book and one for clippy.

I think, however, that this issue is precisely what linear data types (rust-lang/rfcs#814, or at least part thereof) are intended to solve. In particular, the compiler could give a warning when a value of a data type that is marked as "must be dropped explicitly" goes out of scope without being dropped explicitly. (Edit: please note that RFC 814 is different from what I propose here.) I'm not sure, however, what the best way would be to treat compound data types (e.g. when a field of a struct is marked as "must be dropped explicitly"), so I'm leaving this as a suggestion.

pnkfelix commented 5 years ago

Just doing some triage.

I do not believe this to be a duplicate of #21114.

Some people connected the two issues due to this comment https://github.com/rust-lang/rust/issues/21114#issuecomment-74438455 and my response https://github.com/rust-lang/rust/issues/21114#issuecomment-74508522, but it's important to note a couple of things.

The examples given here, however, are about the consequences of the dynamic semantics of the language. By definition, these are not cases where adding a semicolon or a let x = ...; x is spurious; rather, doing so has an observable effect on the drop order of rvalue temporaries.

Anyway, this is a long-winded build-up to me saying that while I do not believe this to be a duplicate of #21114, I do think it is a duplicate, or perhaps a sub-bug, of #46413.

ichn-hu commented 1 year ago

The following case looks even more confusing (and error-prone) than matching directly on the lock result, and it seems to have the same cause. We happened to find it in our production code, where no one spotted that it might cause a problem when the code was merged.

In the following POC, using wait_for_update1 does not deadlock, while wait_for_update2 may.

use std::sync::Arc;
use tokio::sync::{Notify, RwLock};

#[derive(Default)]
struct ProxyInner {
    nr_updates: u64,
}

impl ProxyInner {
    fn update(&mut self) {
        self.nr_updates += 1;
    }

    fn get_update(&self) -> Option<u64> {
        if self.nr_updates % 2 == 0 {
            Some(self.nr_updates)
        } else {
            None
        }
    }
}

#[derive(Clone)]
struct ProxyClient {
    inner: Arc<RwLock<ProxyInner>>,
    notify: Arc<Notify>,
}

impl ProxyClient {
    fn new() -> ProxyClient {
        ProxyClient {
            inner: Arc::new(RwLock::new(ProxyInner::default())),
            notify: Arc::new(Notify::new()),
        }
    }

    async fn update(&self) {
        self.inner.write().await.update();
        // do something
        self.notify.notify_waiters();
    }

    // works fine
    async fn wait_for_update1(&self) -> Option<u64> {
        loop {
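            // The read guard here is only a temporary of this `let` statement, so it is
            // dropped as soon as `res` is bound -- before the match and before awaiting.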
            let res = self.inner.read().await.get_update();
            match res {
                Some(nr) => return Some(nr),
                None => {
                    self.notify.notified().await;
                }
            }
        }
    }

    // may deadlock
    async fn wait_for_update2(&self) -> Option<u64> {
        loop {
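            // Here the read guard is a temporary of the whole match statement, so it is
            // still held while awaiting the notification below, which blocks `update()`
            // from ever taking the write lock.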
            match self.inner.read().await.get_update() {
                Some(nr) => return Some(nr),
                None => {
                    self.notify.notified().await;
                }
            }
        }
    }
}

#[tokio::main(worker_threads = 2)]
async fn main() {
    let pc = ProxyClient::new();

    let task1 = {
        let pc = pc.clone();
        tokio::spawn(async move {
            loop {
                pc.update().await;
            }
        })
    };

    let task2 = {
        let pc = pc.clone();
        tokio::spawn(async move {
            loop {
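                // swap this to wait_for_update2 to reproduce the deadlock described above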
                let nr = pc.wait_for_update1().await;
                println!("task2: nr = {:?}", nr);
            }
        })
    };

    task1.await.unwrap();
    task2.await.unwrap();
}