False positive in -Wthread-safety-analysis when using mutexes in collections

rupprecht commented 2 years ago

I found this while making fixes for D129755, but it looks like this is an existing limitation of -Wthread-safety-analysis.

When using mutexes as part of a collection, thread safety analysis seems to get confused and think that all mutexes in the collection are the same.

void bad_loop() {
    std::vector<std::mutex *> mutexes{new std::mutex(), new std::mutex()};

    for (std::mutex *m : mutexes) { // error: expecting mutex 'm' to be held at start of each loop [-Werror,-Wthread-safety-analysis]
        m->lock(); // note: mutex acquired here
    }

    // This is where we do stuff that's only safe when _all_ locks are locked

    for (std::mutex *m : mutexes) {
        m->unlock(); // error: releasing mutex 'm' that was not held [-Werror,-Wthread-safety-analysis]
    }
}

The analysis would make sense if we were locking the same mutex over and over again, but m is a different mutex on each loop iteration. Maybe the analysis is confused because, due to being in a loop, m has the same name on each iteration. But it also fails if we unroll:

void bad_unroll() {
    std::vector<std::mutex *> mutexes{new std::mutex(), new std::mutex()};

    mutexes[0]->lock(); // note: mutex acquired here
    mutexes[1]->lock(); // error: acquiring mutex 'operator[](mutexes, 1)' that is already held [-Werror,-Wthread-safety-analysis]

    // This is where we do stuff that's only safe when _all_ locks are locked

    mutexes[0]->unlock(); // note: mutex released here
    mutexes[1]->unlock(); // error: releasing mutex 'operator[](mutexes, 1)' that was not held [-Werror,-Wthread-safety-analysis]
}

This could be a valid warning if thread safety analysis is conservatively assuming that mutexes[0] and mutexes[1] might point to the same thing, but since alias analysis is unsupported, I don't think that's it.

This might be something that's intentionally unsupported by thread safety analysis, but I didn't see any documentation for it being a known failure.

Live demo: https://godbolt.org/z/1sn5TMj3h

rupprecht commented 2 years ago

Forgot to add, here's a real piece of code that references this issue: https://github.com/llvm/llvm-project/blob/8d89dbceeb576171efd12a5657c038a2ec2e54a5/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp#L227

aaronpuchert commented 2 years ago

The loops are pretty much out of scope I'd say. The analysis collects "symbolic lock expressions", so we just note *m as locked and basically assume that m doesn't change while we hold the lock. (If you were to lock *m, then point to another lock and unlock, we won't complain.) That's kind of what "no alias analysis" is saying: we don't track actual locks, only symbolic expressions. We have no mechanism to track changes to variables used in these lock expressions, nor a concept of having two lock expressions point to the same underlying object.

The bad_unroll example looks like a bug however: apparently we're not able to distinguish the expressions mutexes[0] and mutexes[1]. The expression comparison might be ignoring the additional argument. In any event, if we were to distinguish these expressions, we might produce new warnings elsewhere, so we need to tread carefully. @AaronBallman, @delesley: do you remember whether having the comparison ignore the index was deliberate?

AaronBallman commented 2 years ago

The loops are pretty much out of scope I'd say. The analysis collects "symbolic lock expressions", so we just note *m as locked and basically assume that m doesn't change while we hold the lock. (If you were to lock *m, then point to another lock and unlock, we won't complain.) That's kind of what "no alias analysis" is saying: we don't track actual locks, only symbolic expressions. We have no mechanism to track changes to variables used in these lock expressions, nor a concept of having two lock expressions point to the same underlying object.

+1 -- adding alias analysis to this would be a pretty significant amount of effort, and in some ways, would encourage people to write less clear code because we're more relaxed about what we track (thinking about locks through aliased objects seems fraught with cognitive danger unless extreme caution is used).

The bad_unroll example looks like a bug however: apparently we're not able to distinguish the expressions mutexes[0] and mutexes[1]. The expression comparison might be ignoring the additional argument. In any event, if we were to distinguish these expressions, we might produce new warnings elsewhere, so we need to tread carefully. @AaronBallman, @delesley: do you remember whether having the comparison ignore the index was deliberate?

I don't recall it being explicit. We added support for array extents in the attribute expression arguments (https://github.com/llvm/llvm-project/blob/891eb20104b6b77c7ff85f532bbce2a28ee9eeb0/clang/lib/Analysis/ThreadSafetyCommon.cpp#L633), but I think we simply missed it in the access check code (https://github.com/llvm/llvm-project/blob/891eb20104b6b77c7ff85f532bbce2a28ee9eeb0/clang/lib/Analysis/ThreadSafety.cpp#L1703).

llvm / llvm-project

False positive in -Wthread-safety-analysis when using mutexes in collections #58535