llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.83k stars 11.46k forks source link

Thread safety analysis treats non-const method calls as reads #97550

Open ilya-biryukov opened 2 months ago

ilya-biryukov commented 2 months ago

The thread safety analysis can distinguish between reads and writes to variables of simple types, but falls short when the write happens through a method call (see https://gcc.godbolt.org/z/8dffM6j1e and the code below). It seems to treat all method calls as reads and never as writes. This is a conservative approach that avoids false positives. However, it would be useful to have a mode that treat all non-const method calls as writes and const method calls as reads (similarly, passing an object through a const reference is a read and through a reference as a write?). E.g. this would allow to catch things like vector.push_back() made under a read lock.

Has this been discussed before? Should we consider adding a new warning flag to enable this class of warnings? (basically treats capture-by-reference as a write and capture-by-const-reference as a read)

struct __attribute__((lockable)) mutex {};

struct __attribute__((scoped_lockable)) lock_guard {
    lock_guard(mutex& mu) __attribute__((exclusive_lock_function(mu)));
    ~lock_guard() __attribute__((unlock_function));
};

struct __attribute__((scoped_lockable)) shared_lock_guard {
    shared_lock_guard(mutex& mu) __attribute__((shared_lock_function(mu)));
    ~shared_lock_guard() __attribute__((unlock_function));
};

// Wraps an int.
struct value {
    int get() const;
    void set(int);
};

struct Test {
    mutable mutex mu;
    value val __attribute__((guarded_by(mu)));
    int num __attribute__((guarded_by(mu)));

    void accidental_write_under_shared_lock() {
        shared_lock_guard l(mu);
        num = 10; // ok, warning.
        val.set(10); // no warning, but it would be nice to have one.
    }
};
llvmbot commented 2 months ago

@llvm/issue-subscribers-clang-frontend

Author: Ilya Biryukov (ilya-biryukov)

The thread safety analysis can distinguish between reads and writes to variables of simple types, but falls short when the write happens through a method call (see https://gcc.godbolt.org/z/8dffM6j1e and the code below). It seems to treat all method calls as reads and never as writes. This is a conservative approach that avoids false positives. However, it would be useful to have a mode that treat all non-const method calls as writes and const method calls as reads (similarly, passing an object through a const reference is a read and through a reference as a write?). E.g. this would allow to catch things like `vector.push_back()` made under a read lock. Has this been discussed before? Should we consider adding a new warning flag to enable this class of warnings? (basically treats capture-by-reference as a write and capture-by-const-reference as a read) ```cpp struct __attribute__((lockable)) mutex {}; struct __attribute__((scoped_lockable)) lock_guard { lock_guard(mutex& mu) __attribute__((exclusive_lock_function(mu))); ~lock_guard() __attribute__((unlock_function)); }; struct __attribute__((scoped_lockable)) shared_lock_guard { shared_lock_guard(mutex& mu) __attribute__((shared_lock_function(mu))); ~shared_lock_guard() __attribute__((unlock_function)); }; // Wraps an int. struct value { int get() const; void set(int); }; struct Test { mutable mutex mu; value val __attribute__((guarded_by(mu))); int num __attribute__((guarded_by(mu))); void accidental_write_under_shared_lock() { shared_lock_guard l(mu); num = 10; // ok, warning. val.set(10); // no warning, but it would be nice to have one. } }; ```
ilya-biryukov commented 2 months ago

CC @aaronpuchert and @delesley, who were contributing to the thread safety analysis before

aaronpuchert commented 2 months ago

Has this been discussed before?

Yes, see https://reviews.llvm.org/D52395.

aaronpuchert commented 2 months ago

Wait, that patch explicitly excludes member functions. They are a bit trickier, because of things like operator[] on a vector. Still possible (just write std::as_const(v)[n]), but it would create an even larger fallout.

ilya-biryukov commented 2 months ago

They are a bit trickier, because of things like operator[] on a vector

Ah, right, any member that just returns an interrior mutable pointer/reference into the class does not necessarily change the returned reference, but it all depends on the use. It'd obviously nice to find some way to avoid std::as_const(v)[n] and generally all the various methods that have a const-overload that was not picked because the object was not const.

Are those cases similar to how we would handle PT_GUARDED_BY, but as if this itself would have this annotation? I feel that very similar logic could be applicable for those cases that return references.\ Possibly requiring some annotation on those interesting methods and some heuristics to override them. One idea is to treat methods returning any reference as the ones that return an interrior, but only if there is a method in the overload set which is also const. So vector::operator [] would be treated as returning an interrior pointer, but not mutating, but map::operator [] would be treated as mutating. Methods that don't return references would be treated as mutating if they're non-const.

(I am happy to write down a more formal and detailed proposal if the above sounds confusing, but wanted to gauge the interest for this and whether people think it would go anywhere)

PS you are right for vector, but it's interesting that the corresponding analysis for types like unordered_map is actually exactly what's needed: int a = map[123] under read lock is an error.

johngro commented 2 months ago

Hey there, I just ran into this behavior yesterday and found it to be a bit surprising, however after reading the comments here, as well as those left in the review referenced by @aaronpuchert, it has become clear why this is a non-trivial issue.

Given the complexity, I'm not sure I'd want to attempt to come up with universal solution which just does "the right thing" all of the time, but I would still really like to be able to use the analyzer to catch mistakes in simple patterns, like the basic setter pattern that @ilya-biryukov showed at the start of this issue. (fun fact, I ended up writing up my own version of basically the same thing before finding this thread; https://godbolt.org/z/3v4EKj744)

So, I was wondering: Would it be unreasonable to have an annotation I could apply on a method by method basis that would override this behavior? Right now the default analysis behavior seems to assume "const method will not mutate the object, but non-const might mutate the object, and since we don't know, we assume it will not mutate the object in order to avoid a deluge of warnings".

If there was an annotation which said "assume that this method mutates" (perhaps __attribute__((mutates_object())), kind of the opposite of const), then I could at least annotate the methods that I knew should have this treatment (like basic setters).

In a world like this, a top level flag could be passed that implicitly applied the attribute to all methods for those who would be willing to tolerate the consequences, but I think that my use cases would be satisfied with just an attribute that I could use to opt-in to the behavior.

delesley commented 1 month ago

johngro -- that's not a bad suggestion, it shouldn't break anything, and it wouldn't be hard to implement. Unfortunately, it would mean adding a huge number of "mutator" annotations throughout your code base to get any benefit, which is likely prohibitive for most users, especially since those annotations will not be on classes in the C++ standard library...

BTW, there's still a FIXME in the comment on handleCall() which discusses this issue... :-)

ilya-biryukov commented 1 month ago

I agree that annotating the function as "mutating" is definitely a feature we could have and it doesn't hurt anyone if we have it (except for the added complexity to the code, but my intuition here is that it should be minimal).

However, +1 to @delesley's comments about the scalability of this approach. The reason we would like to have something that "works by default" is to cover big chunks of the codebase by annotating the mutex objects and without relying on people to write those annotations. We would rather go with opt-out annotation rather than opt-in.

aaronpuchert commented 2 weeks ago

We could also try enabling stricter checks under a separate warning flag. We already have -Wthread-safety-reference, and this might be -Wthread-safety-mutable-reference. The change itself shouldn't be hard, but someone should evaluate it on a real-world code base to see how many warnings they get and whether it's feasible to fix them.

This would be my favorite—if it works.

You could already try this with my old patch (assuming it still applies cleanly). This only covers function arguments, which should be a bit easier. I will also try it on our code. (Back when I posted the change we were still playing around with thread safety analysis, while now we're using it more widely.) Then we can also try adding checks for member functions.