llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.69k stars 11.87k forks source link

ExprMutationAnalyzer mishandles templated callback lambdas #38302

Open LebedevRI opened 6 years ago

LebedevRI commented 6 years ago
Bugzilla Link 38954
Version trunk
OS Linux
Blocks llvm/llvm-project#38238
CC @JonasToth

Extended Description

template void withCallback(Lambda l) { for(int i = 0; i < 4; i++) { l(i); } } void test() { auto l = [](int i){}; withCallback(l); }

warning: variable is mutated in the loop body and loop iteration expression [bugprone-loop-variable-mutations] for(int i = 0; i < 4; i++) { ^ note: loop in question for(int i = 0; i < 4; i++) { ^ note: mutation in loop body occurs here l(i); ^ note: mutation in loop iteration expression occurs here for(int i = 0; i < 4; i++) { ^

LebedevRI commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#38955

JonasToth commented 6 years ago

I get your point, in fact I have the exact same thought when looking at your use case :) Indeed all previous use case required leaning towards "assume mutation when unsure" and your check wants the other way around.

Yup, I guess that was our starting point and on that side we are pretty far already!

On the bright side I think we're actually not far away, the only major area this "assume mutation when unsure" behavior happens is around pointers, I'm reasonable sure once we get that fixed there would be no "maybe" results (except template.)

I would like to add to the template problems: Once there are concepts in place it is in principle possible to analyze type and value-dependent expression under their concept constraints.

For value-dependent expressions: Given the values are int-like the analysis should be possible. But in my opinion its ok to ignore these for now, because in the template instantiations the value-dependendness is gone.

Of course I could be wrong, so I'm very eager to see how EMA performs once findPointeeMutation is implemented in order to verify. Basically I'm not ruling out the option of adding a "maybe" result but I'd like to see how far we can push forward without it.

Totally agree. Given the static analyzer guys were interested in using it as well, we will probably find all false positives/negatives and fix them. The question 'is this thing mutated' is binary, because the compiler knows when you are mutating a 'const' variable. So the analyzer should figure this out as well (- type-dependent without concepts as nothing is known there)

llvmbot commented 6 years ago

I'm hesitating on "I'm not sure" result because I think the only place where "I'm not sure" is correct is {type,value}-dependent (aka template), all other "I'm not sure" should be fixed.

I agree. But how soon will that happen, how soon all of them will be fixed? Explanation:

Right now, when EMA can not analyze something, it pessimistically assumes that the mutation happens. It may be the correct decision/answer, given the right question. E.g. for the Jonas's const-correctness check, where the question is, "do we know this doesn't mutate?" But if the question is the opposite one, i.e. "do we know this does mutate?",

well, now you are mixing together both the actual mutations, and the EMA analysis limitations.

Err, what i meant is, the pessimistic answer "i don't know, i guess this mutates" always mixes together both the actual mutations, and the EMA analysis limitations.

If we are looking for "no mutation happens", then we will only get it when there was no analysis limitations, and the analysis concluded that it does not mutate.

While if we are looking for "mutation does happen", we will get it both when the analysis concluded that the mutation actually happens, and when the analysis limitation was hit.

I get your point, in fact I have the exact same thought when looking at your use case :) Indeed all previous use case required leaning towards "assume mutation when unsure" and your check wants the other way around. On the bright side I think we're actually not far away, the only major area this "assume mutation when unsure" behavior happens is around pointers, I'm reasonable sure once we get that fixed there would be no "maybe" results (except template.) Of course I could be wrong, so I'm very eager to see how EMA performs once findPointeeMutation is implemented in order to verify. Basically I'm not ruling out the option of adding a "maybe" result but I'd like to see how far we can push forward without it.

LebedevRI commented 6 years ago

I'm hesitating on "I'm not sure" result because I think the only place where "I'm not sure" is correct is {type,value}-dependent (aka template), all other "I'm not sure" should be fixed.

I agree. But how soon will that happen, how soon all of them will be fixed? Explanation:

Right now, when EMA can not analyze something, it pessimistically assumes that the mutation happens. It may be the correct decision/answer, given the right question. E.g. for the Jonas's const-correctness check, where the question is, "do we know this doesn't mutate?" But if the question is the opposite one, i.e. "do we know this does mutate?",

well, now you are mixing together both the actual mutations, and the EMA analysis limitations.

Err, what i meant is, the pessimistic answer "i don't know, i guess this mutates" always mixes together both the actual mutations, and the EMA analysis limitations.

If we are looking for "no mutation happens", then we will only get it when there was no analysis limitations, and the analysis concluded that it does not mutate.

While if we are looking for "mutation does happen", we will get it both when the analysis concluded that the mutation actually happens, and when the analysis limitation was hit.

LebedevRI commented 6 years ago

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates. Well you can't really issue any warning and/or fixit to any instantiations right? Instantiations are implicitly generated by the compiler and not written in code, what's written in code is always uninstantiated. The fact that there's one instantiation of withCallback caused by test doesn't make the raw definition of withCallback instantiated and any warnings issues on that piece of code, which is uninstantiated, based on any particular instantiation would be wrong.

Oh wait, i think i'm looking at my own testcase wrong. For the uninstantiated withCallback() itself, i think it should return "i don't know".

The deeper problem here, is that EMA does not provide any context/explanation of results. It would be great if it could record the logic that lead to the decision that mutation happens, and be able to dump it.

How much & what kind of information do you need? (Right now EMA gives out a Stmt which mutates the given Expr, and in the case of indirect mutation via a series of reference binding (e.g. int x; int& r = x; r = 10;) findMutation(x) returns DeclRefExpr(r) and findMutation(DeclRefExpr(r)) returns "r = 10" so that you can follow the chain of reference binding.) I was just thinking out loud. I don't have any concrete ideas/needs.

I'm hesitating on "I'm not sure" result because I think the only place where "I'm not sure" is correct is {type,value}-dependent (aka template), all other "I'm not sure" should be fixed.

I agree. But how soon will that happen, how soon all of them will be fixed? Explanation:

Right now, when EMA can not analyze something, it pessimistically assumes that the mutation happens. It may be the correct decision/answer, given the right question. E.g. for the Jonas's const-correctness check, where the question is, "do we know this doesn't mutate?" But if the question is the opposite one, i.e. "do we know this does mutate?", well, now you are mixing together both the actual mutations, and the EMA analysis limitations.

llvmbot commented 6 years ago

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates. Well you can't really issue any warning and/or fixit to any instantiations right? Instantiations are implicitly generated by the compiler and not written in code, what's written in code is always uninstantiated. The fact that there's one instantiation of withCallback caused by test doesn't make the raw definition of withCallback instantiated and any warnings issues on that piece of code, which is uninstantiated, based on any particular instantiation would be wrong.

Oh wait, i think i'm looking at my own testcase wrong. For the uninstantiated withCallback() itself, i think it should return "i don't know".

The deeper problem here, is that EMA does not provide any context/explanation of results. It would be great if it could record the logic that lead to the decision that mutation happens, and be able to dump it.

How much & what kind of information do you need? (Right now EMA gives out a Stmt which mutates the given Expr, and in the case of indirect mutation via a series of reference binding (e.g. int x; int& r = x; r = 10;) findMutation(x) returns DeclRefExpr(r) and findMutation(DeclRefExpr(r)) returns "r = 10" so that you can follow the chain of reference binding.) I'm hesitating on "I'm not sure" result because I think the only place where "I'm not sure" is correct is {type,value}-dependent (aka template), all other "I'm not sure" should be fixed.

LebedevRI commented 6 years ago

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates. Well you can't really issue any warning and/or fixit to any instantiations right? Instantiations are implicitly generated by the compiler and not written in code, what's written in code is always uninstantiated. The fact that there's one instantiation of withCallback caused by test doesn't make the raw definition of withCallback instantiated and any warnings issues on that piece of code, which is uninstantiated, based on any particular instantiation would be wrong.

Oh wait, i think i'm looking at my own testcase wrong. For the uninstantiated withCallback() itself, i think it should return "i don't know".

The deeper problem here, is that EMA does not provide any context/explanation of results. It would be great if it could record the logic that lead to the decision that mutation happens, and be able to dump it.

llvmbot commented 6 years ago

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates. Well you can't really issue any warning and/or fixit to any instantiations right? Instantiations are implicitly generated by the compiler and not written in code, what's written in code is always uninstantiated. The fact that there's one instantiation of withCallback caused by test doesn't make the raw definition of withCallback instantiated and any warnings issues on that piece of code, which is uninstantiated, based on any particular instantiation would be wrong.

Take this case as a concrete example: it just so happens that the instantiated version in test has a non-mutating Lambda. withCallback can be also called with auto l2 = [](int& i) { i = 123; }; withCallback(l2);

So we can't reliably tell whether the expression l(i) mutates i or not by just looking at the template definition. BTW, would it be possible for the EMA to be more descriptive? It would be great if it would not simply say "i give up, i guess this mutates", but instead say "i give up, this may or may not be mutating.". I'll think about it, "maybe" would probably equal to "findMutation() returns a {type,value}-dependent expression", if that's indeed the case then the caller of EMA could handle that trivially.

Thinking deeper about this, in order to be extra smart about uninstantiated template, we can look at not only the template definition itself but also all of its instantiated version. So that immediately rules out templates in header files because we can't see all of its instantiations within a single TU. How about template defined in .cpp then? At least we can iterate through all the instantiations, but the difficulty is that there doesn't seem to be a way to create a mapping between different instantiated versions of the same expression. For example: template void g(F f) { int x; f(x); } void h() { g( {}); g( {}); } We'll have three FunctionDecl of g, first one being the template definition itself, and the rest two being the two instantiated versions in h. Each FunctionDecl has its own CompoundStmt as function body and its own DeclRefExpr to x within the body. And no trace has left to map between these DeclRefExpr. Which means even if we figured out all instantiated version of x are not mutated, we can't map that information back to the uninstantiated version of x.

I'm probably lacking a lot of context, but i really do think we want the per-instantiation knowledge about mutations, not for the whole templated function. You can get per-instantiation results already, simply pass the CompoundStmt of the instantiated version when constructing EMA. But how would you issue warnings/suggestions? Diags are issues to a particular source code location and that piece of source code is the uninstantiated version.

FYI, for the above example involving g and h, here's what the AST looks like: FunctionTemplateDecl | |-TemplateTypeParmDecl | |-FunctionDecl <------ This is the uninstantiated version | | |-ParmVarDecl | | -CompoundStmt <---- If passing this as "scope" when constructing EMA, EMA will assume CallExpr mutates | | |-DeclStmt | | |-VarDecl | | -CallExpr | | |-DeclRefExpr | |-DeclRefExpr | |-FunctionDecl <-------- This is the first instantiation | | |-TemplateArgument | | |-ParmVarDecl | | -CompoundStmt <------ Passing this to EMA will get "not mutated" result, because first instantiation has a non-mutating lambda | | |-DeclStmt | | |-VarDecl | | -CXXOperatorCallExpr | | |-ImplicitCastExpr | | |-DeclRefExpr | | |-ImplicitCastExpr | | | -DeclRefExpr | |-ImplicitCastExpr | | -DeclRefExpr |-FunctionDecl <-------- This is the second instantiation | |-TemplateArgument | |-ParmVarDecl | -CompoundStmt <------ Passing this the EMA will get "mutated" result. | |-DeclStmt | |-VarDecl | -CXXOperatorCallExpr | |-ImplicitCastExpr | |-DeclRefExpr | |-ImplicitCastExpr | | -DeclRefExpr |-DeclRefExpr

LebedevRI commented 6 years ago

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

  • In template (and macro), something can be replaced and completely change the behavior, in other words there's no single stable behavior to analyze.
  • ExprMutationAnalyzer has to lean on safer side and pretty much anything {type,value}-dependent results in a mutation. I'm not following, this bugreport isn't about uninstantiated templates, it is about very much instantiated templates. So far i haven't seen any false-positives due to the uninstantiated templates.

Take this case as a concrete example: it just so happens that the instantiated version in test has a non-mutating Lambda. withCallback can be also called with auto l2 = [](int& i) { i = 123; }; withCallback(l2);

So we can't reliably tell whether the expression l(i) mutates i or not by just looking at the template definition. BTW, would it be possible for the EMA to be more descriptive? It would be great if it would not simply say "i give up, i guess this mutates", but instead say "i give up, this may or may not be mutating.".

Thinking deeper about this, in order to be extra smart about uninstantiated template, we can look at not only the template definition itself but also all of its instantiated version. So that immediately rules out templates in header files because we can't see all of its instantiations within a single TU. How about template defined in .cpp then? At least we can iterate through all the instantiations, but the difficulty is that there doesn't seem to be a way to create a mapping between different instantiated versions of the same expression. For example: template void g(F f) { int x; f(x); } void h() { g( {}); g( {}); } We'll have three FunctionDecl of g, first one being the template definition itself, and the rest two being the two instantiated versions in h. Each FunctionDecl has its own CompoundStmt as function body and its own DeclRefExpr to x within the body. And no trace has left to map between these DeclRefExpr. Which means even if we figured out all instantiated version of x are not mutated, we can't map that information back to the uninstantiated version of x.

I'm probably lacking a lot of context, but i really do think we want the per-instantiation knowledge about mutations, not for the whole templated function.

llvmbot commented 6 years ago

I think the clangtidy check should skip uninstantiated template (and macros fwiw), because:

Take this case as a concrete example: it just so happens that the instantiated version in test has a non-mutating Lambda. withCallback can be also called with auto l2 = [](int& i) { i = 123; }; withCallback(l2); So we can't reliably tell whether the expression l(i) mutates i or not by just looking at the template definition.

Thinking deeper about this, in order to be extra smart about uninstantiated template, we can look at not only the template definition itself but also all of its instantiated version. So that immediately rules out templates in header files because we can't see all of its instantiations within a single TU. How about template defined in .cpp then? At least we can iterate through all the instantiations, but the difficulty is that there doesn't seem to be a way to create a mapping between different instantiated versions of the same expression. For example: template void g(F f) { int x; f(x); } void h() { g( {}); g( {}); } We'll have three FunctionDecl of g, first one being the template definition itself, and the rest two being the two instantiated versions in h. Each FunctionDecl has its own CompoundStmt as function body and its own DeclRefExpr to x within the body. And no trace has left to map between these DeclRefExpr. Which means even if we figured out all instantiated version of x are not mutated, we can't map that information back to the uninstantiated version of x.

LebedevRI commented 6 years ago

assigned to @devincoughlin