llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.46k stars 11.77k forks source link

setjmp()/longjmp() control-flow #5677

Open llvmbot opened 14 years ago

llvmbot commented 14 years ago
Bugzilla Link 5305
Version 2.6
OS Linux
Attachments Sample code.
Reporter LLVM Bugzilla Contributor
CC @belkadan,@tkremenek,@xuzhongxing

Extended Description

In code like:

1: int main(void) 2: { 3: char *a; 4: jmp_buf jb; 5: 6: if(setjmp(jb)) { 7: free(a); 8: return(0); A: } B: C: a=malloc(1); D: longjmp(jb,1); E: }

The static analyzer complains both of a dead assignment to "a" (line C) and that "a" is undefined (line 7).

It would be nice if it could handle this. (in the cases where it can see both the setjmp() and longjmp(), or perhaps using annotations to declare functions as possibly longjmp():ing)

llvmbot commented 8 years ago

Maybe I'm missing something, but isn't the complaint basically legitimate (even if arrived at by accident..)? The usual caveat for setjmp/longjmp is that locals modified after setjmp returns 0 may or may not be restored by longjmp, so the 'free(a)' is not valid, and the assignment to 'a' can be considered dead, by lacking valid uses.

Well, the language standard says what happens to variables on a longjmp. However, the test case does have a problem, and its behaviour is undefined. The specific language rule that applies is this one (C11 7.13.2.1p3):

'all accessible objects have values ... as of the time the longjmp function was called ... except that the values of objects of automatic storage duration that are local to the function containing the invocation of the corresponding setjmp macro that do not have volatile-qualified type and have been changed between the setjmp invocation and the longjmp call are indeterminate'.

The behaviour of the test case is undefined because 'a' meets all the criteria above, so its value is indeterminate after the longjmp, and it is subsequently evaluated.

But the bug reported is nevertheless valid. We can easily correct the test case by adding a volatile qualifier to the declaration of a:

char * volatile a;

and the static analyzer still complains about the uninitialized value (seems that volatile suppresses dead store warnings, at least in the latest versions).

And I'm pretty sure it's generally unsafe to setjmp inside a {} containing local vars defs, then after exiting the block, call a function which longjmps back into the block

Unless the block contains declarations of identifiers with variably modified type, such jumps are generally allowed (although it is not allowed to longjmp into functions that have returned).

tkremenek commented 14 years ago

And I'm pretty sure it's generally unsafe to setjmp inside a {} containing local vars defs, then after exiting the block, call a function which longjmps back into the block -- but does that mean the compiler can assume that functions after the block won't longjmp?

Modeling setjmp/longjmp in the CFG will not be used for compiler correctness (of codegen), but it will impact analysis-based warnings in the compiler and the static analyzer.

Depending on how much information the analyzer has there are likely reasonable heuristics on when to emit a warning when one potentially does a longjmp into a destroyed scope. With inter-procedural analysis, we may be able to tell that a called function does a longjmp, and thus emit a warning for such cases. Without such information it is unlikely the analyzer could emit a warning with a high signal-to-noise ratio.

tkremenek commented 14 years ago

In general, however, longjmp is called in a different function; as you say it would be good to be able to annotate functions as possibly longjmping - it's probably more practical to identify functions that won't longjmp, since the compiler needs to assume a function can longjmp if there's no information to the contrary.

Modeling setjmp/longjmp in the CFG will also be useful for inter-procedural analysis, which is actively being worked on. From my perspective, modeling setjmp/longjmp is along the same lines of modeling exception edges in the CFG (which we optionally do for C++).

FWIW, I haven't seen many false positives related to not explicitly modeling setjmp/longjmp.

llvmbot commented 14 years ago

Maybe I'm missing something, but isn't the complaint basically legitimate (even if arrived at by accident..)? The usual caveat for setjmp/longjmp is that locals modified after setjmp returns 0 may or may not be restored by longjmp, so the 'free(a)' is not valid, and the assignment to 'a' can be considered dead, by lacking valid uses.

But of course, tracing the flow properly would allow better diagnostics, and detection of other cases; if you wrote char*a=NULL; then the code would still be improper but the warning about 'a' not being initialized would not be issued.

In general, however, longjmp is called in a different function; as you say it would be good to be able to annotate functions as possibly longjmping - it's probably more practical to identify functions that won't longjmp, since the compiler needs to assume a function can longjmp if there's no information to the contrary.

This case where (a) longjmp occurs in same function as setjmp; and (b) compiler could prove that the jmp_buf has not been modified by other calls in between; is easily constructed in an example, but doesn't seem to be a practical use of setjmp/longjmp. So, special treatment of this case doesn't seem justified IMO. There are practical cases, I guess, where longjmp would go to a setjmp in the same function, but longjmps to the same place from other (called) functions would also exist (otherwise, use a 'goto').

In practice, the answer to this may be to identify a convention for the use of local variables in functions containing setjmp, and produce warnings when it's not followed, e.g:

This is the approach I follow, it would be nice to have it checked. Maybe a bit messy for the general situation; it's unfortunate there's no explicit 'scope' for a setjmp, after which you know a longjmp can't occur any more - other than the entirety of the function of course. tagging non-longjmping functions would allow the actual scope to be narrowed.

And I'm pretty sure it's generally unsafe to setjmp inside a {} containing local vars defs, then after exiting the block, call a function which longjmps back into the block -- but does that mean the compiler can assume that functions after the block won't longjmp?

I find that setjmp/longjump, though widely shunned, can solve problems that are quite difficult to solve in other ways; but they require due care in application. It would be great if there were improvements in static analysis of their use, even if a few additional restrictions were needed to enable this.

tkremenek commented 14 years ago

Both warnings make use of Clang's source-level CFG, which has no special handling of setjmp/longjmp.

I should add that this is something we'd like to do in the CFG.

tkremenek commented 14 years ago

Both warnings make use of Clang's source-level CFG, which has no special handling of setjmp/longjmp.

llvmbot commented 14 years ago

assigned to @tkremenek