Open zxyzxy1111 opened 3 weeks ago
@llvm/issue-subscribers-clang-static-analyzer
Author: zuoxueying (zxyzxy1111)
Thanks for reporting this issue!
I replaced the clang-tidy label with "clang:static analyzer" because the clang-tidy check clang-analyzer-cplusplus-InnerPointer is a wrapper around a feature of the clang static analyzer.
I'm pretty sure that in the example code
const char *single_quote(const char *input) {
std::string s = std::string("");
return s.append("'").append(input).append("'").c_str();
}
the analyzer fails to produce a warning because it does not "know" that std::string::append
always returns *this
.
In general, when the clang static analyzer encounters a function call (where C++ methods are also included in "function"s), three things may happen:
evalCall
callbacks) and update the State
to reflect the effects of the function call.In your example code the method std::string::append()
is not recognized and not inlined, so it will be handled by a conservative evaluation and its result will be a completely unknown std::string
lvalue (which is represented by the SymbolicRegion
that you observed).
The analyzer does not report that the c_str()
of this unknown std::string
object is returned from single_quote()
, because it cannot prove that this is an error: if the object is unknown, then its lifetime might be long enough to survive after single_quote()
.
To fix this deficiency, we'd need to introduce an evalCall
callback which recognizes std::basic_string::append
and updates the analyzer state to model that this method always returns *this
.
This is a fairly straightforward task, but we'd probably need to think about technical questions like "which checker should handle this callback?" (In theory we have StringChecker.cpp
which claims that it handles the modeling of C++ std::string
methods, but I'm not familiar with it and it only contains code for modeling some constructor calls, so it's definitely unfinished.)
I (or somebody else) will probably start working on this soon.
I checked StringChecker.cpp
and despite its very general name and description, it's a trivial stub that only checks one very particular precondition (that the char*
passed to the constructor of std::string
may not be a nullpointer).
@haoNoQ @steakhal @Xazax-hun (or anybody else with an opinion): What do you think, what would be a good place for evalCall
callbacks for std::string::append()
(and similar string methods)? Should I just drop them in InnerPointerChecker
?
Thank you for your detailed response. Recently, through my analysis, I found that there are several missed reports in innerpointerchecker.cpp, such as:
Example 1:
MachineStorage is a local variable, defined inside the code block of if (const MDNode *N = dyn_cast
Example 2:
Line number: 204 Loop jump back: Jumping back to the beginning of the loop.
Line number: 205 Loop start: Jumped back to the beginning of the loop.
Line number: 185 Condition check: Condition "it != e", taking true branch.
Line number: 185 Condition check: Condition "currentAdaptor", taking false branch.
Line number: 187 Else branch: Reached else branch.
Line number: 198 Main error: Using invalidated internal representation of "it". (In the else branch, the code attempts to use the invalidated it, which leads to a use after free error.)
passes is a container of type std::vector<std::unique_ptr
I think this issue leading to a dangling pointer should be a WRAPPER_ESCAPE problem, but the analyzer didn't report it. Do you think there is a problem with my analysis? If there is no problem, where should we make modifications?
@NagyDonat
Thank you once again for your thorough and patient response. Your method has proven to be very effective. Below is the eval function I added. Could you please take a look and see if it is correct?
After adding an eval function, the checker can correctly report errors. However, After my analysis,maybe there are still many other dangling pointers that cause the checker to miss some errors, leading to false negatives. Maybe we need to thoroughly resolve this 。 I look forward to your feedback and suggestions. @NagyDonat
class TestClass {
public:
TestClass(unsigned int width, unsigned int height)
: _width(width), _height(height) {
_rowPtrs.resize(1, std::vector<const char*>(height, nullptr));
}
void process() {
int tmpHalfBufferElements = _width * _height;
std::vector<unsigned short> tmpHalfBuffer(tmpHalfBufferElements);
char* tmpHalfBufferPtr = (char*)&tmpHalfBuffer[0];
for (int y = 0; y < _height; ++y) {
_rowPtrs[0][y] = (const char*)tmpHalfBufferPtr;
tmpHalfBufferPtr += _width * sizeof(unsigned short);
}
}
void printRowPtrs() const {
for (const auto& ptr : _rowPtrs[0]) {
std::cout << (void*)ptr << " ";
}
std::cout << std::endl;
}
private:
unsigned int _width;
unsigned int _height;
std::vector<std::vector<const char*>> _rowPtrs;
};
int main() {
unsigned int width = 2;
unsigned int height = 2;
TestClass test(width, height);
test.process();
test.printRowPtrs();
std::cout << "After process function exits:" << std::endl;
test.printRowPtrs();
return 0;
}
ANALYZED BY GDB
Breakpoint 1, TestClass::process (this=0x7fffffffcc00) at test1-o.cpp:12
12 int tmpHalfBufferElements = _width * _height;
(gdb) next
13 std::vector<unsigned short> tmpHalfBuffer(tmpHalfBufferElements);
(gdb) next
15 char* tmpHalfBufferPtr = (char*)&tmpHalfBuffer[0];
(gdb) n
17 for (int y = 0; y < _height; ++y) {
(gdb) n
18 _rowPtrs[0][y] = (const char*)tmpHalfBufferPtr;
(gdb) n
19 tmpHalfBufferPtr += _width * sizeof(unsigned short);
(gdb) n
17 for (int y = 0; y < _height; ++y) {
(gdb) n
18 _rowPtrs[0][y] = (const char*)tmpHalfBufferPtr;
(gdb) n
19 tmpHalfBufferPtr += _width * sizeof(unsigned short);
(gdb) n
17 for (int y = 0; y < _height; ++y) {
(gdb) n
13 std::vector<unsigned short> tmpHalfBuffer(tmpHalfBufferElements);
(gdb) n
21 }
(gdb) n
main () at test1-o.cpp:42
42 test.printRowPtrs();
(gdb) n
0x41beb0 0x41beb4
45 std::cout << "After process function exits:" << std::endl;
(gdb) n
After process function exits:
46 test.printRowPtrs();
(gdb) n
0x41beb0 0x41beb4
48 return 0;
(gdb) n
40 TestClass test(width, height);
(gdb) n
49 }
(gdb) n
0x00007ffff74bdd85 in __libc_start_main () from /lib64/libc.so.6
(gdb) n
Single stepping until exit from function __libc_start_main,
which has no line number information.
From the debugging session, it can be seen that _rowPtrs is correctly set to the address of tmpHalfBuffer during the execution of the process function. However, after the process function exits, tmpHalfBuffer is destroyed, causing _rowPtrs to still reference the memory area that has been destroyed.
In the main function, the first time printRowPtrs is called, the content of _rowPtrs is a valid address (for example, 0x41beb0 and 0x41beb4). But after the process function exits, tmpHalfBuffer is destroyed, causing _rowPtrs to still reference the memory area that has been destroyed. However, the WRAPPER_ESCAPE issue is not reported.
This InnerPointerChecker::evalCall
looks good at first glance, if you create a pull request containing it, we would be happy to review and then merge it. Patches that improve the analyzer are always [1] welcome :smile:
[1] if they follow the coding standards and don't introduce too many false positives
After my analysis,maybe there are still many other dangling pointers that cause the checker to miss some errors, leading to false negatives. Maybe we need to thoroughly resolve this.
The static analyzer does not try to report all errors, because it will never have a complete understanding of the analyzed source files (and there are simply too many language elements, library functions, design patterns, bug types etc.). Instead, we have two more modest goals:
(Disclaimer: this is just my personal impression, not an official goal statement.)
This means that from our point of view, false positives are bugs in the analyzer, but false negatives are only equivalent to feature requests -- it is nice to turn a false negative into a true positive, but we don't need to do it. There are lots of false negatives, so we are only working on those that are easy to resolve and/or especially common and important for the users.
Although the false negatives presented in this ticket are all connected to InnerPointerChecker
, it seems that they all have different root causes, so each of them would require individual solutions and there is no reason to handle them as a single project. (The checker which creates the bug report is just the final step after lots of modeling steps where the analyzer engine and the various checkers try to gather and update information about the execution state. Many false positives or negatives are caused by limitations of those earlier steps.)
@haoNoQ @steakhal @Xazax-hun (or anybody else with an opinion): What do you think, what would be a good place for
evalCall
callbacks forstd::string::append()
(and similar string methods)? Should I just drop them inInnerPointerChecker
?
To me, it would make sense to have the evalcall for append
in the StringChecker
- at least that was my idea at the time I introduced it.
Question: clang-tidy issue: It is known that the following code snippet will cause a WRAPPER_ESCAPE problem.
So: The problem may be that the PtrSet returned by State->get(MR) is empty, or that MR is not correctly mapped to RawPtrMap. Further check the state of RawPtrMap and the value of MR.
Check the contents of RawPtrMap: Add debugging information in markPtrSymbolsReleased to print the contents of RawPtrMap and ensure that MR is correctly mapped. Check the value of MR: Print the value of MR in checkPostCall and markPtrSymbolsReleased to ensure that they are consistent.
Add debugging information in markPtrSymbolsReleased: Print the contents of RawPtrMap and the value of MR. void InnerPointerChecker::markPtrSymbolsReleased(const CallEvent &Call, ProgramStateRef State, const MemRegion *MR, CheckerContext &C) const { std::cout << "markPtrSymbolsReleased called with MR: " << MR << "\n"; const RawPtrMapTy &RPM = State->get();
for (const auto &Entry : RPM) {
std::cout << "RawPtrMap entry: " << Entry.first << "\n";
}
if (const PtrSet PS = State->get(MR)) {
const Expr Origin = Call.getOriginExpr();
for (const auto Symbol : *PS) {
std::cout << "Marking symbol as released: " << Symbol << "\n";
State = allocation_state::markReleased(State, Symbol, Origin);
}
State = State->remove(MR);
C.addTransition(State);
return;
} else {
std::cout << "No PtrSet found for MR: " << MR << "\n";
}
}
Add debugging information in checkPostCall: Print the value of ObjRegion. void InnerPointerChecker::checkPostCall(const CallEvent &Call, CheckerContext &C) const { ProgramStateRef State = C.getState();
const TypedValueRegion *ObjRegion = nullptr;
if (const auto *ICall = dyn_cast(&Call)) {
ObjRegion = dyn_cast_or_null(
ICall->getCXXThisVal().getAsRegion());
if (isInvalidatingMemberFunction(Call)) { std::cout << "call markPtrSymbolsReleased with ObjRegion: " << ObjRegion << "\n"; markPtrSymbolsReleased(Call, State, ObjRegion, C); return; } }
if (isInnerPointerAccessFunction(Call)) { if (isa(Call)) {
ObjRegion =
dyn_cast_or_null(Call.getArgSVal(0).getAsRegion());
}
if (!ObjRegion) return;
SVal RawPtr = Call.getReturnValue(); if (SymbolRef Sym = RawPtr.getAsSymbol(/IncludeBaseRegions=/true)) { PtrSet::Factory &F = State->getStateManager().get_context();
const PtrSet SetPtr = State->get(ObjRegion);
PtrSet Set = SetPtr ? SetPtr : F.getEmptySet();
assert(C.wasInlined || !Set.contains(Sym));
Set = F.add(Set, Sym);
State = State->set(ObjRegion, Set);
C.addTransition(State);
}
return; }
checkFunctionArguments(Call, State, C); }
Output call markPtrSymbolsReleased with ObjRegion: 0x7631a60 call markPtrSymbolsReleased markPtrSymbolsReleased called with MR: 0x7631a60 No PtrSet found for MR: 0x7631a60 call markPtrSymbolsReleased with ObjRegion: 0x7630cb8 call markPtrSymbolsReleased markPtrSymbolsReleased called with MR: 0x7630cb8 No PtrSet found for MR: 0x7630cb8 call markPtrSymbolsReleased with ObjRegion: 0 call markPtrSymbolsReleased markPtrSymbolsReleased called with MR: 0 No PtrSet found for MR: 0 call markPtrSymbolsReleased with ObjRegion: 0 call mark markPtrSymbolsReleased
markPtrSymbolsReleased function was correctly called, but the PtrSet returned by State->get(MR) is always empty. This indicates that MR has not been correctly added to RawPtrMap.
Further debugging found that in the checkPostCall function, ObjRegion was not correctly added to RawPtrMap. In SymbolRef Sym = RawPtr.getAsSymbol(/IncludeBaseRegions=/true) returns false, which means that RawPtr is not a symbol. Further check the type and source of RawPtr to determine why it is not a symbol.
The problem may be in if (isInnerPointerAccessFunction(Call)) { std::cout << "isInnerPointerAccessFunction(Call) = true" << "\n"; // It has been determined that this is running if (isa(Call)) {
std::cout << "isa(Call) = true" << "\n"; // It has been determined that this is not running
ObjRegion =
dyn_cast_or_null(Call.getArgSVal(0).getAsRegion());
} where if (isa(Call))=false
isa(Call) returns false, need to further check the type of Call to determine why it is not a SimpleFunctionCall type.
Output release-install/bin/wecheck test2.cpp -- -g markPtrSymbolsReleased called with MR: 0x7df7710 No PtrSet found for MR: 0x7df7710 markPtrSymbolsReleased called with MR: 0x7df6968 No PtrSet found for MR: 0x7df6968 markPtrSymbolsReleased called with MR: 0 No PtrSet found for MR: 0 markPtrSymbolsReleased called with MR: 0 No PtrSet found for MR: 0 isInnerPointerAccessFunction(Call) = true isa<CXXMemberCall>(Call) = true
Conclusion Call object does not have a getCXXThisVal method. The value of this pointer needs to be obtained from the CXXMemberCall object.
// Check the specific type of Call if (llvm::isa<SimpleFunctionCall>(Call)) { std::cout << "isa<SimpleFunctionCall>(Call) = true" << "\n"; // It has been determined that this is not running ObjRegion = dyn_cast_or_null<TypedValueRegion>(Call.getArgSVal(0).getAsRegion()); } else if (const auto *MemberCall = llvm::dyn_cast<CXXMemberCall>(&Call)) { std::cout << "isa<CXXMemberCall>(Call) = true" << "\n"; ObjRegion = dyn_cast_or_null<TypedValueRegion>(MemberCall->getCXXThisVal().getAsRegion()); // It has been determined that this is a CXXMemberCall, not a SimpleFunctionCall } else if (llvm::isa<CXXConstructorCall>(Call)) { std::cout << "isa<CXXConstructorCall>(Call) = true" << "\n"; } else if (llvm::isa<CXXDestructorCall>(Call)) { std::cout << "isa<CXXDestructorCall>(Call) = true" << "\n"; } else { std::cout << "Call is of unknown type" << "\n"; } if (!ObjRegion) return;
Output
isInnerPointerAccessFunction(Call) = true isa<CXXMemberCall>(Call) = true MemberCall->getCXXThisVal() = &SymRegion{conj_$13{basic_string<char> &, LC2, S1185017, #1}}ThisVal.getAsRegion() = 0x7e96d48
MemberCall->getCXXThisVal() returned a SymRegion, and ThisVal.getAsRegion() returned a valid memory region address 0x7e96d48. This indicates that the value returned by getCXXThisVal() is valid. However, dyn_cast_or_null(ThisRegion) may have failed because ThisRegion is not of TypedValueRegion type. Further check the actual type of ThisRegion and ensure that it can be correctly converted.
isInnerPointerAccessFunction(Call) = true isa(Call) = true
MemberCall->getCXXThisVal() = &SymRegion{conj_$13{basic_string &, LC2, S1185017, #1}}ThisVal.getAsRegion() = 0x80fe488
ThisRegion is a SymbolicRegion
ObjRegion is null
This Region is a SymbolicRegion, not a TypedValueRegion. This indicates that further processing of SymbolicRegion is needed to obtain the required TypedValueRegion.
In the checkPostCall method, when ThisRegion is a SymbolicRegion, obtain its BaseRegion and try to convert it to TypedValueRegion. After experimentation: error propagation - it has been determined that this is a SymbolicRegion, not a TypedValueRegion, and it is not possible to obtain TypedValueRegion from it. @NagyDonat