Open vogelsgesang opened 9 months ago
@llvm/issue-subscribers-clang-static-analyzer
Author: Adrian Vogelsgesang (vogelsgesang)
I see the same issue in std::sort
, also using clang-tidy 17.0.6:
#include <algorithm>
#include <string.h>
#include <string>
#include <vector>
bool custom(const char *lhs, const char *rhs) { return ::strcmp(lhs, rhs) < 0; }
static bool custom_sort(const std::string &lhs, const std::string &rhs) {
return custom(lhs.c_str(), rhs.c_str());
}
int main() {
std::vector<std::string> v;
std::sort(v.begin(), v.end(), custom_sort);
return 0;
}
sort.cc:9:30: warning: Method called on moved-from object of type 'std::basic_string' [clang-analyzer-cplusplus.Move]
9 | return custom(lhs.c_str(), rhs.c_str());
| ^
[...]
Confirmed https://godbolt.org/z/WxbrsE74o
We inline the stdlib implementation and then get lost inside, and we eventually end up calling the comparator on an item we already "moved" once.
I can see two solutions:
1) Simply disable inlining for std::sort
and std::ranges::sort
respectively. This would save signifficant time and also save us from these FPs.
2) Mark "length" and "size" as "isMoveSafeMethod" member functions inside the MoveChecker. This would only mitigate from these FPs; but not from more nuanced ones where the comporator would call other member functions.
I think both of these improvements would be great to have, but arguably option 1 would be ideal for getting rid of these "sort" FPs.
I'm seconding that disabling inlining for std::sort
and std::ranges::sort
is a reasonable solution.
Yes, this sounds perfectly reasonable to me as well. Intense aliasing causes a lot of problems of this kind for us and std::sort
is exactly that kind of nightmare.
It turns out matching std::ranges
functions might be tricky for libc++. They use constexpr global variables referring to a struct that has an overloaded (call) operator()
:
namespace __attribute__((__type_visibility__("default"))) std { inline namespace __1 {
namespace ranges {
namespace __sort {
struct __fn {
template <class _Iter, class _Sent, class _Comp, class _Proj>
__attribute__((__visibility__("hidden"))) __attribute__((__exclude_from_explicit_instantiation__)) __attribute__((__abi_tag__("ne190000"))) constexpr static _Iter
__sort_fn_impl(_Iter __first, _Sent __last, _Comp& __comp, _Proj& __proj) {
auto __last_iter = ranges::next(__first, __last);
auto&& __projected_comp = std::__make_projected(__comp, __proj);
std::__sort_impl<_RangeAlgPolicy>(std::move(__first), __last_iter, __projected_comp);
return __last_iter;
}
template <random_access_iterator _Iter, sentinel_for<_Iter> _Sent, class _Comp = ranges::less, class _Proj = identity>
requires sortable<_Iter, _Comp, _Proj>
__attribute__((__visibility__("hidden"))) __attribute__((__exclude_from_explicit_instantiation__)) __attribute__((__abi_tag__("ne190000"))) constexpr _Iter
operator()(_Iter __first, _Sent __last, _Comp __comp = {}, _Proj __proj = {}) const {
return __sort_fn_impl(std::move(__first), std::move(__last), __comp, __proj);
}
template <random_access_range _Range, class _Comp = ranges::less, class _Proj = identity>
requires sortable<iterator_t<_Range>, _Comp, _Proj>
__attribute__((__visibility__("hidden"))) __attribute__((__exclude_from_explicit_instantiation__)) __attribute__((__abi_tag__("ne190000"))) constexpr borrowed_iterator_t<_Range>
operator()(_Range&& __r, _Comp __comp = {}, _Proj __proj = {}) const {
return __sort_fn_impl(ranges::begin(__r), ranges::end(__r), __comp, __proj);
}
}; // struct __fn
}
inline namespace __cpo {
inline constexpr auto sort = __sort::__fn{};
} // inline __cpo
} // ranges
} // inline v1
} // std
This problem might occur to other std library functions, I figure. We might want to generalize CallDescriptions to implicitly match these as if the operator()
was directly invoked. Note that inlining worked this out because the global constant is initialized with a known value, thus the engine trusted it and followed the call.
It turns out matching
std::ranges
functions might be tricky for libc++. They use constexpr global variables referring to a struct that has an overloaded (call)operator()
: [...] This problem might occur to other std library functions, I figure. We might want to generalize CallDescriptions to implicitly match these as if theoperator()
was directly invoked. Note that inlining worked this out because the global constant is initialized with a known value, thus the engine trusted it and followed the call.
Imagine this:
Have a special hidden checker implement the check::ASTDecl<TranslationUnitDecl>
callback.
It would do an AST traversal, an collect all global vardecls that are constant and has initializer. If the initializer type is a CXXRecordDecl, then it could add a mapping to a private map as CXXRecordDecl -> VarDecl
.
The CallDescriptions
could refer to this private map and do lookups in it.
When we have a CallDescription::matches(Call)
invocation, it could check if the Call
is an instance call, and in that case take the decl of it and get to the CXXRecordDecl it corresponds to. It could use that as a key in the map I referred to in the previous paragraph. So, the lookup will result in the VarDecl that we can now match against the CallDescription rule, as if it had a name of a FunctionDecl.
This approach of course wouldn't be always correct, as in CTU, we might actually load definitions of (previously) constant global variable declarations; but this technique should still work for the libc++ case at least.
After we can match std::ranges::sort
, then we can write a checker that basically conservatively eval calls a set of matched functions; thus achieve force opaque calls. However, it might be a good place to ensure that certain other functions always get inlined, such as lightweight container member functions, such as std::span
and std::string_view
for instance, but I don't have specific ideas achieving that.
When running the clang-tidy pass
clang-analyzer-cplusplus.Move
against the following program, it reports a move-after-free withinstd::ranges::sort
.clang-tidy version 17.0.6 libc++ version 17.0.6
Program:
clang-tidy output: