Open llvmbot opened 4 years ago
mentioned in issue llvm/llvm-bugzilla-archive#51036
mentioned in issue llvm/llvm-bugzilla-archive#47727
mentioned in issue llvm/llvm-bugzilla-archive#46215
llvm/llvm-bugzilla-archive#47727 , http://lists.llvm.org/pipermail/cfe-dev/2020-November/067145.html: another example.
In this case the users were confused about whether it's a true positive warning that highlights a strict aliasing violation. In fact we could go down this road and transform instances of this bug into strict aliasing violation warnings when applicable; that would potentially simplify the solution when it'll only require us to reason about extents when one of the involved types is char
(which bypasses strict aliasing rules). I don't see much indication that this approach would actually be much easier but these problems are indeed somewhat related.
No, unfortunately, this is too difficult for a newcomer. I could have tried to do it as a GSoC project with a good student. The problem is easy to understand, but this piece of code (RegionStore) is one of the most difficult, convoluted and controversial parts of the Static Analyzer.
Artem, having watched Meike's talk [https://www.youtube.com/watch?v=F_q50Th1t3A], do you think this ticket would be a reasonable entry point for contributions for someone who has never before contributed to Clang/LLVM?
No, it's just to keep it a valid C++ program. In C++ you can't use a function without declaring it (unlike C) and the code under analysis still needs to be a valid program.
The function is completely fake, of course, but nobody will notice that it's fake until link time, and the analyzer will not do linking, so it's fine. And it won't get to the actual .o file because it's preprocessed out because __clang_analyze__ isn't defined during normal compilation.
So the point of the suppression is to tell the analyzer "well, something happens to &dest here, just trust me".
Thank you very much for the confirmation and the additional information.
Am I right to assume that the CSA only requires the prototype of maybe_initialize() because it does not (currently) analyze beyond module boundaries?
Aha, yes, that's a false positive on our part. The analyzer fully understands what does CopyDeviceRegisters() do, but our memory model fails to confirm that writing a 32-bit integer would initialize all 4 bytes of that integer, so we become confused when you start reading bugs byte-by-byte.
This is one of the most commonly reported bugs these days, so i hope i'll get to it soon, but it requires large changes to fix.
If you want to suppress the bug, you can try the following trick:
static void CopyDeviceRegisters( struct device_registers volatile& dest, struct device_registers volatile* src ) { +#ifdef __clang_analyzer__
assigned to @haoNoQ
https://reviews.llvm.org/D120489 is an example of good things we can't have because of this issue.
Hey @haoNoQ , I want to get through this issue once. Could you just share some docs related to regionstore?
Could you just share some docs related to regionstore?
It largely still works as described in Xu, Zhongxing & Kremenek, Ted & Zhang, Jian. (2010). A Memory Model for Static Analysis of C Programs. (https://lcs.ios.ac.cn/~xzx/memmodel.pdf). There's a little bit of extra documentation in https://github.com/llvm/llvm-project/blob/main/clang/docs/analyzer/developer-docs/RegionStore.rst but that's mostly it.
So the current implementation of BindingKey
in https://intel.github.io/llvm-docs/clang_doxygen/RegionStore_8cpp_source.html basically says "at this base region, at this offset, we have this value", and the goal of this bug is to replace it with "at this base region, from this offset to this offset, we have this value". With all the consequences such as interactions around overlaps etc.
The word "extent" in the title refers to the length of the segment between these "from" and "to" offsets. Type-system-wise it makes sense to have such extent simply derived from the value type; in this sense the binding key doesn't really need any changes, only read/write algorithms do. But, again, the problem is that we aren't exactly type-correct. Maybe we can ignore lack of type correctness and still do better than before? - something to think about. I also hope that https://reviews.llvm.org/D105340 ultimately fixes our type-correctness.
@isuckatcs is making some progress on this in https://reviews.llvm.org/D132752!
Extended Description
Hi,
admittedly neither I nor my colleagues are sure whether this is a false positive of the static analyzer (CSA) or a misconception of sorts on our part.
So the issue is in the attached (self-contained) example. I am pasting it here for reference:
include
include
struct device_registers { uint32_t n1; uint32_t n2; uint32_t n3; uint32_t n4; uint32_t n5; } attribute((packed));
static_assert( 20 == sizeof(struct device_registers), "unexpected struct size" );
static void CopyDeviceRegisters( struct device_registers volatile& dest, struct device_registers volatile* src ) { dest.n1 = src->n1; dest.n2 = src->n2; dest.n3 = src->n3; dest.n4 = src->n4; dest.n5 = src->n5; }
// this is a pretend process_bytes() from boost/crc.hpp // (where it's a member function) void process_bytes(void const buffer, size_t byte_count) { // "checksum creation" unsigned char const const b = static_cast<unsigned char const*>(buffer); for(size_t i = 0; i < byte_count; i++) { printf("%02X ", b[i]); } printf("\n"); }
int main(int, char*) { // in our code this is a uio mapping struct device_registers mock{}; struct device_registers volatile mapped_regs = &mock;
}
This code is representative of the issue we're seeing and the attached file includes a run of CSA over that MWE.
After some tinkering it appears clear that the volatile nature of mapped_regs and the arguments to CopyDeviceRegisters() are an issue here.
My guess is that CSA "reasons" that - due to the volatility - the data in mapped_regs isn't initialized and that fact isn't changed by CopyDeviceRegisters().
How would we have to convey it to CSA/Clang that we deem local variable (shadow) initialized after the call to CopyDeviceRegisters()? ... or is this a false positive which we should not care about?
Thank you for reading and with best regards,
Oliver
PS: After unpacking the MWE you should be able to use 'make analyze' which should 'make clean' (implicitly), followed by running 'make mwe' via scan-build. So the assumption is that scan-build is in the PATH and works.
We're working with a Clang that I build myself from 0399d5a9682b3cef71c653373e38890c63c4c365 (which should be the 9.0.0 release tag or at least was at one point) which targets x86_64-unknown-linux-gnu on a vanilla Ubuntu 14.04 (yes, I built it on a pristine LXD image of Ubuntu 14.04, so there isn't any "contamination" beyond build dependencies).