llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.53k stars 11.79k forks source link

`run-clang-tidy` processes files from compilation database in random order #73357

Open firewave opened 10 months ago

firewave commented 10 months ago

A compilation database contains a JSON array of compilation commands thus has a fixed order. Such a file can be generated from CMake or a make command (via bear) and reflects the order of the commands as they are executed by the build system. The order might be intentional to process long-running files first so in a multi-threaded environment to reduce the build time as you will have better parallelism. The same might is most likely true for the clang-tidy invocation.

Unfortunately run-clang-tidy (which takes a compilation database via -p) processes the files in a random order. The files should be processed in the order provided by the input file.

Example:

[
{
  "directory": "/home/sshuser",
  "command": "/usr/bin/clang++-18 -o big1.cpp.o -c big1.cpp",
  "file": "/home/sshuser/big1.cpp"
},
{
  "directory": "/home/sshuser",
  "command": "/usr/bin/clang++-18 -o big2.cpp.o -c big2.cpp",
  "file": "/home/sshuser/big2.cpp"
},
{
  "directory": "/home/sshuser",
  "command": "/usr/bin/clang++-18 -o c.cpp.o -c c.cpp",
  "file": "/home/sshuser/c.cpp"
},
{
  "directory": "/home/sshuser",
  "command": "/usr/bin/clang++-18 -o d.cpp.o -c d.cpp",
  "file": "/home/sshuser/d.cpp"
}
]
clang-tidy-18 -p=. /home/sshuser/d.cpp
clang-tidy-18 -p=. /home/sshuser/big1.cpp
clang-tidy-18 -p=. /home/sshuser/c.cpp
clang-tidy-18 -p=. /home/sshuser/big2.cpp
clang-tidy-18 -p=. /home/sshuser/d.cpp
clang-tidy-18 -p=. /home/sshuser/big1.cpp
clang-tidy-18 -p=. /home/sshuser/big2.cpp
clang-tidy-18 -p=. /home/sshuser/c.cpp
firewave commented 10 months ago

The problem most likely lies within this code:

  files = set([make_absolute(entry['file'], entry['directory'])
           for entry in database])

So just changing set to list should fix the issue.

The code is obviously to prevent the same file from being processed more than once. But the logic is flawed as it also needs to consider the command field - that might become infinitely complex though. I will file a different ticket about that.

There's also the case of multiple arch(?) being specified for a single file (still don't know what generates those and how) - but that is out of the scope of the wrapper script.

Chardrazle commented 5 months ago

Was another ticket filed for that extra case you mention?
I was just curious because I ran into a corner-case where the same cpp file is being compiled into two different executables, with a slight change to the compilation settings. The “file” property appears to be used as the key; however, in this case it is not unique. Having support for specifying the file that matches the “output” property might be able to address this.

firewave commented 5 months ago

Was another ticket filed for that extra case you mention?

No. I did not think that it might be necessary since fixing the initial issue might also require addressing it at the same time.