llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.7k stars 11.87k forks source link

libtooling: behaviour of `__COUNT__` diverges from calling `clang` manually #87588

Open giulianobelinassi opened 6 months ago

giulianobelinassi commented 6 months ago

With the following files:

header.h:

#define __DEFINE_FUNCTION(i) \
   int function_ ## i (void) { return 0; }
#define _DEFINE_FUNCTION(i) __DEFINE_FUNCTION(i)
#define DEFINE_FUNCTION      _DEFINE_FUNCTION(__COUNTER__)
DEFINE_FUNCTION;

test.c:

#include "header.h"
DEFINE_FUNCTION;
int main()
{
  function_0();
  return 0;
}

compiling with:

$ clang++ -O2 -c test.c

Results in the input being accepted.


Now with libtooling, the following reproducer ( $ clang++ -g llvm-repro.cpp -lclang-cpp -lLLVM):

#include "clang/Frontend/ASTUnit.h"
#include "clang/Frontend/CompilerInstance.h"

using namespace clang;
using namespace llvm;

static const char *const header =
"#define __DEFINE_FUNCTION(i) \\\n"
"   int function_ ## i (void) { return 0; }\n"
"#define _DEFINE_FUNCTION(i) __DEFINE_FUNCTION(i)\n"
"#define DEFINE_FUNCTION      _DEFINE_FUNCTION(__COUNTER__)\n"
"DEFINE_FUNCTION;\n";

static const char *const test_file =
"#include \"header.h\"\n"
"DEFINE_FUNCTION;\n"
"int main()\n"
"{\n"
"  function_0();\n"
"  return 0;\n"
"}\n";

static const std::vector<const char *> clang_args =  {"-O2", "-c", "test.c"};

int main(void)
{
  IntrusiveRefCntPtr<DiagnosticsEngine> Diags;
  std::shared_ptr<CompilerInvocation> CInvok;
  std::shared_ptr<PCHContainerOperations> PCHContainerOps;

  IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem> MFS;

  MFS = IntrusiveRefCntPtr<vfs::InMemoryFileSystem>(new vfs::InMemoryFileSystem);
  MFS->addFile("test.c", 0, MemoryBuffer::getMemBufferCopy(test_file));
  MFS->addFile("header.h", 0, MemoryBuffer::getMemBufferCopy(header));

  DiagnosticOptions *diagopts = new DiagnosticOptions();

  Diags = CompilerInstance::createDiagnostics(diagopts);

  clang::CreateInvocationOptions CIOpts;
  CIOpts.Diags = Diags;
  CInvok = clang::createInvocation(clang_args, std::move(CIOpts));

  FileManager *FileMgr = new FileManager(FileSystemOptions(), MFS);
  PCHContainerOps = std::make_shared<PCHContainerOperations>();

  auto AST = ASTUnit::LoadFromCompilerInvocation(
      CInvok, PCHContainerOps, Diags, FileMgr, false, CaptureDiagsKind::None, 1,
      TU_Complete, false, false, false);

  const DiagnosticsEngine &de = AST->getDiagnostics();

  if (AST == nullptr || de.hasErrorOccurred()) {
    llvm::outs() << "Rejected.\n";
    return 1;
  } else {
    llvm::outs() << "Accepted.\n";
    return 0;
  }
}

Results in the output always being rejected with the following output:

test.c:2:1: error: redefinition of 'function_0'
    2 | DEFINE_FUNCTION;
      | ^
header.h:4:30: note: expanded from macro 'DEFINE_FUNCTION'
    4 | #define DEFINE_FUNCTION      _DEFINE_FUNCTION(__COUNTER__)
      |                              ^
header.h:3:29: note: expanded from macro '_DEFINE_FUNCTION'
    3 | #define _DEFINE_FUNCTION(i) __DEFINE_FUNCTION(i)
      |                             ^
header.h:2:8: note: expanded from macro '__DEFINE_FUNCTION'
    2 |    int function_ ## i (void) { return 0; }
      |        ^
<scratch space>:3:1: note: expanded from here
    3 | function_0
      | ^
header.h:5:1: note: previous definition is here
    5 | DEFINE_FUNCTION;
      | ^
header.h:4:30: note: expanded from macro 'DEFINE_FUNCTION'
    4 | #define DEFINE_FUNCTION      _DEFINE_FUNCTION(__COUNTER__)
      |                              ^
header.h:3:29: note: expanded from macro '_DEFINE_FUNCTION'
    3 | #define _DEFINE_FUNCTION(i) __DEFINE_FUNCTION(i)
      |                             ^
header.h:2:8: note: expanded from macro '__DEFINE_FUNCTION'
    2 |    int function_ ## i (void) { return 0; }
      |        ^
<scratch space>:3:1: note: expanded from here
    3 | function_0
      | ^
Rejected.

Notice how it is exactly the same code, but one is being compiled through clang, whereas the other is being compiled through libtooling.

I'd say this is undesired behavior and a bug.

clang version 18.1.2
Target: x86_64-suse-linux
Thread model: posix
InstalledDir: /usr/bin
giulianobelinassi commented 6 months ago

Ping

giulianobelinassi commented 6 months ago

ping 2

giulianobelinassi commented 6 months ago

ping 3

marcosps commented 6 months ago

@EugeneZelenko do you have any code pointers that we could check and try to solve this issue ourselves? Thanks in advance!

giulianobelinassi commented 2 months ago

This have to do with PCH generation. The following code creates a Preprocessor to compile them:

  if (PrecompilePreambleAfterNParses > 0) {
    PreambleRebuildCountdown = PrecompilePreambleAfterNParses;
    OverrideMainBuffer =
        getMainBufferWithPrecompiledPreamble(PCHContainerOps, *Invocation, VFS);
    getDiagnostics().Reset();
    ProcessWarningOptions(getDiagnostics(), Invocation->getDiagnosticOpts());
  }

from ASTUnit::LoadFromCompilerInvocation

tokyo4j commented 1 week ago

I encountered this problem when I was browsing fuzzel codebase with clangd. I hope this gets fixed.

kadircet commented 1 week ago

as mentioned by @giulianobelinassi, this is due to usage of PCHs by ASTUnit or clangd for quickly reparsing sources.

You can imagine this as running two separate compilations, one for all the includes (preamble) in your file, and another using this preamble for includes and parsing rest of the source file. As a result both compilations initialize __COUNTER__ to 0.

The same also likely to affect all sorts of modular builds, as each module will be build with its own __COUNTER__ and merged together in a not-so-strictly ordered fashion. cc @dmpolukhin.

In the case of tooling we at least have a single preamble and single main file parse, so we can work around by storing the final count in PCHs and re-using that during main file parse. But this solution won't work when there are multiple PCHs involved, and it gets even nastier for actual modules.

https://isocpp.org/files/papers/P3384R0.html#design-considerations is recognizing this difficulties but isn't really proposing a solution. I think we'd need someone to put a lot of thought into either making it work in general or find a way to restrict them for a modules-friendly world. cc @AaronBallman

For anyone interested in remedying tooling adjacent symptoms by storing/restoring __COUNTER__ in PCHs, http://go/llvm-project/clang/include/clang/Frontend/PrecompiledPreamble.h is the abstraction used by both clangd and ASTUnit for managing preamble-PCHs.

dmpolukhin commented 1 week ago

From our observations and from the code fragment in the initial post. It seems that people usually use __COUNTER__ as just a unique value, they don't need monotonically increasing values and any unique value would work but it has to be reproducible always produce the same AST from the same inputs. So I was thinking about some seed from filename (perhaps with some path parts, for example like it was specified in command line). __COUNTER__ value will be next random value generated based on this seed. Every time when module is loaded, the seed of the module and the main TU seed are combined to produce new seed value.

giulianobelinassi commented 1 week ago

For our usage we decided to disable PCH altogether, since it avoids this problem and we noticed significant speedups in clang-extract. This can be done changing:

auto AST = ASTUnit::LoadFromCompilerInvocation(
      CInvok, PCHContainerOps, Diags, FileMgr, false, CaptureDiagsKind::None, 1,
      TU_Complete, false, false, false);

to

auto AST = ASTUnit::LoadFromCompilerInvocation(
      CInvok, PCHContainerOps, Diags, FileMgr, false, CaptureDiagsKind::None, 0,
      TU_Complete, false, false, false);