llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.91k stars 11.52k forks source link

[clang-scan-deps] Error if integer literal contains a single quote in `#if` directive #88896

Open Sirraide opened 4 months ago

Sirraide commented 4 months ago

On main, running clang-scan-deps -format=p1689 -- clang++ -std=c++20 test.cc on this program

// File: test.cc
#if 123'124
#endif

gives:

Error while scanning dependencies for test.cc:
test.cc:1:8: error: token is not a valid binary operator in a preprocessor subexpression

The diagnostic seems to be err_pp_expr_bad_token_binop. Removing the single quote fixes this, but from what I can tell, this is in fact a valid pp-number preprocessing token (see [lex.ppnumber]), so I think this is just straight-up a bug.

llvmbot commented 4 months ago

@llvm/issue-subscribers-clang-frontend

Author: None (Sirraide)

On main, running `clang-scan-deps -format=p1689 -- clang++ -std=c++20 test.cc` on this program ```c++ // File: test.cc #if 123'124 #endif ``` gives: ``` Error while scanning dependencies for test.cc: test.cc:1:8: error: token is not a valid binary operator in a preprocessor subexpression ``` The diagnostic seems to be `err_pp_expr_bad_token_binop`. Removing the single quote fixes this, but from what I can tell, this is in fact a valid *pp-number* preprocessing token (see [[lex.ppnumber]](https://eel.is/c++draft/lex.ppnumber#nt:pp-number)), so I think this is just straight-up a bug.
Sirraide commented 4 months ago

This is not a new bug either: using my system’s clang-scan-deps instead, which is still on LLVM 17, results in the same error.

nishithshah2211 commented 3 months ago

@Sirraide - I took a look into finding the root-cause and it is as follows:

During the lexing phase, specifically, during the pre-processor lexing phase, the code flow reaches here: https://github.com/llvm/llvm-project/blob/83646590afe222cfdd792514854549077e17b005/clang/lib/Lex/Lexer.cpp#L2071

And then it skips over, because LangOpts.CPlusPlus14 is not true, which was puzzling, because I was passing -std=c++20, and I could see LangOpts.CPlusPlus14 evaluate to true in subsequent checks at that same place.

The issue is stemming from here: https://github.com/llvm/llvm-project/blob/83646590afe222cfdd792514854549077e17b005/clang/lib/Lex/DependencyDirectivesScanner.cpp#L71-L79 For the directive scanning phase, separate LangOpts are constructed/used which does not respect the actual lang opts. In this case, it does not have the CPlusPlus14 or CPlusPlus20 etc. set correctly, and so the check within LexNumericConstant does not succeed, and we get a token error.

The fix I think should be to pass down the LangOpts from the Lexer or PreProcessor to the Scanner and then create a copy of LangOpts and set the ObjC and LineComment options to true.

Thoughts?

nishithshah2211 commented 3 months ago

The fix I think should be to pass down the LangOpts from the Lexer or PreProcessor to the Scanner and then create a copy of LangOpts and set the ObjC and LineComment options to true.

I am not seeing a straightforward path for this though.

The originating code-path is https://github.com/llvm/llvm-project/blob/c87a7b3bdb673747f2242ba2edc7d5b2f5b53c30/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp#L373

and the Scanner is created here: https://github.com/llvm/llvm-project/blob/c87a7b3bdb673747f2242ba2edc7d5b2f5b53c30/clang/lib/Lex/DependencyDirectivesScanner.cpp#L955

The ScanInstance does have access to the original invocation: https://github.com/llvm/llvm-project/blob/c87a7b3bdb673747f2242ba2edc7d5b2f5b53c30/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp#L298

but not sure what is a good way to pass the original LangOpts to the Scanner.

I could use some help/discussion here @Sirraide or others who are more more knowledgable here. Tagging @jansvoboda11 as well (based on https://github.com/llvm/llvm-project/blob/main/clang/CodeOwners.rst#compiler-options).

Sirraide commented 3 months ago

For the directive scanning phase, separate LangOpts are constructed/used which does not respect the actual lang opts.

Yeah, that definitely sounds a bit wrong to me; I do think we should use the original lang opts or construct them in the same manner as clang usually does, if that is possible; not too familiar with clang-scan-deps myself unfortunately.

That or we could consider always enabling e.g. CPlusPlus26 here, but there may be issues with doing so that I’m not aware of.

Sirraide commented 3 months ago

use the original lang opts or construct them in the same manner as clang usually does

Not sure if this even constructs an invocation like the driver usually does, so it may have to be the latter

nishithshah2211 commented 3 months ago

@Sirraide Yeah I am not sure which would be the best way to move forward. I am hesitant towards adding CPlusPlus26 to LangOpts because I do not understand its implications in terms of backward compatibility or unintended side-effects, given that this is for the pre-processor/lexer.

Would you know who would be able to provide more help? I want to put in a good fix for this issue.

Sirraide commented 3 months ago

CC @cor3ntin @AaronBallman

cor3ntin commented 3 months ago

We should definitively preserve the original LangOptions. When called from FrontEndActions we definitely have access to it (CI.getLangOpts() ), it's just a matter of wiring it through scanSourceForDependencyDirectives When called from DependencyScanningAction, through ensureDirectiveTokensArePopulated, we also seem to have access to it, so we can wire it through that as well (ScanInstance.getLangOpts() )

nishithshah2211 commented 3 months ago

@cor3ntin - based on the discussions happening in https://github.com/llvm/llvm-project/pull/93753, I have posted a revert PR: https://github.com/llvm/llvm-project/pull/94488 and will be following up with a better fix. Can we re-open this issue in the mean time?