Open llvmbot opened 8 years ago
I've noticed that MSVC somehow even translates _T to this __LPREFIX (to my surprise).
The code I'm looking at is similar to:
#define LOGMESSAGE(text) _T(__FUNCTION__) _T(":") text
Result after preprocessing becomes:
__LPREFIX( __FUNCTION__) L":" "some text"
However, if I try to reproduce this in a small example, I don't get this behavior.
Replacing the _T by __LPREFIX gives the same error as reported by Skripkin Andrey.
It would be good to find some documentation for that; I can't find anything online.
CC @EugeneZelenko FYI we now have extension:gnu, extension:microsoft, and extension:clang labels. I think this issue fits into the second one.
@Endilll: Thank you for the information!
To avoid duplicate work: I am working on this issue (in my branch).
As part of this issue I am trying to add support for the following expressions in Clang (under MSVC compatibility mode): https://godbolt.org/z/nrj1j3bMY
#define _CONCAT(A, B) A##B
#define CONCAT(A, B) _CONCAT(A, B)
int main() {
const char *x = __lPREFIX(L"Yes, " __LPREFIX(u8"it " __lPREFIX(U"is ")) L"insane function " CONCAT(L, __FUNCTION__));
return 0;
}
I see the following levels of inseparable implementation:
CONCAT(L, __FUNCTION__) to __LPREFIX(__FUNCTION__)
__LPREFIX: re-encoding of string-literals.
__FUNCTION__.
I think the implementation is inseparable, because: if (1) is implemented, then we need to handle __LPREFIX in Sema (2, 3). If we implement (2, 3) without taking (4) into account, then the values of expanded __FUNCTION__ in templated functions are not correct (explanation is below).
I am writing this message because I see difficulty in the combination of (2, 3, 4), and (as per my current plan) it would require significant changes in Sema, by either adding a new type of Expr in Clang or reworking PredefinedExpr. Before making such changes, I want to coordinate the implementation.
1. Preprocessor. As per my observations and experiments with MSVC, CONCAT(L, __FUNCTION__) works the following way: B in _CONCAT(A, B) gets expanded to __FSTREXP __FUNCTION__. Later, tokens in L##__FSTREXP __FUNCTION__ are replaced as follows:
/// L##__FSTREXP __FUNCTION__
/// || | |
/// vv v v
/// __LPREFIX( __FUNCTION__)
All of these transformations are easily made in TokenLexer::ExpandFunctionArguments. I have no questions here; this is implemented in lprefix.
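The token replacement in the diagram above can be sketched on a toy token stream. This is only an illustration under stated assumptions: tokens are plain strings, the function name rewritePrefixPaste is hypothetical, only the L prefix is handled, and none of this is Clang's actual TokenLexer code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model: tokens are plain strings. This is NOT Clang's Token/TokenLexer API.
// Rewrites the sequence  L ## __FSTREXP <ident>  into  __LPREFIX ( <ident> ).
// Other prefixes would be handled analogously; they are left out here.
std::vector<std::string> rewritePrefixPaste(const std::vector<std::string> &in) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const bool matches = i + 3 < in.size() && in[i] == "L" &&
                         in[i + 1] == "##" && in[i + 2] == "__FSTREXP";
    if (matches) {
      out.push_back("__LPREFIX");
      out.push_back("(");
      out.push_back(in[i + 3]); // the predefined identifier, e.g. __FUNCTION__
      out.push_back(")");
      i += 3;                   // skip the three tokens consumed after in[i]
      continue;
    }
    out.push_back(in[i]);       // everything else is copied unchanged
  }
  return out;
}
```

Feeding it the four tokens L, ##, __FSTREXP, __FUNCTION__ yields __LPREFIX, (, __FUNCTION__, ), which mirrors the arrows in the diagram.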
2, 3, 4. Sema and TemplateInstantiator. Unfortunately, the approach we followed in D153914 was erroneous: https://godbolt.org/z/vx9zY8aTj
template<class T> class A {
public:
A() {
static const char *X = __FUNCTION__; // A<class int>::A
static const char *Y = "" __FUNCTION__; // A::A<T>
}
};
int main() {
A<int> a;
}
And thus we cannot blindly implement (2, 3) without taking (4) into account. An example of partial implementation is in lprefix.
My proposal for (2, 3, 4). Create a new kind of Expr, let's call it StringConcatExpr (if you have ideas for good names, let me know). This expression would basically be a container of StringLiterals, PredefinedExprs, and __LPREFIX exprs in an AST form. At the Sema level we would create a StringConcatExpr and pre-compute its value. Later, in TemplateInstantiator, we can rebuild the StringConcatExpr by adjusting the values of __FUNCTION__ tokens and re-computing its value.
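To make the "container that is re-computed on instantiation" idea concrete, here is a minimal sketch. All names (CompositePiece, CompositeString, render) are hypothetical stand-ins and not Clang classes; the real node would hold AST children rather than strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one child of the proposed expression.
struct CompositePiece {
  bool isFunctionName; // true: placeholder for a __FUNCTION__-like token
  std::string text;    // literal text, used when isFunctionName is false
};

// Hypothetical stand-in for the proposed container expression.
struct CompositeString {
  std::vector<CompositePiece> pieces;

  // Re-computes the value for a given enclosing function name. Calling this
  // again with another name models the rebuild during template instantiation.
  std::string render(const std::string &functionName) const {
    std::string result;
    for (const CompositePiece &p : pieces)
      result += p.isFunctionName ? functionName : p.text;
    return result;
  }
};
```

The point of the design is that the pieces are kept separate, so the value computed for the template pattern can be replaced by one computed for each specialization.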
Pinging reviewers of D153914: @AaronBallman, @cor3ntin, @tahonermann
Edit: regarding the name of the new Expr class. Taking a look at existing classes, I think a good name would be something like MSStringLiteral / MSConcatStringLiteral / MSCompositeStringLiteral. I like the latter.
tl;dr I wanted to add yet another AST node, MSCastStringLiteral, to represent __LPREFIX (and company), but after finding that I need to deal with user-defined string literals, I re-evaluated the whole approach and decided that I don't need such an AST node.
Recently I've learned that nested string casts (__LPREFIX and company) ultimately get the type of the outermost cast (which is logical), e.g. https://godbolt.org/z/oqoGvxzj4
constexpr size_t operator""_len(const char*, size_t len) { return len; }
constexpr size_t operator""_len(const char8_t*, size_t len) { return len; }
constexpr size_t operator""_len(const char16_t*, size_t len) { return len; }
constexpr size_t operator""_len(const char32_t*, size_t len) { return len; }
constexpr size_t operator""_len(const wchar_t*, size_t len) { return len; }
size_t foo() {
return __lPREFIX(__UPREFIX(__LPREFIX(U"wtf"_len) L"qwe" __LPREFIX(__FUNCTION__))) ""_len;
}
When string literal concatenation takes place in translation phase 6, user-defined string literals are concatenated as well, and their ud-suffixes are ignored for the purpose of concatenation, except that only one suffix may appear on all concatenated literals
https://godbolt.org/z/PrY3cMWMP
#include <cstddef>
constexpr size_t operator""_len(const char*, size_t len) {
return len;
}
size_t foo() {
return ""_len "333"_len * "2"_len "2"_len - ""_len "6" "6"_len "6" "666"_len;
}
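The concatenation rule quoted above can be checked in standard C++ without any Microsoft extension. The following is a small sketch; the helper name sourceExample is made up here, and its body is the expression from the example above.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t operator""_len(const char *, std::size_t len) {
  return len;
}

// Adjacent literals are concatenated first; the single ud-suffix then applies
// to the whole concatenated string, wherever it was written.
static_assert("ab"_len "cde" == 5, "suffix on the first piece");
static_assert("ab" "cde"_len == 5, "suffix on the last piece");
static_assert("ab"_len "cde"_len == 5, "same suffix repeated");

// The expression from the example above: "333" has length 3, "22" has
// length 2, "666666" has length 6, so the result is 3 * 2 - 6.
constexpr std::size_t sourceExample() {
  return ""_len "333"_len * "2"_len "2"_len - ""_len "6" "6"_len "6" "666"_len;
}
static_assert(sourceExample() == 0, "3 * 2 - 6");
```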
The latest implementation plan looks as follows: we can scan the whole "string literal" (including string-like predefined macros like __FUNCTION__), verify that we don't concatenate incompatible types (e.g. u16"" __uPREFIX with u32"" __UPREFIX), then omit all "Microsoft String Casts" except the outermost. Pass everything to the StringLiteral builder (including the desired string type). If the list of tokens contains __FUNCTION__ (or other string-like macros), then we can create several StringLiterals and one or several PredefinedExprs, and store them in an MSCompositeStringLiteral. And if we had any UDL, we pass the resulting literal (either StringLiteral or MSCompositeStringLiteral) to the UDL builder.
In the template instantiation phase we can rebuild the MSCompositeStringLiteral, changing the value of the contained PredefinedExpr.
Update: I didn't have much time to work on this until now. I am still interested in finishing this task. I'll try to finish it by the LLVM 19 release.
Currently I have implemented the transformation of string prefixes (u, u8, U, L) to the appropriate __LPREFIX macro-function (via the undocumented __FSTREXP helper macro).
Basically this works as follows:
// In this example macros are defined in reverse order to be able to read from top to bottom.
STR2(__FUNCTION__);
#define STR2(A) #A STR1(A) // Would get expanded to: "__FUNCTION__" STR1(__FSTREXP __FUNCTION__)
#define STR1(A) #A // Would get expanded to: "__FUNCTION__" "__FSTREXP __FUNCTION__"
WIDE(__FUNCTION__)
#define WIDE(X) _WIDE(X)
#define _WIDE(X) L##X
/*
WIDE( __FUNCTION__)
|
/|
/ |
/ |
/ |
v v
_WIDE(__FSTREXP __FUNCTION__)
L##__FSTREXP __FUNCTION__
|| | |
vv v v
__LPREFIX( __FUNCTION__)
*/
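For comparison, the same two-level WIDE pattern is fully standard when the argument is an ordinary macro that expands to a string literal. The sketch below uses made-up names (WIDEN, WIDEN_PASTE, GREETING, wideGreeting) and shows only the portable part; it does not cover __FUNCTION__, which needs the MSVC-specific __FSTREXP step described above.

```cpp
#include <cassert>
#include <cwchar>

#define WIDEN_PASTE(X) L##X
#define WIDEN(X) WIDEN_PASTE(X)
#define GREETING "hello"

// WIDEN(GREETING): the argument is macro-expanded to "hello" before it is
// substituted into WIDEN_PASTE, so L##"hello" pastes into the token L"hello".
// Calling WIDEN_PASTE(GREETING) directly would paste L and GREETING instead,
// producing the identifier LGREETING.
inline const wchar_t *wideGreeting() { return WIDEN(GREETING); }
```

The extra level of indirection is what forces the argument to be expanded before the ## operator sees it.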
Now I am at the point of implementing the semantics of the __LPREFIX (and other) macros.
As per my understanding, tokens inside the __LPREFIX() parentheses are treated by MSVC as an independent string literal. E.g.
U"Hello" __UPREFIX(L" " "World")
The concatenation of U"Hello" L" " "World" is not valid by itself, unless we apply the __UPREFIX conversion first, which makes it U"Hello" U" World".
See https://godbolt.org/z/hcc8KGf5e
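The "independent string literal" behavior described above can be modeled with a small recursive evaluator. This is a toy sketch under my own assumptions: the types Prefix, Piece and Evaluated and the functions evaluate and concat are hypothetical, text is kept as a plain std::string regardless of encoding, and a piece with no children is treated as a literal.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Prefix { Ordinary, Wide, Utf8, Utf16, Utf32 };

// Hypothetical toy tree; not Clang's AST or lexer types.
struct Piece {
  Prefix prefix;               // literal prefix, or target prefix of a cast
  std::string text;            // used when children is empty (a literal)
  std::vector<Piece> children; // non-empty: a cast such as __UPREFIX(...)
};

struct Evaluated {
  Prefix prefix;
  std::string text;
};

bool concat(const std::vector<Piece> &pieces, Evaluated &out);

// A cast evaluates its parenthesized group on its own, then re-types it.
bool evaluate(const Piece &p, Evaluated &out) {
  if (p.children.empty()) {
    out = Evaluated{p.prefix, p.text};
    return true;
  }
  Evaluated inner{Prefix::Ordinary, ""};
  if (!concat(p.children, inner))
    return false;
  out = Evaluated{p.prefix, inner.text};
  return true;
}

// Concatenates a sequence; fails if two different non-ordinary prefixes meet.
bool concat(const std::vector<Piece> &pieces, Evaluated &out) {
  Evaluated result{Prefix::Ordinary, ""};
  for (const Piece &p : pieces) {
    Evaluated e{Prefix::Ordinary, ""};
    if (!evaluate(p, e))
      return false;
    if (e.prefix != Prefix::Ordinary) {
      if (result.prefix != Prefix::Ordinary && result.prefix != e.prefix)
        return false; // e.g. U"..." directly next to L"..."
      result.prefix = e.prefix;
    }
    result.text += e.text;
  }
  out = result;
  return true;
}
```

In this model, U"Hello" followed by a Utf32 cast around L" " and "World" evaluates to a Utf32 string "Hello World", while the same three literals without the cast are rejected.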
Another difficulty is that __LPREFIX() macros accept function-local macros such as __FUNCTION__ among their parameters. In turn, these function-local macros are context dependent and must be re-evaluated in a templated context.
I didn't take this into account when I implemented 66c43fbd271a8231187bfcb73428ed663606585d; for more info see my previous comments in this issue. This problem is going to be fixed in the patch I am currently working on.
I want to support this behavior by introducing a new AST node (as mentioned above) called MSCompositeStringLiteral (don't mind MSCastStringExpr, we won't need it).
See https://godbolt.org/z/qhqr5Gsbx
Given all of the above, I see several possible implementations (disclaimer: where I write "recursive" I mean unrolled recursion using some container; I am aware that Clang does not welcome recursion in parsers):
1. Modify StringLiteralParser in a way to make it support recursive parsing inside __LPREFIX().
2. Add MicrosoftStringLiteralParser, which inherits from StringLiteralParser. This new parser would do the recursive parsing of __LPREFIX() and fill the fields of the StringLiteralParser base class.
3. Refactor StringLiteralParser into StringLiteralParserBase, and inherit a new MicrosoftStringLiteralParser from StringLiteralParserBase.
I am leaning towards the last option, because a pure string literal parser would not need to know about Microsoft-specific stuff, and thus it would be easier to maintain.
Regarding __LPREFIX(__FUNCTION__) support in MicrosoftStringLiteralParser: the result of this new parser would not be a single string. Instead it would consist of a list of strings of a single type (e.g. ordinary, wide, etc.) and tokens like __FUNCTION__.
Sema::ActOnStringLiteral would take this possibility into account and construct an MSCompositeStringLiteral accordingly.
There are two ways of handling a template-dependent context for MSCompositeStringLiteral.
1. MSCompositeStringLiteral::getString(Decl *Context), which would construct the string on request with respect to the current Decl context.
2. TreeTransform.
I haven't decided yet which is better.
CC @cor3ntin ^
Regarding "implementation decision 1": I was concerned that I should rather follow (1) and embed __LPREFIX support straight into StringLiteralParser, because theoretically MSFT could support __LPREFIX in places that Clang handles outside of ActOnStringLiteral. So I looked at the usages of StringLiteralParser and tested the relevant ones in MSVC (the latest version, 19.39). It looks like approach (3) still holds:
__LPREFIX here as per initial idea
__LPREFIX(__FUNCTION__) here with additional code
__lPREFIX("") is not allowed in declaration of user defined literals
__lPREFIX("") is not allowed in declaration of user defined literals
__LPREFIX in pragmas.
__LPREFIX nor string concatenation here
module
Thank you for the detailed investigation into improved compatibility here! It was clearly a lot of work and it is truly appreciated. That said, I have some concerns with how much effort we would still need to put into this feature, and the long term maintenance costs for something that would not be used a lot. It's not a documented API from Microsoft, it isn't used in any Windows SDK, MS CRT, or other system headers (that I've been able to find, anyway), and it's not commonly used in the wild (at least with the best tools we have to search over a large corpus of code: https://sourcegraph.com/search?q=context:global+__LPREFIX+lang:C+lang:C%2B%2B&patternType=keyword&case=yes&sm=0). So this looks like a very large amount of implementation effort for a nominal feature.
This topic has come up before in https://bugs.llvm.org/show_bug.cgi?id=11789 and I think we ended up with the amount of support we'd like in https://github.com/llvm/llvm-project/commit/3a691a367c7d512a1448e0d88b34c1b05c07ce14 (patch discussion found at https://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120618/059525.html). In short, we don't support __LPREFIX but we do support combining L with predefined identifiers that are macro-like. We end up supporting a bit more than Microsoft does, but you have to enable -fms-extensions mode to get the behavior: https://godbolt.org/z/xbqqe3Ma5
I think we may want to close this issue as Won't Fix given how much effort is required for exact MS compatibility and how little use this extension seems to have in the wild. If it starts showing up in system headers, we may want to reconsider at that point.
tl;dr I am willing to take the risk that my patch might not end up in main.
Thank you for the feedback and references! I understand your concerns regarding the maintenance cost and the amount of effort required to develop such support.
Why is it important for our commercial product? Keeping it short, Clang cannot parse a source file that was preprocessed by MSVC if it contains __LPREFIX. See: https://godbolt.org/z/fqvf6edx3
Why is it important for Clang? Clang-cl is not able to produce the same results as MSVC when __FUNCTION__ is concatenated with other strings in a templated context (as I mentioned above in regards to D153914), see https://godbolt.org/z/W74rvMd8v
There are ways to make it work, but this would add even more workarounds to the existing workaround approach with L__FUNCTION__. I believe it's possible to reach parity with MSVC by making a proper implementation, as well as removing the L__FUNCTION__ and 66c43fbd271a8231187bfcb73428ed663606585d workarounds.
Where is L##__FUNCTION__ used? Windows TraceLogging macros produce L##__FUNCTION__ internally. By compiling a product that uses TraceLogging with clang-cl instead of MSVC, one would get different logs.
I would like to finish this task, and I am willing to take the risk of the merge being rejected if the community doesn't like the final patch series.
At the moment I see the final result as a series of several patches:
1. L##__FUNCTION__ to __LPREFIX(__FUNCTION__) conversion via __FSTREXP without semantics (changes mostly in Lexer).
2. __LPREFIX semantics in MicrosoftStringLiteralParser
I believe (2) and (3) should not be difficult, I just need to come up with a proper implementation.
I am open for discussions if someone has interest in collaboration. And I could create a more detailed development plan (instead of keeping everything in my unstructured notes and thoughts) if someone would find it useful.
My plans have changed. I don't have time to work on it either at work or during my free time. I'll try to get back to it during the next half of the year.
Extended Description
This report is about MSVC feature __LPREFIX that adds prefix L to its argument, for example, it changes its argument's type char[n] to wchar_t[n]. Identifier __LPREFIX with any argument is unknown for clang.
=========Environment=============
OS: Win
Version: trunk
=========Reproducer==============
test.cpp
===========Output================
MSVC compiles clearly
It should work like macro, but it is really not preprocessor macro:
=================================
Andrey Skripkin
Software Engineer
Intel Compiler Team