Closed sfcheng closed 6 months ago
On an English install "Note: including file:%s%s\n" appears as resource in the string table in VC\bin\1033\clui.dll.
1033 is the locale identifier for "English (United States)".
I'm not aware of any command line option to force cl into 1033 locale unfortunately. I assume if multiple locales are installed it uses system settings to determine which to use.
So, I guess we'd have to have add various languages to the prefix search in the /showIncludes parser. :/
I think cmcldeps (the CMake parser of this output) uses a regex, something like "[^:]+: [^:]+: (.*)", to grab all output lines that look like showincludes output. I haven't looked at the code too hard because I'd eventually like to implement something like that and I don't want to violate any copyrights. :)
The tricky part is not confusing showincludes output with warnings. sfcheng, could you paste the Chinese output of Visual Studio cl.exe when showing a warning or error message?
It looks like that : is not ascii 58, so that might add a wrinkle. Maybe the line number "(\d+)" that will be in errors could be useful signal.
I'm not aware of any command line option to force cl into 1033 locale unfortunately.
There is no (clean) way to do this, unfortunately. Foreign language versions of VS would have other locale resources in different number(e.g. 1041 for JA). What we learned: always install EN version of VS, then install language pack if needed :(
But fortunately, "error Cnnnn" and "warning Cnnnn" never get localized. So we can use them as key. But as @sgraham said, line numbers looks like more promising because it would also allow filtering out 'note:' output.
I'm not sure whether or not : are not ascii 58. In Japanese version, these are certainly ascii 58.
FWIW, Japanese output would look like this:
C:\cygwin\home\oku>type main.c
#include <stdio.h>
int nah(void){}; /* Trigger "function must return a value */
main(){return nah();}
C:\cygwin\home\oku>cl /showIncludes main.c
Microsoft(R) C/C++ Optimizing Compiler Version 16.00.40219.01 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
main.c
メモ: インクルード ファイル: C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h
メモ: インクルード ファイル: C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\crtdefs.h
メモ: インクルード ファイル: C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\sal.h
メモ: インクルード ファイル: c:\program files (x86)\microsoft visual studio10.0\vc\include\codeanalysis\sourceannotations.h
メモ: インクルード ファイル: C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\vadefs.h
メモ: インクルード ファイル: C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\swprintf.inl
c:\cygwin\home\oku\main.c(2) : warning C4716: 'nah' : 値を返さなければいけません
Related art: https://bugzilla.mozilla.org/show_bug.cgi?id=587372 (approach there: read prefix from an env var, have an autoconf check to verify /showIncludes parsing works. Not great.)
Idea similar to the mozilla bug: There could be a toplevel var msvs_includes_prefix
, and generators could write that by compiling a dummy #include "knownheader.h"
file with /showIncludes and writing whatever is in front of "knownheader.h" in the cl.exe output into that toplevel var. Ninja would then use msvs_includes_prefix as the /showIncludes prefix.
While CMake configurures, the include prefix is read from one dummy build, https://github.com/Kitware/CMake/blob/master/Modules/CMakeClDeps.cmake where a regex is used, later on the prefix is passed as argument to the tool, https://github.com/Kitware/CMake/blob/master/Source/cmcldeps.cxx and std::string::substr is used for processing the cl.exe output.
I assume ninja also needs to accept an additional (global) argument. (like nico suggested)
CMake also uses cmcldeps to generate dependencies for .rc files by first "compiling" the .rc file with cl which generates the dependency file, and then processes it with the rc tool. Not sure if or how this could be integrated into ninja.
https://github.com/martine/ninja/pull/665
Does this work with non-Ascii prefixes?
With a french local, ninja 1.8.2 and CMake 3.10.2, this still happens...
@nico I see code in CMake which references it.
Still happening on Visual Studio Community 15.9.7...
For the record, this is still happening to me with the following configuration: CMake 3.14 Ninja 1.8.2 (the one that ships with Visual Studio 2019) French locale.
EDIT: Better workaround: set VSLANG=1033
in the environment to force CL to output English messages.
Old workaround:
And for those who also hit this issue, my workaround was to comment out the following line in $CMAKE_PATH\share\cmake-3.14\Modules\Platform\Windows-MSVC.cmake
:
#set(CMAKE_NINJA_DEPTYPE_${lang} msvc)
(line 368 in mine)
This unfortunately causes CMake to generate deps = gcc
instead of juste removing the deps line, but that hasn't seemed to break my builds. YMMV. This is a workaround.
Setting deps = gcc
is probably benign without setting depfile
as well.
@DrFrankenstein Would you be willing to try reproducing this after applying this PR to the ninja codebase? https://github.com/ninja-build/ninja/pull/1671
I'll give it a shot this week!
Unfortunately, that didn't fix it.
I built ninja from that branch, then used that version of ninja to build itself again, and it still leaked the include messages to the terminal.
It seems to me like the issue here might be with the MSVC include handler.
Is the grammar for that not recognizing the output from cl.exe correctly?
Well...
It looks like it's a problem with hardcoded english.
https://github.com/ninja-build/ninja/blob/master/src/clparser.cc
string CLParser::FilterShowIncludes(const string& line,
const string& deps_prefix) {
const string kDepsPrefixEnglish = "Note: including file: ";
const char* in = line.c_str();
const char* end = in + line.size();
const string& prefix = deps_prefix.empty() ? kDepsPrefixEnglish : deps_prefix;
if (end - in > (int)prefix.size() &&
memcmp(in, prefix.c_str(), (int)prefix.size()) == 0) {
in += prefix.size();
while (*in == ' ')
++in;
return line.substr(in - line.c_str());
}
return "";
}
@DrFrankenstein
Do you feel like messing with that english prefix at the top of the function to see if it does you any better?
Ha! I was actually going to look in there tomorrow. Seems like you caught it before I had a chance to.
I just shut my computer down for the night. I'll get back to you tomorrow!
It's worth noting, though, deps_prefix
should contain the localized string as set in the rules.ninja file (usually detected and set by CMake). It only uses the hard-coded one if it's absent.
I suspect the logic just after it might be the actual culprit. But as I said, I'll do a proper investigation/debugging session tomorrow.
The encodings don't match. deps_prefix
is in Latin-1 (where the NBSP before the colons is 0xA0), and line
is in CP437 for some reason (NBSP = 0xFF).
I think that CL itself is outputting CP437, but the CMake-generated rules.ninja is in Latin-1. I'm guessing that some conversion occured on the CMake side, but that'll require more digging.
EDIT: It seems like CL will output in whatever the console's codepage is. (Source 1, Source 2). I'm not sure how we can force it to be something else.
Perhaps we can bring the two together by converting them both to a common encoding such as UTF-8 (or whatever Ninja prefers to use), e.g. by calling MultiByteToWideChar(CP_OEMCP, ...)
on the CL output, and MultiByteToWideChar(1252, ...)
on the string that comes from rules.ninja.
Thinking back on this... this might be CMake's fault. On Windows, the execute_process
command seems to convert the output of the command to UTF-8 internally (and accepts an optional ENCODING
parameter to specify the output's encoding). It thus writes it back in UTF-8 in the rules.ninja file (where NBSP is 0xA0 and not 0xFF).
I tried changing CMAKE_DETERMINE_MSVC_SHOWINCLUDES_PREFIX
to use ENCODING NONE
(perform no conversion), but it seemed to break all sorts of things in CMake.
So the question I'm having now is... should ninja's msvc_deps_prefix
be a bitwise match of the compiler's output, or should it be in whatever encoding the file is expected to be, in which case it would be Ninja's job to do the proper conversions from the compiler output?
@bradking Thoughts on the encoding and prefix detection here?
Historically ninja has been encoding agnostic (as long as the encoding used the same byte as ASCII for '/'). However, Windows might make that difficult.
Ninja's CLParser::FilterShowIncludes
is using memcmp
to compare msvc_deps_prefix
to lines in MSVC's output so indeed the value needs to be a bitwise match. CMake may need some work to preserve that. CMake currently converts to UTF-8 internally so perhaps all that is missing is converting back to the codepage's encoding when writing the value to build.ninja
.
IIRC, MSVC's output encoding can be affected by environment variables and/or flags. That means we may end up with the compiler output in a different encoding than the codepage in which Ninja is operating and using to interpret strings in build.ninja
. Such cases may require extra support from Ninja to handle but further investigation would be needed.
I couldn't find any environment variable affecting the codepage used by CL. I think it just uses the codepage associated with the process (which is based on the system's regional settings, or by the console settings if the process is running in one).
However, there is an environment variable that sets the language used by CL, VSLANG
, which can be useful as a workaround for users affected by this bug. Setting VSLANG=1033
before generating the ninja files will prevent the bug from happening.
Just to restate my above comment in different words: Ninja treats its input files as (encodingless) bytes, and does encoding-ignorant byte comparisons of strings, to attempt to evade these issues. You need the bytes that appear in the build.ninja file to match the bytes that ninja reads from the process stdout, but ninja doesn't care about encodings.
After CMake generating all the build files, I manually converted rules.ninja
to UTF-8 which contains a line msvc_deps_prefix = 注意: 包含文件:
, and then things got fixed. (That file used to be in GB2312 encoding, which correspond to the default code page 936.) I guess changes could be done to CMake so that it always converts rules.ninja
to UTF-8?
I have no experience working on locales other than code page 936 or 65001, so I have no idea whether the solution above is a universal fix, though.
Same problem and manage to erase this output with add /W2 instead of /W3 in CMAKE_CXX_FLAGS
This is related to #1766
as sgraham said:
On an English install "Note: including file:%s%s\n" appears as resource in the string table in VC\bin\1033\clui.dll.
I write a script find_msvc_prefix.py to check all clui.dll on my pc, it seems like all "Note: including file:%s%s\n" are at similar place string table/26:1033
/408
VS2019 have only 14 language packs. We can store them in the source code.
I found Baze already support this.(see Windows, MSVC: Support parsing /showIncludes in 14 languages #7966
, ShowIncludesFilter.java
`)
I combine the two table("Note: including file:" LCID (Locale ID)), and made this table: | Lang ID | Description | String template | BCP 47 Code |
---|---|---|---|---|
1028 | Chinese - Taiwan | 注意: 包含檔案:%s%s\n | zh-TW | |
1029 | Czech | Poznámka: Včetně souboru:%s%s\n | cs-CZ | |
1031 | German - Germany | Hinweis: Einlesen der Datei:%s%s\n | de-DE | |
1033 | English - United States | Note: including file:%s%s\n | en-US | |
1036 | French - France | Remarque : inclusion du fichier : %s%s\n | fr-FR | |
1040 | Italian - Italy | Nota: file incluso %s%s\n | it-IT | |
1041 | Japanese | メモ: インクルード ファイル: %s%s\n | ja-JP | |
1042 | Korean | 참고: 포함 파일:%s%s\n | ko-KR | |
1045 | Polish | Uwaga: w tym pliku: %s%s\n | pl-PL | |
1046 | Portuguese - Brazil | Observação: incluindo arquivo:%s%s\n | pt-BR | |
1049 | Russian | Примечание: включение файла: %s%s\n | ru-RU | |
1055 | Turkish | Not: eklenen dosya: %s%s\n | tr-TR | |
2052 | Chinese - People's Republic of China | 注意: 包含文件: %s%s\n | zh-CN | |
3082 | Spanish - Spain (Modern Sort) | Nota: inclusión del archivo:%s%s\n | es-ES |
we can rearrange the 14 lang detection order by:
en-US
second, the others are sorted in descending order by the number of people who speak the languagezh-CN; en-US
zh-CN
, 2nd:en-US
Patch all the clui.dll
, make all the "Note: including file:" in various languages to English version.
Advantage:
UTF-16 BE/LE
, all strings are the same in binary.msvc_deps_prefix
, just use en-US
version string to make 2nd compare in CLParser::FilterShowIncludes
may fix this.Disadvantage:
now. msvc build tools 2019. locale ru-RU. Clion (cmake 3.21, ninja 1.10.2).
Примечание: включение файла: C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\include\condition_variable
Примечание: включение файла: C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\include\yvals_core.h
Примечание: включение файла: C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\include\vcruntime.h
Примечание: включение файла: C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\include\sal.h
There is a solution?
The workaround I use for now:
In console set VSLANG=1033
in CMakeLists.txt add
if (CMAKE_GENERATOR MATCHES "Ninja")
set(CMAKE_CL_SHOWINCLUDES_PREFIX "Note: including file:")
endif()
Similar issue here: I've got Windows localized in italian, and VS2019 in english. Ninja outputs the italian "Nota: file incluso" string. Setting VSLANG=1033
in the console before calling vcvars32/64 solved the issue.
Similar issue on last Visual Studio Preview 2022 17.2 with ninja 1.10.2 in CLion 2021.3. Tons of "Note including file: foo.h" even without /showIncludes option set. So annoying
Hey all! Came across this while writing some Bazel tooling. After some experimenting, we ended up matching all the languages rather than using VSLANG=1033, because the latter doesn't work if the user doesn't have the English language pack installed. (Seems to work otherwise, though.)
A solution not very good: use Clang rather than MSVC
such "Note include" output by MSVC tool chain,just change it.
用了clang就不能用.nativs了。。。。。。 改了一下源码,提供了二进制版本 https://github.com/GuoZiyangTYUST/ninja/releases/tag/%E5%8E%BB%E9%99%A4%E5%86%97%E4%BD%99%E8%BE%93%E5%87%BA
Hit this issue as well, so we just switched to msbuild from ninja and the issue was gone. Unexpectedly found that msbuild has been far faster than before :D
I guess the bug is in cmake.
CMake can detect the prefix correctly, and stores it in its variable 'CMAKE_CL_SHOWINCLUDES_PREFIX' (or CMAKE_CXX_CL_SHOWINCLUDES_PREFIX). Note that this prefix is not utf-8 encoding. For example, in my locale it is encoded in 'GB18030'. However, when creating rules.ninja, cmake converts this prefix to utf-8 and saves the converted data to msvc_deps_prefix.
During build, msvc_deps_prefix (utf-8 encoding) cannot match MSVC's output (gb18030 enconding, et al), so Ninja reports lots of include files.
I switched to cmake-3.20, which doesn't convert the prefix to utf-8, and everything works well.
I switched to cmake-3.20, which doesn't convert the prefix to utf-8, and everything works well.
So this can be closed then?
A relevant change in CMake is MR 5089, first included in CMake 3.19. It switched to writing the msvc_deps_prefix
as a raw byte sequence regardless of the encoding of build.ninja
, since Ninja only compares raw byte sequences to look for such lines in the compiler output.
The best path forward for Ninja upstream is #1812/#2200, which add support for using cl
's new equivalent to a gcc depfile.
I switched to cmake-3.20, which doesn't convert the prefix to utf-8, and everything works well.
So this can be closed then?
No as it still exists in CMake 3.24. I checked MR 5089, looks like the issue was introduced again in MR 5860 . When Ninja version is larger than 1.11, CMake assumes Ninja supports UTF-8; so it writes rules.ninja with UTF-8.
@kangjianbin thanks. Please open a CMake upstream issue for that. Include details about the actual encoding of the byte sequence that you're observing.
Created issue https://gitlab.kitware.com/cmake/cmake/-/issues/24068
Please try CMake 3.25.
CMake 3.25 works for me. Thanks.
When I update cmake to 3.25 this issue doesn't seem to be resolved.
cmake: 3.25.0
ninja: 1.11.1
chcp: 936
Whether you run cmake --build .
Or ninja all
, the result is the same. Also, modifying the code page to 54936 has no effect.
Enable Beta: Use Unicode UTF-8 for worldwide language support
and restart Windows:
I suggest that cmake/ninja should proactively avoid annoying problems caused by MSVC's multilingual mechanism if it wants to read this information from the standard output of 'cl.exe'.
Enable
Beta: Use Unicode UTF-8 for worldwide language support
and restart Windows:
Did cmake detect the prefix correctly?
When building chromium with ninja with Chinese version of Visual Studio 2010, the console window outputs tons of include file notices and thus greatly slows down the building process. The notice is like:
注意: 包含文件:include file path
The English equivalent is: Note: including file: .....
Perhaps ninja is stripping the include notices based on the English words only.