ninja-build / ninja

a small build system with a focus on speed
https://ninja-build.org/
Apache License 2.0
11.2k stars 1.59k forks source link

Tons of include file notices when building with Chinese version of Visual Studio 2010 #613

Closed sfcheng closed 6 months ago

sfcheng commented 11 years ago

When building chromium with ninja with Chinese version of Visual Studio 2010, the console window outputs tons of include file notices and thus greatly slows down the building process. The notice is like:

注意: 包含文件:include file path

The English equivalent is: Note: including file: .....

Perhaps ninja is stripping the include notices based on the English words only.

sgraham commented 11 years ago

On an English install "Note: including file:%s%s\n" appears as resource in the string table in VC\bin\1033\clui.dll.

1033 is the locale identifier for "English (United States)".

I'm not aware of any command line option to force cl into 1033 locale unfortunately. I assume if multiple locales are installed it uses system settings to determine which to use.

So, I guess we'd have to have add various languages to the prefix search in the /showIncludes parser. :/

evmar commented 11 years ago

I think cmcldeps (the CMake parser of this output) uses a regex, something like "[^:]+: [^:]+: (.*)", to grab all output lines that look like showincludes output. I haven't looked at the code too hard because I'd eventually like to implement something like that and I don't want to violate any copyrights. :)

The tricky part is not confusing showincludes output with warnings. sfcheng, could you paste the Chinese output of Visual Studio cl.exe when showing a warning or error message?

sgraham commented 11 years ago

It looks like that : is not ascii 58, so that might add a wrinkle. Maybe the line number "(\d+)" that will be in errors could be useful signal.

okuoku commented 11 years ago

I'm not aware of any command line option to force cl into 1033 locale unfortunately.

There is no (clean) way to do this, unfortunately. Foreign language versions of VS would have other locale resources in different number(e.g. 1041 for JA). What we learned: always install EN version of VS, then install language pack if needed :(

But fortunately, "error Cnnnn" and "warning Cnnnn" never get localized. So we can use them as key. But as @sgraham said, line numbers looks like more promising because it would also allow filtering out 'note:' output.

I'm not sure whether or not : are not ascii 58. In Japanese version, these are certainly ascii 58.

FWIW, Japanese output would look like this:

C:\cygwin\home\oku>type main.c
#include <stdio.h>
int nah(void){}; /* Trigger "function must return a value */
main(){return nah();}

C:\cygwin\home\oku>cl /showIncludes main.c
Microsoft(R) C/C++ Optimizing Compiler Version 16.00.40219.01 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.c
メモ: インクルード ファイル:  C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\stdio.h
メモ: インクルード ファイル:   C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\crtdefs.h
メモ: インクルード ファイル:    C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\sal.h
メモ: インクルード ファイル:     c:\program files (x86)\microsoft visual studio10.0\vc\include\codeanalysis\sourceannotations.h
メモ: インクルード ファイル:    C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\vadefs.h
メモ: インクルード ファイル:   C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\INCLUDE\swprintf.inl
c:\cygwin\home\oku\main.c(2) : warning C4716: 'nah' : 値を返さなければいけません
nico commented 11 years ago

Related art: https://bugzilla.mozilla.org/show_bug.cgi?id=587372 (approach there: read prefix from an env var, have an autoconf check to verify /showIncludes parsing works. Not great.)

nico commented 11 years ago

Idea similar to the mozilla bug: There could be a toplevel var msvs_includes_prefix, and generators could write that by compiling a dummy #include "knownheader.h" file with /showIncludes and writing whatever is in front of "knownheader.h" in the cl.exe output into that toplevel var. Ninja would then use msvs_includes_prefix as the /showIncludes prefix.

syntheticpp commented 11 years ago

While CMake configurures, the include prefix is read from one dummy build, https://github.com/Kitware/CMake/blob/master/Modules/CMakeClDeps.cmake where a regex is used, later on the prefix is passed as argument to the tool, https://github.com/Kitware/CMake/blob/master/Source/cmcldeps.cxx and std::string::substr is used for processing the cl.exe output.

I assume ninja also needs to accept an additional (global) argument. (like nico suggested)

CMake also uses cmcldeps to generate dependencies for .rc files by first "compiling" the .rc file with cl which generates the dependency file, and then processes it with the rc tool. Not sure if or how this could be integrated into ninja.

syntheticpp commented 11 years ago

https://github.com/martine/ninja/pull/665

Does this work with non-Ascii prefixes?

nico commented 11 years ago

665 is merged. We might have to iterate on encoding issues a bit still though, so leaving this open until this is verified working.

fbridault commented 6 years ago

With a french local, ninja 1.8.2 and CMake 3.10.2, this still happens...

nico commented 6 years ago

665 added msvc_deps_prefix. Does cmake set that? @syntheticpp @mathstuf

mathstuf commented 6 years ago

@nico I see code in CMake which references it.

dtaralla commented 5 years ago

Still happening on Visual Studio Community 15.9.7...

DrFrankenstein commented 5 years ago

For the record, this is still happening to me with the following configuration: CMake 3.14 Ninja 1.8.2 (the one that ships with Visual Studio 2019) French locale.

EDIT: Better workaround: set VSLANG=1033 in the environment to force CL to output English messages.

Old workaround: And for those who also hit this issue, my workaround was to comment out the following line in $CMAKE_PATH\share\cmake-3.14\Modules\Platform\Windows-MSVC.cmake: #set(CMAKE_NINJA_DEPTYPE_${lang} msvc) (line 368 in mine)

This unfortunately causes CMake to generate deps = gcc instead of juste removing the deps line, but that hasn't seemed to break my builds. YMMV. This is a workaround.

mathstuf commented 5 years ago

Setting deps = gcc is probably benign without setting depfile as well.

jonesmz commented 4 years ago

@DrFrankenstein Would you be willing to try reproducing this after applying this PR to the ninja codebase? https://github.com/ninja-build/ninja/pull/1671

DrFrankenstein commented 4 years ago

I'll give it a shot this week!

DrFrankenstein commented 4 years ago

Unfortunately, that didn't fix it.

I built ninja from that branch, then used that version of ninja to build itself again, and it still leaked the include messages to the terminal. image

jonesmz commented 4 years ago

It seems to me like the issue here might be with the MSVC include handler.

Is the grammar for that not recognizing the output from cl.exe correctly?

jonesmz commented 4 years ago

Well...

It looks like it's a problem with hardcoded english.

https://github.com/ninja-build/ninja/blob/master/src/clparser.cc

string CLParser::FilterShowIncludes(const string& line,
                                    const string& deps_prefix) {
  const string kDepsPrefixEnglish = "Note: including file: ";
  const char* in = line.c_str();
  const char* end = in + line.size();
  const string& prefix = deps_prefix.empty() ? kDepsPrefixEnglish : deps_prefix;
  if (end - in > (int)prefix.size() &&
      memcmp(in, prefix.c_str(), (int)prefix.size()) == 0) {
    in += prefix.size();
    while (*in == ' ')
      ++in;
    return line.substr(in - line.c_str());
  }
  return "";
}

@DrFrankenstein

Do you feel like messing with that english prefix at the top of the function to see if it does you any better?

DrFrankenstein commented 4 years ago

Ha! I was actually going to look in there tomorrow. Seems like you caught it before I had a chance to.

I just shut my computer down for the night. I'll get back to you tomorrow!

DrFrankenstein commented 4 years ago

It's worth noting, though, deps_prefix should contain the localized string as set in the rules.ninja file (usually detected and set by CMake). It only uses the hard-coded one if it's absent.

I suspect the logic just after it might be the actual culprit. But as I said, I'll do a proper investigation/debugging session tomorrow.

DrFrankenstein commented 4 years ago

The encodings don't match. deps_prefix is in Latin-1 (where the NBSP before the colons is 0xA0), and line is in CP437 for some reason (NBSP = 0xFF). image

I think that CL itself is outputting CP437, but the CMake-generated rules.ninja is in Latin-1. I'm guessing that some conversion occured on the CMake side, but that'll require more digging.

EDIT: It seems like CL will output in whatever the console's codepage is. (Source 1, Source 2). I'm not sure how we can force it to be something else.

Perhaps we can bring the two together by converting them both to a common encoding such as UTF-8 (or whatever Ninja prefers to use), e.g. by calling MultiByteToWideChar(CP_OEMCP, ...) on the CL output, and MultiByteToWideChar(1252, ...) on the string that comes from rules.ninja.

DrFrankenstein commented 4 years ago

Thinking back on this... this might be CMake's fault. On Windows, the execute_process command seems to convert the output of the command to UTF-8 internally (and accepts an optional ENCODING parameter to specify the output's encoding). It thus writes it back in UTF-8 in the rules.ninja file (where NBSP is 0xA0 and not 0xFF).

I tried changing CMAKE_DETERMINE_MSVC_SHOWINCLUDES_PREFIX to use ENCODING NONE (perform no conversion), but it seemed to break all sorts of things in CMake.

So the question I'm having now is... should ninja's msvc_deps_prefix be a bitwise match of the compiler's output, or should it be in whatever encoding the file is expected to be, in which case it would be Ninja's job to do the proper conversions from the compiler output?

mathstuf commented 4 years ago

@bradking Thoughts on the encoding and prefix detection here?

evmar commented 4 years ago

Historically ninja has been encoding agnostic (as long as the encoding used the same byte as ASCII for '/'). However, Windows might make that difficult.

bradking commented 4 years ago

Ninja's CLParser::FilterShowIncludes is using memcmp to compare msvc_deps_prefix to lines in MSVC's output so indeed the value needs to be a bitwise match. CMake may need some work to preserve that. CMake currently converts to UTF-8 internally so perhaps all that is missing is converting back to the codepage's encoding when writing the value to build.ninja.

IIRC, MSVC's output encoding can be affected by environment variables and/or flags. That means we may end up with the compiler output in a different encoding than the codepage in which Ninja is operating and using to interpret strings in build.ninja. Such cases may require extra support from Ninja to handle but further investigation would be needed.

DrFrankenstein commented 4 years ago

I couldn't find any environment variable affecting the codepage used by CL. I think it just uses the codepage associated with the process (which is based on the system's regional settings, or by the console settings if the process is running in one).

However, there is an environment variable that sets the language used by CL, VSLANG, which can be useful as a workaround for users affected by this bug. Setting VSLANG=1033 before generating the ninja files will prevent the bug from happening.

evmar commented 4 years ago

Just to restate my above comment in different words: Ninja treats its input files as (encodingless) bytes, and does encoding-ignorant byte comparisons of strings, to attempt to evade these issues. You need the bytes that appear in the build.ninja file to match the bytes that ninja reads from the process stdout, but ninja doesn't care about encodings.

Krantz-XRF commented 4 years ago

After CMake generating all the build files, I manually converted rules.ninja to UTF-8 which contains a line msvc_deps_prefix = 注意: 包含文件:, and then things got fixed. (That file used to be in GB2312 encoding, which correspond to the default code page 936.) I guess changes could be done to CMake so that it always converts rules.ninja to UTF-8?

I have no experience working on locales other than code page 936 or 65001, so I have no idea whether the solution above is a universal fix, though.

YannKer commented 4 years ago

Same problem and manage to erase this output with add /W2 instead of /W3 in CMAKE_CXX_FLAGS

benmcmorran commented 4 years ago

This is related to #1766

Krysl commented 3 years ago

as sgraham said:

On an English install "Note: including file:%s%s\n" appears as resource in the string table in VC\bin\1033\clui.dll.

I write a script find_msvc_prefix.py to check all clui.dll on my pc, it seems like all "Note: including file:%s%s\n" are at similar place string table/26:1033/408

Plan A'

VS2019 have only 14 language packs. We can store them in the source code.

I found Baze already support this.(see Windows, MSVC: Support parsing /showIncludes in 14 languages #7966, ShowIncludesFilter.java`)

I combine the two table("Note: including file:" LCID (Locale ID)), and made this table: Lang ID Description String template BCP 47 Code
1028 Chinese - Taiwan 注意: 包含檔案:%s%s\n zh-TW
1029 Czech Poznámka: Včetně souboru:%s%s\n cs-CZ
1031 German - Germany Hinweis: Einlesen der Datei:%s%s\n de-DE
1033 English - United States Note: including file:%s%s\n en-US
1036 French - France Remarque : inclusion du fichier : %s%s\n fr-FR
1040 Italian - Italy Nota: file incluso %s%s\n it-IT
1041 Japanese メモ: インクルード ファイル: %s%s\n ja-JP
1042 Korean 참고: 포함 파일:%s%s\n ko-KR
1045 Polish Uwaga: w tym pliku: %s%s\n pl-PL
1046 Portuguese - Brazil Observação: incluindo arquivo:%s%s\n pt-BR
1049 Russian Примечание: включение файла: %s%s\n ru-RU
1055 Turkish Not: eklenen dosya: %s%s\n tr-TR
2052 Chinese - People's Republic of China 注意: 包含文件: %s%s\n zh-CN
3082 Spanish - Spain (Modern Sort) Nota: inclusión del archivo:%s%s\n es-ES

we can rearrange the 14 lang detection order by:

Plan B

Plan C

Patch all the clui.dll, make all the "Note: including file:" in various languages to English version.

Advantage:

Disadvantage:

MadHacker666 commented 3 years ago

now. msvc build tools 2019. locale ru-RU. Clion (cmake 3.21, ninja 1.10.2).

Примечание: включение файла:    C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\include\condition_variable
Примечание: включение файла:     C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\include\yvals_core.h
Примечание: включение файла:      C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\include\vcruntime.h
Примечание: включение файла:       C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30037\include\sal.h

There is a solution?

BeardOfShadows commented 3 years ago

The workaround I use for now: In console set VSLANG=1033 in CMakeLists.txt add

if (CMAKE_GENERATOR MATCHES "Ninja")  
    set(CMAKE_CL_SHOWINCLUDES_PREFIX "Note: including file:")  
endif()
amasciotta commented 2 years ago

Similar issue here: I've got Windows localized in italian, and VS2019 in english. Ninja outputs the italian "Nota: file incluso" string. Setting VSLANG=1033 in the console before calling vcvars32/64 solved the issue.

evilenzo commented 2 years ago

Similar issue on last Visual Studio Preview 2022 17.2 with ninja 1.10.2 in CLion 2021.3. Tons of "Note including file: foo.h" even without /showIncludes option set. So annoying

cpsauer commented 2 years ago

Hey all! Came across this while writing some Bazel tooling. After some experimenting, we ended up matching all the languages rather than using VSLANG=1033, because the latter doesn't work if the user doesn't have the English language pack installed. (Seems to work otherwise, though.)

GuoZiyangTYUST commented 2 years ago

A solution not very good: use Clang rather than MSVC

such "Note include" output by MSVC tool chain,just change it.

用了clang就不能用.nativs了。。。。。。 改了一下源码,提供了二进制版本 https://github.com/GuoZiyangTYUST/ninja/releases/tag/%E5%8E%BB%E9%99%A4%E5%86%97%E4%BD%99%E8%BE%93%E5%87%BA

hez2010 commented 2 years ago

Hit this issue as well, so we just switched to msbuild from ninja and the issue was gone. Unexpectedly found that msbuild has been far faster than before :D

kangjianbin commented 2 years ago

I guess the bug is in cmake.

CMake can detect the prefix correctly, and stores it in its variable 'CMAKE_CL_SHOWINCLUDES_PREFIX' (or CMAKE_CXX_CL_SHOWINCLUDES_PREFIX). Note that this prefix is not utf-8 encoding. For example, in my locale it is encoded in 'GB18030'. However, when creating rules.ninja, cmake converts this prefix to utf-8 and saves the converted data to msvc_deps_prefix.

During build, msvc_deps_prefix (utf-8 encoding) cannot match MSVC's output (gb18030 enconding, et al), so Ninja reports lots of include files.

I switched to cmake-3.20, which doesn't convert the prefix to utf-8, and everything works well.

mathstuf commented 2 years ago

I switched to cmake-3.20, which doesn't convert the prefix to utf-8, and everything works well.

So this can be closed then?

bradking commented 2 years ago

A relevant change in CMake is MR 5089, first included in CMake 3.19. It switched to writing the msvc_deps_prefix as a raw byte sequence regardless of the encoding of build.ninja, since Ninja only compares raw byte sequences to look for such lines in the compiler output.

The best path forward for Ninja upstream is #1812/#2200, which add support for using cl's new equivalent to a gcc depfile.

kangjianbin commented 2 years ago

I switched to cmake-3.20, which doesn't convert the prefix to utf-8, and everything works well.

So this can be closed then?

No as it still exists in CMake 3.24. I checked MR 5089, looks like the issue was introduced again in MR 5860 . When Ninja version is larger than 1.11, CMake assumes Ninja supports UTF-8; so it writes rules.ninja with UTF-8.

bradking commented 2 years ago

@kangjianbin thanks. Please open a CMake upstream issue for that. Include details about the actual encoding of the byte sequence that you're observing.

kangjianbin commented 2 years ago

Created issue https://gitlab.kitware.com/cmake/cmake/-/issues/24068

bradking commented 1 year ago

Please try CMake 3.25.

kangjianbin commented 1 year ago

CMake 3.25 works for me. Thanks.

fcharlie commented 1 year ago

When I update cmake to 3.25 this issue doesn't seem to be resolved.

cmake: 3.25.0
ninja: 1.11.1
chcp: 936
image

Whether you run cmake --build . Or ninja all, the result is the same. Also, modifying the code page to 54936 has no effect.

Enable Beta: Use Unicode UTF-8 for worldwide language support and restart Windows:

image

I suggest that cmake/ninja should proactively avoid annoying problems caused by MSVC's multilingual mechanism if it wants to read this information from the standard output of 'cl.exe'.

kangjianbin commented 1 year ago

Enable Beta: Use Unicode UTF-8 for worldwide language support and restart Windows:

Did cmake detect the prefix correctly?