xmake-io / xmake

🔥 A cross-platform build utility based on Lua
https://xmake.io
Apache License 2.0
10.12k stars 784 forks source link

构建前检测耗时异常 #4613

Closed SaeruHikari closed 10 months ago

SaeruHikari commented 10 months ago

Xmake 版本

2.8.6

操作系统版本和架构

Windows11 22H2

描述问题

最近使用中偶尔会出现构建文件前准备阶段耗时长(10 秒以上)的问题,尤其是对脚本做了修改后(没有实质修改只有空保存也是如此)。verbose 输出可以看到输出了大量 checking for flags (...) ... ok,反查内置脚本发现在执行 has_flags 操作。不知道这个问题可能是因为什么触发的,怎么解决或者修复呢?

期待的结果

在每次构建耗时不会耗时过长。

工程配置

SakuraEngine 主分支

附加信息和错误日志

checking for flags (cl_external_includedir) ... ok
checking for flags (cl_external_includedir) ... ok
checking for flags (-Wno-deprecated-builtins) ... ok
checking for flags (-Wno-deprecated-builtins) ... ok
checking for flags (-Wno-deprecated-builtins) ... ok
checking for flags (-Wno-deprecated-builtins) ... ok
checking for flags (-Wno-deprecated-builtins) ... ok
checking for flags (-Wno-deprecated-builtins) ... ok
checking for flags (-Wno-pragma-system-header-outside-header) ... ok
checking for flags (-Wno-pragma-system-header-outside-header) ... ok
checking for flags (-Wno-pragma-system-header-outside-header) ... ok
checking for flags (-Wno-pragma-system-header-outside-header) ... ok
checking for flags (-Wno-pragma-system-header-outside-header) ... ok
checking for flags (-Wno-pragma-system-header-outside-header) ... ok
checking for flags (-Wno-ambiguous-reversed-operator) ... ok
checking for flags (-Wno-ambiguous-reversed-operator) ... ok
checking for flags (-Wno-ambiguous-reversed-operator) ... ok
checking for flags (-Wno-ambiguous-reversed-operator) ... ok
checking for flags (-Wno-ambiguous-reversed-operator) ... ok
checking for flags (-Wno-ambiguous-reversed-operator) ... ok
checking for flags (-Wno-unused-command-line-argument) ... ok
checking for flags (-Wno-unused-command-line-argument) ... ok
checking for flags (-Wno-unused-command-line-argument) ... ok
checking for flags (-Wno-unused-command-line-argument) ... ok
checking for flags (-Wno-unused-command-line-argument) ... ok
checking for flags (-Wno-unused-command-line-argument) ... ok
checking for flags (-Wno-format) ... ok
checking for flags (-Wno-format) ... ok
checking for flags (-Wno-format) ... ok
checking for flags (-Wno-format) ... ok
checking for flags (-Wno-format) ... ok
checking for flags (-Wno-format) ... ok
checking for flags (-Wno-switch) ... ok
checking for flags (-Wno-switch) ... ok
checking for flags (-Wno-switch) ... ok
checking for flags (-Wno-switch) ... ok
checking for flags (-Wno-switch) ... ok
checking for flags (-Wno-switch) ... ok
checking for flags (-Wno-misleading-indentation) ... ok
checking for flags (-Wno-misleading-indentation) ... ok
checking for flags (-Wno-misleading-indentation) ... ok
checking for flags (-Wno-misleading-indentation) ... ok
checking for flags (-Wno-misleading-indentation) ... ok
checking for flags (-Wno-misleading-indentation) ... ok
checking for flags (-Wno-unknown-pragmas) ... ok
checking for flags (-Wno-unknown-pragmas) ... ok
checking for flags (-Wno-unknown-pragmas) ... ok
checking for flags (-Wno-unknown-pragmas) ... ok
checking for flags (-Wno-unknown-pragmas) ... ok
checking for flags (-Wno-unknown-pragmas) ... ok
checking for flags (-Wno-unused-function) ... ok
checking for flags (-Wno-unused-function) ... ok
checking for flags (-Wno-unused-function) ... ok
checking for flags (-Wno-unused-function) ... ok
checking for flags (-Wno-unused-function) ... ok
checking for flags (-Wno-unused-function) ... ok
checking for flags (-Wno-ignored-attributes) ... ok
checking for flags (-Wno-ignored-attributes) ... ok
checking for flags (-Wno-ignored-attributes) ... ok
checking for flags (-Wno-ignored-attributes) ... ok
checking for flags (-Wno-ignored-attributes) ... ok
...
Issues-translate-bot commented 10 months ago

Bot detected the issue body's language is not English, translate it automatically.


Title: Detect time-consuming exceptions before building

SaeruHikari commented 10 months ago

4429 看上去和这个类似,我尝试调试一下。

Issues-translate-bot commented 10 months ago

Bot detected the issue body's language is not English, translate it automatically.


4429 looks similar to this, I will try to debug it.

SaeruHikari commented 10 months ago

我简单调试了一下,卡顿集中在已经标记 profile 的运行探测程序的区间内。

    -- start profile
    profiler.enter("has_flags", tool.name, checkflags[1])

    -- detect.tools.xxx.has_flags(flags, opt)?
    _g._checking = coroutine_running and key or nil
    local hasflags = import("detect.tools." .. tool.name .. ".has_flags", {try = true})
    local errors = nil
    if hasflags then
        result, errors = hasflags(checkflags, opt)
    else
        result = try { function () os.runv(tool.program, checkflags, {envs = opt.envs}); return true end, catch { function (errs) errors = errs end }}
    end
    if opt.on_check then
        result, errors = opt.on_check(result, errors)
    end
    _g._checking = nil
    result = result or false

    -- stop profile
    profiler.leave("has_flags", tool.name, checkflags[1])

删除下面运行进程检查的部分,就不会卡顿。并且同一个 key 在同一时间段内,重复查询了特别多次,也会运行超级多次,感觉很异常。并且这些运行现场也不是并发的,任务管理器中能看到只有四五个编译器实例在进行查询。是不是需要保证同一个 key 下有一个 once_gurad?

SaeruHikari commented 10 months ago

理了一下现在的逻辑,应该是有 co_yield 在尝试避免同时启动多个实例,这部分是有效的没有问题,解注释后进程会大量增多。主要问题还是对重复 key 的命中检测并没有生效,每次重复的检查最终都会转发到 tooldetect。我看到 tool 里面只有一层 arglist parse 的快速查询,除此之外会直接调用到进程,所以我在这里加了个打印: image

结果重复的 key 会串行地重复运行我的 clang:

checking for flags (-Wno-ignored-attributes) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-deprecated-declarationslaunch
clang run
checking for flags (-Wno-deprecated-declarations) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-nullability-completenesslaunch
clang run
checking for flags (-Wno-ignored-attributes) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-nullability-completenesslaunch
clang run
checking for flags (-Wno-deprecated-declarations) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-nullability-completenesslaunch
clang run
checking for flags (-Wno-deprecated-declarations) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-nullability-completenesslaunch
clang run
checking for flags (-Wno-deprecated-declarations) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-nullability-completenesslaunch
clang run
checking for flags (-Wno-deprecated-declarations) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-nullability-completenesslaunch
clang run
checking for flags (-Wno-nullability-completeness) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-tautological-undefined-comparelaunch
clang run
checking for flags (-Wno-nullability-completeness) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-tautological-undefined-comparelaunch
clang run
checking for flags (-Wno-nullability-completeness) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-tautological-undefined-comparelaunch
clang run
checking for flags (-Wno-nullability-completeness) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-tautological-undefined-comparelaunch
clang run
checking for flags (-Wno-nullability-completeness) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-tautological-undefined-comparelaunch
clang run
checking for flags (-Wno-nullability-completeness) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Wno-tautological-undefined-comparelaunch
clang run
checking for flags (-Wno-tautological-undefined-compare) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Werror=return-typelaunch
clang run
checking for flags (-Wno-tautological-undefined-compare) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Werror=return-typelaunch
clang run
checking for flags (-Wno-tautological-undefined-compare) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Werror=return-typelaunch
clang run
checking for flags (-Wno-tautological-undefined-compare) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Werror=return-typelaunch
clang run
checking for flags (-Wno-tautological-undefined-compare) ... ok
windows_x64_C:\Program Files\LLVM\bin\clang-cl.exe_17.0.5_cxx_cxflags__-Werror=return-typelaunch
clang run

我觉得 #4429 的卡死也是同样的问题,因为同时发射大量进程,这个检查过程有的时候速度不慢,有的时候巨慢无比。慢的时候需要检查接近 20 秒,表现就跟卡死了一样: S~4{HN79Y{ 2~H%3U7)ZG8E

waruqi commented 10 months ago

这种通常是最开头的并行编译导致,因为一开始编译, flags 的检测结果都还没检测完成和缓存住。。。然后突然执行 xmake 编译,如果你这 cpu core 多,默认一下子 8-12 个 并行编译任务起来。。在编译过程中,一些 flags 需要做检测,但是又还没被缓存住,就是在这 12 个并行任务重,同时开启 flags 检测。。这通常发生在最开头的那十几个源文件编译时候。

目前还没有更好的办法,现在是通过 has_flags 开头判断是否存在相同的 check flags 任务,然后等待规避

https://github.com/xmake-io/xmake/blob/45f0750451087954ca4d649192d593341dd809f9/xmake/modules/lib/detect/has_flags.lua#L87

但是这个 yield 目前好像也会有一些其他问题,但是我这边暂时无法复现。。

由于目前所有 flags 都是跟随实际的编译任何延迟检测的,这样可以规避掉大量无用的 flags 检测,但是就会有这个并发冗余检测问题。。如果全部串行化提前至编译之前处理,一个是也会影响并行编译的效率,另外实现上 也不太好剥离到开头处理,并且也没法有效区分哪些flags 实际会被用到。。

Issues-translate-bot commented 10 months ago

Bot detected the issue body's language is not English, translate it automatically.


This is usually caused by the initial parallel compilation, because at the beginning of compilation, the detection results of flags have not been completed and cached. . . Then suddenly execute xmake compilation. If you have a lot of CPU cores, 8-12 parallel compilation tasks will be started by default. . During the compilation process, some flags need to be checked, but they have not been cached yet. That is, the flags detection is turned on during these 12 parallel tasks. . This usually happens when the first dozen source files are compiled.

There is no better way at the moment. The current method is to judge whether the same check flags task exists through the beginning of has_flags, and then wait for evasion.

https://github.com/xmake-io/xmake/blob/45f0750451087954ca4d649192d593341dd809f9/xmake/modules/lib/detect/has_flags.lua#L87

However, this yield seems to have some other problems at present, but I can't reproduce them for the time being. .

Since all flags are currently checked with any delay in the actual compilation, this can avoid a large number of useless flags checks, but there will be this concurrent redundant detection problem. . If all serialization is processed before compilation, one is that it will also affect the efficiency of parallel compilation. In addition, it is not easy to strip the processing to the beginning in terms of implementation, and it is impossible to effectively distinguish which flags will actually be used. .

SaeruHikari commented 10 months ago

对,感觉就是去重了但是只有一部分,其实还是有一批相同 key 的检测进程同时启动了,这批就是不在一个 co 里并且时序上竞争的任务。所以这个有好的办法规避吗,比如我在脚本可选地跳过一部分这个检查,或者发起主动检查。目标逐渐增多了,耗时直接 N * M 级别暴增,有点受不住。

Issues-translate-bot commented 10 months ago

Bot detected the issue body's language is not English, translate it automatically.


Yes, it feels like the duplication has been removed but only part of it. In fact, there are still a batch of detection processes with the same key started at the same time. This batch is a task that is not in the same co and competes in timing. So is there a good way to avoid this? For example, I can optionally skip part of this check in the script, or initiate an active check. The number of targets gradually increased, and the time consuming increased to N*M levels, which was a bit unbearable.

SaeruHikari commented 10 months ago

然后最困扰的一点其实是,windows 启动进程的速度真的太不稳定。有些时候虽然检查了这么多次,但只耗几秒,但有些时候却能花费二十几秒。真的挺莫名其妙的。

Issues-translate-bot commented 10 months ago

Bot detected the issue body's language is not English, translate it automatically.


The most troublesome thing is that the speed of the Windows startup process is really unstable. Sometimes, although it is checked so many times, it only takes a few seconds, but sometimes it can take more than twenty seconds. It's really baffling.

waruqi commented 10 months ago

可以全局配置这个试试,禁用 flags 检测,但只能禁用用户自己 add_ldflags/add_cxflags 配置的 flags

一些内置必须的 flags 检测是规避不掉的。。

set_policy("check.auto_ignore_flags", false)

Issues-translate-bot commented 10 months ago

Bot detected the issue body's language is not English, translate it automatically.


You can try configuring this globally to disable flags detection, but you can only disable the flags configured by the user's own add_ldflags/add_cxflags

Some built-in required flags detection cannot be circumvented. .

set_policy("check.auto_ignore_flags", false)

waruqi commented 10 months ago

对,感觉就是去重了但是只有一部分,其实还是有一批相同 key 的检测进程同时启动了,这批就是不在一个 co 里并且时序上竞争的任务。所以这个有好的办法规避吗,比如我在脚本可选地跳过一部分这个检查,或者发起主动检查。目标逐渐增多了,耗时直接 N * M 级别暴增,有点受不住。

目前也没啥好办法,需要复现后,花时间调,暂时也没太多时间看这个

Issues-translate-bot commented 10 months ago

Bot detected the issue body's language is not English, translate it automatically.


Yes, it feels like the duplication has been removed but only part of it. In fact, there are still a batch of detection processes with the same key started at the same time. This batch is a task that is not in the same co and competes in timing. So is there a good way to avoid this? For example, I can optionally skip part of this check in the script, or initiate an active check. The number of targets gradually increased, and the time consuming increased to N*M levels, which was a bit unbearable.

There is no good solution at the moment. I need to spend time adjusting it after reproducing it. I don’t have much time to look at this at the moment.

waruqi commented 10 months ago

另外,你 add_cxflags("xxx", {force = true}) 通过 force 添加,也不会走 check

SaeruHikari commented 10 months ago

没问题,这样暂时就能解决了。我那台机器环境应该也有点问题,来公司的机器起的进程虽然也多,就没那么慢。

Issues-translate-bot commented 10 months ago

Bot detected the issue body's language is not English, translate it automatically.


No problem, this will solve it for now. There must be some problems with the environment of my machine. Although the machine coming to the company starts many processes, it is not that slow.