rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.73k stars 12.5k forks source link

cargo check is > O(n) per number of unchanged files #120647

Open ct-ln opened 7 months ago

ct-ln commented 7 months ago

Problem

When changing 1 whitespace in 1 file, "cargo check" is > O(n) per number of files.

Since both major IDEs are based on "cargo check", this sinks projects.

Steps to reproduce:

Rust project, where each file only calls functions in the "next file":

main.rs -> m0.rs -> m1.rs -> m2.rs -> m3.rs -> ....

Do "cargo check", "touch main.rs", "cargo check"

This time is > O(n) for n := number of files.

Unchanged Files Seconds
64 2.89
128 6
256 11.96
512 26.387
1024 166

Expected behavior:

In same scenario, e.g. C++ with "make" is O(1). Only one file has changed. Nothing depends on that file. Only one file needs to be looked at.

Also:

Rust compiler inhales 32GB of RAM doing a SYNTAX CHECK on a single 1kb source file

code


import random

FILE_COUNT = 1024
FUNC_COUNT = 50
CALLS_COUNT = 50

with open("main.rs", "w") as f:
    for file_idx in range(FILE_COUNT):
        f.write("mod m%d;\n" % file_idx)
    f.write("\n")
    f.write("fn main(){\n")
    f.write("    m0::m0_0();\n")
    f.write("}\n")

for file_idx in range(FILE_COUNT):
    last_file = file_idx == FILE_COUNT -1

    with open("m%d.rs" % file_idx,"w") as f:
        f.write("use rand::Rng;\n")
        if not last_file:
            f.write(f"use crate::m{file_idx+1};\n")

        for func_idx in range(FUNC_COUNT):
            f.write("pub fn m%d_%d(){\n" % (file_idx,func_idx))
            f.write("let mut rng = rand::thread_rng();\n")
            if last_file:
                for call_idx in range(CALLS_COUNT):
                    f.write('if rng.gen::<u8>() == 0 { println!("hello!"); }\n')
            else:
                for call_idx in range(CALLS_COUNT):
                    f.write("if rng.gen::<u8>() == 0 { m%d::m%d_%d();}\n" % (file_idx+1,file_idx+1,random.randint(0,FUNC_COUNT-1)))
            f.write("}\n")
bjorn3 commented 7 months ago

This is expected. In Rust the compilation unit is a crate, while in C and C++ it is a single file. Rustc does have incremental compilation to reuse many unchanged computations from previous compilations like typeck and borrowck for unchanged functions, but for codegen it has to recompile an entire codegen unit at a time. Rustc creates a codegen unit for every module and then merges codegen units together until the total count doesn't exceed a given maximum (256 for incr comp enabled, 16 for incr comp disabled by default, can be changed using -Ccodegen-units) to work around the fact that LLVM and linkers can't deal with many small object files as well as a smaller amount of larger object files. As for why it looks to be slower than linearly scaling, I suspect the fact that you keep nesting modules within each other rather than having a mostly flat module structure has to do something about it. It forces the mangled symbol names to get longer and longer. There may also be some quadratic behavior around deeply nested modules in rustc.

workingjubilee commented 7 months ago

As for why it looks to be slower than linearly scaling, I suspect the fact that you keep nesting modules within each other rather than having a mostly flat module structure has to do something about it. It forces the mangled symbol names to get longer and longer. There may also be some quadratic behavior around deeply nested modules in rustc.

Yeah, I think we reasonably expect an ability to get it down to O(n log n) but not necessarily more? The 1024 is approximately 3.46 times the linear extrapolation from... deliberately rounding... "64 => 3 seconds"... a scaling of 16 turned into a scaling of ~55.3 or so. Large but this probably needs investigation to verify if it is even fixable.

arjun-menon commented 1 month ago

Sorry, but just wondering: could we have a "dev mode" where there is an unbounded number of objects files--would the link step still take too long? Also, I came across this thread that says Rust uses C/C++ linker--so should there be a difference in link time for Rust, in comparison to C++? On a side note, link time being O(n) makes sense, I wonder if OP meant that C++ is O(1) for compiling after a single file (and not linking as well).

workingjubilee commented 1 month ago

Also, I came across this thread that says Rust uses C/C++ linker--so should there be a difference in link time for Rust, in comparison to C++?

Just because they both use object code doesn't mean they link the same kinds of object code in the same ways, so yes, there will be differences.

workingjubilee commented 1 month ago

@arjun-menon As far as I can tell, nothing you have said is related to the original issue, because cargo check does not generate object code. Please open a new issue if you wish to report.