paritytech / substrate


Compile time tracking issue #8979

Open expenses opened 3 years ago

expenses commented 3 years ago

The time it takes to compile substrate is a persistent problem that we've been running into. This issue is for constructive discussions of the problem. It is not a channel for venting about long compilation times.

expenses commented 3 years ago

From @Robbepop:

Like some of you, I have a C++ background, and compile times there are really bad as well. It is pretty much the same severity, and both languages have reasons for that: in C++ it is mainly the non-existence of a proper build system (and some language constructs), whereas in Rust a huge amount of the compilation time stems from the fact that there are so many analyses done during compilation. By now we kind of know which language constructs bloat the binary and the compile time, and I think it is about time that we put those into focus when developing projects as big as Substrate. My idea for Substrate was to set up a long-term solution via CI so that we can monitor compilation time (and maybe other statistics critical to developer productivity). The cool thing here is that whatever helps our users will ultimately also help us in the long run.

Since the Rust compiler has already been doing this for some months, and since we do have a DevOps team at Parity, my thinking was that it should be possible for Parity to invest in such a CI-based solution, so that for every PR or commit we can investigate easily and cheaply how it affects compilation times. In doing so we will be able to quickly analyse those commits and PRs, and over the long term we might implement rules that allow us to reduce compilation times step by step.

This does not assert that we get there tomorrow, but it asserts that we get there eventually.

expenses commented 3 years ago

from @pepyakin (more here: https://gist.github.com/expenses/3305c650805d81809b89ac70668eaa1c):

Re: removal of native runtimes

One of the last straws that made me write that proposal is build time, especially building the native runtimes and also the wasm runtimes. It is especially annoying when you are working with the Polkadot/Cumulus pair: there you have to build lots of runtimes, both native and wasm, even though you are working with only one (e.g. Rococo).

Removing the native runtime would only ~halve this time.

We could go further by decoupling the wasm build process from the client build process. That is, when you build either a Substrate node or a Polkadot node, ideally no runtimes should be compiled at all.

Some of you may say "wait, isn't this what we had in the beginning, and wasn't that awful?". Well, yes, it is. But the problems back then stemmed from the potential mismatches between the native and wasm runtimes that could go unnoticed. Now, without the native runtime, we don't have this problem.

Theoretically, we could make a node require only a chainspec to run. That, as I touched on briefly in the proposal, would also decouple development of the runtimes and clients.

adoerr commented 3 years ago

For tracking compile times, what would be considered the baseline hardware? In other words, what do we consider as recommended / required development hardware?

gilescope commented 3 years ago

We can probably use one of the GitLab CI tasks as a proxy compile benchmark. I'm guessing that most of the runners are the same hardware. Tbh, with ~35 min compile times it's pretty clear whether we've improved things.

xlc commented 3 years ago

For parachain teams, this is more important https://github.com/paritytech/cumulus/issues/446

Currently, to do a clean build of a collator, we need to build everything both natively and in wasm.

On the Substrate side, removing the native build should cut the work by half.

koute commented 3 years ago

This is definitely not a "proper" solution and we should definitely strive for faster compile times; however, purely as a workaround, using better hardware helps a lot. On my 32-core Threadripper 3970x, compiling Substrate from scratch in release mode takes (if I remember correctly) around ~4 minutes (I can benchmark it more precisely if anyone's interested). The 3970x costs only ~$2k USD, so you're probably looking at ~$4k USD at most for the whole computer. In a world where a reasonably specced MacBook costs over $2k, this should be a no-brainer for anyone who doesn't specifically need a laptop or macOS.

(Again, I'm not saying this is a solution, just a workaround for those who can go this way.)

pepyakin commented 3 years ago

Actually, if I am not mistaken, an AMD 3900x can provide around a 4 min compilation time. It should be way cheaper as well.

BTW, see this: https://hackmd.io/p_eGqmZFTVKMg64fsIZo6Q. We may want to start a new round with a fixed nightly and an updated Substrate.

gilescope commented 3 years ago

Seems like high clock speeds can be more helpful than more cores. That tends to suggest that maybe we need to try to make more crates be processed in parallel (maybe run with -Ztimings=json,html,info and have a look at the output).
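For anyone wanting to try this, the invocation could look roughly like the sketch below. Note the `-Ztimings` flag mentioned above was nightly-only; a stabilised form, `cargo build --timings`, landed in later Cargo releases, so which one works depends on your toolchain:

```shell
# Nightly-only form from the time of this thread: emits an HTML report plus
# machine-readable JSON showing when each crate started and finished
# compiling, and how well the build parallelised.
cargo +nightly build --release -Ztimings=json,html,info

# Stabilised equivalent on newer toolchains; the report is written to
# target/cargo-timings/cargo-timing.html
cargo build --release --timings
```

The HTML report's Gantt-style view makes it easy to spot long serial chains of crates that block parallelism.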

(That it's hard to parallelise a Rust compiler written in Rust feels like an own goal to me; it should be a showcase of how to do multithreading/async, but compilers are hard.)


xlc commented 3 years ago

A lot of the time is spent in the linking phase, which cannot be done in parallel.

gilescope commented 3 years ago

@xlc linking on macOS or Linux? There are a few linkers out there and some are faster than others; lld, I have heard, is worth trying. There are faster ones coming, e.g. mold, and they certainly do some things in parallel (threads rather than processes). To my knowledge there aren't any serious ones written in Rust yet, otherwise I'd be sponsoring them!
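For reference, a common way to opt in to lld is via a `.cargo/config.toml` in the project or home directory; this is a minimal sketch assuming a typical Linux host with clang/lld installed (the target triple below is an assumption, adjust for your platform):

```toml
# .cargo/config.toml — ask the C compiler driver to link with lld
# instead of the default system linker.
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-fuse-ld=lld"]
```

The same `rustflags` approach works for other linkers (e.g. `-fuse-ld=mold` on toolchains that support it).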

xlc commented 3 years ago

I think it is slow in both. My guess is LTO is just slow.

expenses commented 3 years ago

On Linux I've found that running mold -run cargo (build|run) works well for speeding up linking.

expenses commented 3 years ago

Re: LTO, you can disable it, which might speed compilation up for dev builds, but I wouldn't recommend it for release builds.
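For context, LTO is a per-profile setting in Cargo.toml; the dev profile doesn't enable it by default, so disabling it mainly matters for release-style profiles. A minimal sketch:

```toml
# Cargo.toml — trade some runtime performance for a faster link step.
[profile.release]
lto = false       # "thin" is a middle ground between full ("fat") LTO and none
```

Whether the link-time savings are worth the runtime cost depends on the profile's purpose; for local release builds used only for testing, it can be a reasonable trade.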

expenses commented 3 years ago

Using a nightly compiler helps a lot as well. I tested this by running cargo check -p sc-service, adding a newline to primitives/transaction-pool/src/lib.rs and running cargo check -p sc-service again. This second check takes:

Xanewok commented 3 years ago

xlc linking on macOS or Linux? There are a few linkers out there and some are faster than others; lld, I have heard, is worth trying. There are faster ones coming, e.g. mold, and they certainly do some things in parallel (threads rather than processes). To my knowledge there aren't any serious ones written in Rust yet, otherwise I'd be sponsoring them!

I'd just like to echo that lld is worth looking into; IIRC it consistently came out almost twice as fast on Linux when compiling Rust, and it should be production-ready by now.

pepyakin commented 3 years ago

One direction we may also explore is trying to make debug builds work. Right now they don't, because nodes built with no optimizations can't make it within the slot. I've tried liberally sprinkling opt-level=3 into dev profiles in Cargo.toml and that led nowhere. However, I am still optimistic that a node doesn't have that many hotspots, and that we can pick them out and set opt-level in their dev profiles.
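Cargo does support per-package overrides inside the dev profile, so the "pick the hotspots" idea could look roughly like this sketch (the named package is purely illustrative, not a tested selection of Substrate's actual hotspots):

```toml
# Workspace-root Cargo.toml — keep most of the node unoptimized for fast
# dev builds, but optimize selected hot crates individually.
[profile.dev.package.wasmi]   # hypothetical hotspot, named only as an example
opt-level = 3

# Or, as a blunter instrument: optimize all dependencies at opt-level 3
# while leaving the workspace's own crates unoptimized.
[profile.dev.package."*"]
opt-level = 3
```

Since dependencies change far less often than workspace code, the `"*"` variant keeps incremental rebuilds fast while the hot dependency code still runs optimized.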