ipetkov / crane

A Nix library for building cargo projects. Never build twice thanks to incremental artifact caching.
https://crane.dev
MIT License
961 stars 92 forks source link

Parallelize removeReferencesToVendoredSources #621

Closed DoctorDalek1963 closed 6 months ago

DoctorDalek1963 commented 6 months ago

Motivation

Closes #620 by using GNU parallel to remove references to vendored sources in multiple files at once.

Checklist

DoctorDalek1963 commented 6 months ago

It's worth noting that this change can make builds non-reproducible. In my case, the only non-reproducible bit is a list of JS files in the index.html. It's the same list, but the elements are in different orders.

Non-reproducibility can be really bad for some builds, so this change should probably be opt-in. That would mean the behavior of the hook should change based on something in mkCargoDerivation. What's the best way of doing that?

dpc commented 6 months ago

Why would it be non-repro? The end result should always be the same, no?

DoctorDalek1963 commented 6 months ago

Oops, turns out this was just the project I was testing with. It seems like the non-reproducibility is because of the stdweb crate and nothing to do with this PR. I've since tested this PR with a different project and the result was reproducible both with and without this parallelism.

dpc commented 6 months ago

It seems like the non-reproducibility is because of the stdweb crate and nothing to do with this PR

You think you can follow through with them to address it? Would be great if Rust ecosystem was aware that byte-to-byte reproducibility is important to many people. :)

DoctorDalek1963 commented 6 months ago

I'd love to, but stdweb has received no new commits in over 5 years, and all the PRs are totally inactive, so it's probably better for anyone using it to switch to something different.

antifuchs commented 6 months ago

Just chiming in to mention it's not just stdweb that has this issue with trunk builds: My assets/ dir containing some 100 font and png files gets included in the sources for the build, which causes the build time to balloon. (And I'm using the example for a trunk workspace build!)

Very glad to see you have already started that issue report and pull request, otherwise I'd have to do it ((:

dpc commented 6 months ago

Just chiming in to mention it's not just stdweb that has this issue with trunk builds: My assets/ dir containing some 100 font and png files gets included

Would it make sense to have some hook/filter attribute to disable for bulk stuff that people sometimes have and don't care if it has references, or it possibly can't and there's no point of tryint go fix it up?

antifuchs commented 6 months ago

Would it make sense to have some hook/filter attribute to disable for bulk stuff[...]?

Yes, I was hoping to find a setting for that! But running it in parallel would already be a huge improvement: I think a setting would be a good next step after parallelization lands.

ipetkov commented 6 months ago

Thanks again for the PR @DoctorDalek1963 ! I'm going to close this in favor of https://github.com/ipetkov/crane/pull/625 which is able to achieve a really good speedup here without needing to use parallel (more context can be found here: https://github.com/ipetkov/crane/issues/620#issuecomment-2131570054 )

I'm going to close this for now, but feel free to reopen if you think there are other improvements to be had!