rust-lang / backtrace-rs

Backtraces in Rust
https://docs.rs/backtrace
Other
524 stars 240 forks source link

Modularise CI workflow and validate outputs for binary size checks. #549

Closed detly closed 11 months ago

detly commented 1 year ago

I was amazed at how quickly you spun up your code size check CI workflow. It prompted me to take that approach and play around with it in a private repo to dig into it more, which resulted in these changes and notes you might be interested in.

Notes:

Modularisation

Most of the workflow structure changes were just to help me understand the process. I found it useful to separate the step that does the code size check itself from all the prep work. This same separation lent itself well to eg. converting the workflow into a Docker multi-stage image so I could automate running the measurement over a lot of commits in the history.

Modularising CI workflow this way is kind of a style preference though, so feel free to decline/bikeshed that aspect of it.

Compiler flags

I changed the flags to optimise for size and strip all symbols, because that is generally the lowest hanging fruit for anyone concerned about binary size and might reflect the more "realistic" impact on such a project. OTOH, I don't think the difference in flags would actually turn a change in size into a non-change, so you might consider it spurious.

Transitive dependency breaks

One large-ish problem I found with historical analysis was that, specifically, I wanted to understand the jump in size between nightly-2023-07-01 and nightly-2023-07-03. This was a change in backtrace:

The problem is, running the "build with patched std" procedure on the old commit resulted in errors:

(Expand for the full output.) ```none error[E0308]: mismatched types --> library/std/src/../../backtrace/src/symbolize/gimli.rs:377:24 | 377 | if let Ok(mut frames) = object_cx.dwarf.find_frames(object_addr) { | ^^^^^^^^^^^^^^ ---------------------------------------- this expression has type `LookupResult>, addr2line::gimli::Error>, Buf = EndianSlice<'_, addr2line::gimli::LittleEndian>>>` | | | expected `LookupResult>`, found `Result<_, _>` | ::: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/addr2line-0.20.0/src/lib.rs:450:23 | 450 | ...lt, Error>, Buf = R>> | -------------------------------------------------------------------------- the expected opaque type | = note: expected enum `LookupResult>` the full type name has been written to '/rust/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/std-561bd21e49def8cf.long-type-9162012492364665641.txt' found enum `core::result::Result<_, _>` ``` ```none root@3770f30d41d9:/rust# python3 x.py build library --stage 0 Building bootstrap Finished dev [unoptimized] target(s) in 0.04s warning: x.py has made several changes recently you may want to look at help: consider looking at the changes in `src/bootstrap/CHANGELOG.md` note: to silence this warning, add `changelog-seen = 2` at the top of `config.toml` Updating submodule src/tools/cargo Submodule 'src/tools/cargo' (https://github.com/rust-lang/cargo.git) registered for path 'src/tools/cargo' Cloning into '/rust/src/tools/cargo'... remote: Enumerating objects: 1896, done. remote: Counting objects: 100% (1896/1896), done. remote: Compressing objects: 100% (1358/1358), done. remote: Total 1896 (delta 468), reused 1094 (delta 343), pack-reused 0 Receiving objects: 100% (1896/1896), 2.73 MiB | 1.73 MiB/s, done. Resolving deltas: 100% (468/468), done. remote: Total 0 (delta 0), reused 0 (delta 0), pack-reused 0 remote: Enumerating objects: 32, done. remote: Counting objects: 100% (32/32), done. remote: Compressing objects: 100% (14/14), done. remote: Total 17 (delta 15), reused 5 (delta 3), pack-reused 0 Unpacking objects: 100% (17/17), 6.95 KiB | 790.00 KiB/s, done. From https://github.com/rust-lang/cargo * branch 5b377cece0e0dd0af686cf53ce4637d5d85c2a10 -> FETCH_HEAD Submodule path 'src/tools/cargo': checked out '5b377cece0e0dd0af686cf53ce4637d5d85c2a10' Updating submodule library/backtrace Updating submodule library/stdarch Submodule 'library/stdarch' (https://github.com/rust-lang/stdarch.git) registered for path 'library/stdarch' Cloning into '/rust/library/stdarch'... remote: Enumerating objects: 302, done. remote: Counting objects: 100% (302/302), done. remote: Compressing objects: 100% (247/247), done. remote: Total 302 (delta 45), reused 155 (delta 22), pack-reused 0 Receiving objects: 100% (302/302), 1.16 MiB | 2.60 MiB/s, done. Resolving deltas: 100% (45/45), done. remote: Total 0 (delta 0), reused 0 (delta 0), pack-reused 0 remote: Enumerating objects: 87, done. remote: Counting objects: 100% (87/87), done. remote: Compressing objects: 100% (46/46), done. remote: Total 50 (delta 33), reused 13 (delta 3), pack-reused 0 Unpacking objects: 100% (50/50), 104.11 KiB | 1.12 MiB/s, done. From https://github.com/rust-lang/stdarch * branch d77878b7299dd7e286799a6e8447048b65d2a861 -> FETCH_HEAD Submodule path 'library/stdarch': checked out 'd77878b7299dd7e286799a6e8447048b65d2a861' Updating submodule library/backtrace downloading https://static.rust-lang.org/dist/2023-05-30/rustfmt-nightly-x86_64-unknown-linux-gnu.tar.xz ######################################################################################### 100.0% extracting /rust/build/cache/2023-05-30/rustfmt-nightly-x86_64-unknown-linux-gnu.tar.xz to /rust/build/x86_64-unknown-linux-gnu/rustfmt downloading https://static.rust-lang.org/dist/2023-05-30/rustc-nightly-x86_64-unknown-linux-gnu.tar.xz ######################################################################################### 100.0% extracting /rust/build/cache/2023-05-30/rustc-nightly-x86_64-unknown-linux-gnu.tar.xz to /rust/build/x86_64-unknown-linux-gnu/rustfmt downloading https://ci-artifacts.rust-lang.org/rustc-builds/4b6749b21e680a6280cf05ace533ae20c93d9bff/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz ######################################################################################### 100.0% extracting /rust/build/cache/llvm-4b6749b21e680a6280cf05ace533ae20c93d9bff-false/rust-dev-nightly-x86_64-unknown-linux-gnu.tar.xz to /rust/build/x86_64-unknown-linux-gnu/ci-llvm Building stage0 library artifacts (x86_64-unknown-linux-gnu) Updating crates.io index Downloaded addr2line v0.20.0 Downloaded miniz_oxide v0.7.1 Downloaded unicode-width v0.1.10 Downloaded adler v1.0.2 Downloaded getopts v0.2.21 Downloaded rustc-demangle v0.1.23 Downloaded allocator-api2 v0.2.15 Downloaded cc v1.0.79 Downloaded hashbrown v0.14.0 Downloaded compiler_builtins v0.1.93 Downloaded libc v0.2.147 Downloaded gimli v0.27.3 Downloaded 12 crates (1.6 MB) in 1.34s Compiling compiler_builtins v0.1.93 Compiling core v0.0.0 (/rust/library/core) Compiling libc v0.2.147 Compiling cc v1.0.79 Compiling memchr v2.5.0 Compiling std v0.0.0 (/rust/library/std) Compiling unwind v0.0.0 (/rust/library/unwind) Compiling rustc-std-workspace-core v1.99.0 (/rust/library/rustc-std-workspace-core) Compiling alloc v0.0.0 (/rust/library/alloc) Compiling cfg-if v1.0.0 Compiling adler v1.0.2 Compiling rustc-demangle v0.1.23 Compiling rustc-std-workspace-alloc v1.99.0 (/rust/library/rustc-std-workspace-alloc) Compiling panic_abort v0.0.0 (/rust/library/panic_abort) Compiling panic_unwind v0.0.0 (/rust/library/panic_unwind) Compiling gimli v0.27.3 Compiling hashbrown v0.14.0 Compiling miniz_oxide v0.7.1 Compiling object v0.31.1 Compiling std_detect v0.1.5 (/rust/library/stdarch/crates/std_detect) Compiling addr2line v0.20.0 error[E0308]: mismatched types --> library/std/src/../../backtrace/src/symbolize/gimli.rs:361:16 | 361 | if let Ok(mut frames) = cx.dwarf.find_frames(addr as u64) { | ^^^^^^^^^^^^^^ --------------------------------- this expression has type `LookupResult>, addr2line::gimli::Error>, Buf = EndianSlice<'_, addr2line::gimli::LittleEndian>>>` | | | expected `LookupResult>`, found `Result<_, _>` | ::: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/addr2line-0.20.0/src/lib.rs:450:23 | 450 | ...lt, Error>, Buf = R>> | -------------------------------------------------------------------------- the expected opaque type | = note: expected enum `LookupResult>` the full type name has been written to '/rust/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/std-561bd21e49def8cf.long-type-12642544034931188750.txt' found enum `core::result::Result<_, _>` error[E0308]: mismatched types --> library/std/src/../../backtrace/src/symbolize/gimli.rs:377:24 | 377 | if let Ok(mut frames) = object_cx.dwarf.find_frames(object_addr) { | ^^^^^^^^^^^^^^ ---------------------------------------- this expression has type `LookupResult>, addr2line::gimli::Error>, Buf = EndianSlice<'_, addr2line::gimli::LittleEndian>>>` | | | expected `LookupResult>`, found `Result<_, _>` | ::: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/addr2line-0.20.0/src/lib.rs:450:23 | 450 | ...lt, Error>, Buf = R>> | -------------------------------------------------------------------------- the expected opaque type | = note: expected enum `LookupResult>` the full type name has been written to '/rust/build/x86_64-unknown-linux-gnu/stage0-std/x86_64-unknown-linux-gnu/release/deps/std-561bd21e49def8cf.long-type-9162012492364665641.txt' found enum `core::result::Result<_, _>` For more information about this error, try `rustc --explain E0308`. error: could not compile `std` (lib) due to 2 previous errors Build completed unsuccessfully in 0:03:03 ```

I'm not familiar enough with backtrace and std to interpret this. If it's because there's a change in interface between the two versions of backtrace, leading to incompatibilities where you can't just splice the new or old version into the same version of std — I don't know of a way around it with this approach except to do it manually with known, specific versions of std.

OTOH, if it's because of some transitive dependencies that are now mismatched between those calculated in some lockfile and those resolved during this later step, that might be something that could be dealt with in this process.

The reason I mention it is that this class of code size increase (due to changes in dependencies or ones use of them) can be amongst the biggest and most well-hidden (IME, anyway, because of the "but I didn't change any code!" factor). And so it might be the most worthy of being caught by CI, but I can't quite tell (a) if this approach misses it and, if so, (b) how to change it so it doesn't miss it.

Sudden failure of copying files

Update: fixed now.

I was going to point you to an example PR I had in my own fork so you could verify that this worked. BUT in, literally, the 20 minutes it took me to write this PR, the workflow broke. Including ones that worked an hour ago. The failure is:

cp: 'build/x86_64-unknown-linux-gnu/stage0/lib/libLLVM-16-rust-1.71.0-beta.so' and 'build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/libLLVM-16-rust-1.71.0-beta.so' are the same file
cp: 'build/x86_64-unknown-linux-gnu/stage0/lib/librustc_driver-e73eb5c47f849c7b.so' and 'build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/librustc_driver-e73eb5c47f849c7b.so' are the same file
cp: 'build/x86_64-unknown-linux-gnu/stage0/lib/libstd-9ef2a438251e3a63.so' and 'build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/libstd-9ef2a438251e3a63.so' are the same file
cp: 'build/x86_64-unknown-linux-gnu/stage0/lib/libtest-c03aaf58feba71c6.so' and 'build/x86_64-unknown-linux-gnu/stage0-sysroot/lib/libtest-c03aaf58feba71c6.so' are the same file

I haven't debugged this yet, but it might be that something in the bootstrap process is (now) hardlinking outputs, and cp doesn't like that.

philipc commented 1 year ago

OTOH, if it's because of some transitive dependencies that are now mismatched

Yes it's because of this. You need to patch these too: https://github.com/rust-lang/rust/blob/70d7283d24ed2d664fc4cefad6f240aeb62e98f2/library/std/Cargo.toml#L27-L35

and this should be done if you want to get accurate measurements of size changes, because these dependencies can cause size changes too.

Kobzol commented 1 year ago

The cp commands should now be removed, they are performed by rustc bootstrap automatically.

Kobzol commented 1 year ago

This is pretty great! One problem with this workflow is that of GH permissions regarding sending comments. We probably shouldn't use pull_request_target at all, since in theory it enables GHA cache poisoning. But without it, we can't easily send PR comments. The pull_request_target approach and building code from "foreign" PRs is not deemed to be safe though, so we should probably do something about it.

One way to solve this would be to have one workflow that calculates the size, and then another workflow that just reads the calculated outputs and posts it to the PR. I started drafting something in https://github.com/rust-lang/backtrace-rs/pull/548, but it's not finished. I realized that it's not even that trivial to recognize if the original workflow came from a PR (and from which one). Maybe this should better be implemented in JavaScript, in a first-class GitHub action. Or it should just be a part of bors, I don't know.

detly commented 1 year ago

The cp commands should now be removed, they are performed by rustc bootstrap automatically.

I'll rebase this after #550, I just can't do it in parallel because I can only test workflows as they exist on my fork's master branch and it's already a bit of a juggle.

Re. the safety aspect - I am more familiar with Gitlab's CI than Github's, but they have the concept of "metrics" that have a similar kind of "first class" recognition to things like test and coverage reports. I say similar because they don't actually have a good way to display them in MRs, they're more for post-deployment monitoring (I think), but it's in the works. My point is that if Github have some other first class way to report basic numeric metrics that don't involve posting a comment to the PR via the API, that could be an easy win. I'm trying to find that out.

For my own understanding:

The pull_request_target approach and building code from "foreign" PRs is not deemed to be safe though

What's the potential issue? I thought that any PR triggered the workflow as written in the target.

Kobzol commented 1 year ago

What's the potential issue? I thought that any PR triggered the workflow as written in the target.

There are (at least) two issues currently: 1) The part of the workflow that checks out the untrusted part currently has pull-requests: write permission, meaning that it could do whatever it wants with PRs, if the built code was somehow executed. This should be trivial to fix by splitting the current workflow into two jobs (one would measure the binary size, the other would post the comment) and giving no permissions to the first job, and PR permissions to the second job (you cannot change permissions per steps, only per jobs). 2) The pull_request_target workflow is prone to GHA cache poisoning attacks. I haven't found any way to prevent this, and GH considers this not to be an issue apparently, so it's probably marked as wontfix. This is mostly a theoretical attack vector, because (barring some really obscure bug in rustc), backtrace shouldn't have any way of executing code during its build, and this repository does not use GHA cache in any way currently (AFAIK). But I suppose that all attack vectors first start as theoretical...

Kobzol commented 1 year ago

FWIW, fixing the first issue to make sure that no permissions are available to the job that could actually in theory run untrusted code, would be a good enough fix for now.

detly commented 1 year ago

FWIW, fixing the first issue to make sure that no permissions are available to the job that could actually in theory run untrusted code, would be a good enough fix for now.

I could do that, the sizes are just numbers so they can be conveyed via env vars without even needing artifacts passed around.

Kobzol commented 1 year ago

I could do that, the sizes are just numbers so they can be conveyed via env vars without even needing artifacts passed around.

I think that ENV vars are not persisted between jobs. It could be done via job outputs though, that's quite similar.

detly commented 1 year ago

It could be done via job outputs though, that's quite similar.

Yep, this was easy enough, 8d65e69 has it. Here's an example of it posting a comment.

detly commented 1 year ago

I also have a solution to the problem of (some) dependency bumps resulting in mismatched symbols. Would you like me to include it in this PR, or would you prefer to limit the scope here and have it in a different PR?

Kobzol commented 1 year ago

I would keep it for a separate PR. Good job with the job split, now it should be safer.

detly commented 1 year ago

One other very minor thing, but I am finding numbers like 391568B kind of hard to read. Would anyone like me to format the number with, say, Intl.NumberFormat({useGrouping: "always"}).format() to add a thousands separator (comma), and add a non-breaking space between the number and unit?

workingjubilee commented 1 year ago

One other very minor thing, but I am finding numbers like 391568B kind of hard to read. Would anyone like me to format the number with, say, Intl.NumberFormat({useGrouping: "always"}).format() to add a thousands separator (comma), and add a non-breaking space between the number and unit?

I'd like something that splits out KiB and B, personally.

detly commented 1 year ago

I'd like something that splits out KiB and B, personally.

I have been playing around with Intl.NumberFormat() which mostly does a decent job, but unfortunately abbreviates the unit of "byte" to... "byte". This is dictated by the source spec they defer to. Here's an example.

For now I've used Intl.NumberFormat() for the numbers and just put B in the strings (example).

Edit: Currently it looks like

Original binary size: 391,568 B Updated binary size: 371,088 B Difference: -20,480 B (-5.23%)

detly commented 1 year ago

(I'm happy to rebase and squash up commits when you're ready.)

detly commented 1 year ago

This has languished a bit, and I left it alone because I was busy, but if it's still of interest let me know. I should have time now to address anything that's still outstanding.

workingjubilee commented 1 year ago

Sorry!

https://www.youtube.com/watch?v=fksu6FENojY

This looks good, I can approve it as soon as you're happy with it re: commit squashing or w/e.

detly commented 1 year ago

https://www.youtube.com/watch?v=fksu6FENojY

Oh I hear that. No pressure. I won't be doing anything faster than can be measured in days either. I will tidy up the history and ping you here when that's done. If others have feedback on the details too I'm happy to hear it in the meantime.

detly commented 11 months ago

Okay I'm much happier with this. If you want to further squash the commits, I don't mind, but the history as it is now should be clear and valid taken commit by commit too.

detly commented 11 months ago

Docker was already mad, right?

workingjubilee commented 11 months ago

That failure is new, I think.