serde-rs / serde

Serialization framework for Rust
https://serde.rs/
Apache License 2.0
9k stars 760 forks source link

using serde_derive without precompiled binary #2538

Closed decathorpe closed 1 year ago

decathorpe commented 1 year ago

I'm working on packaging serde for Fedora Linux, and I noticed that recent versions of serde_derive ship a precompiled binary now. This is problematic for us, since we cannot, under no circumstances (with only very few exceptions, for firmware or the like), redistribute precompiled binaries.

Right now the fallback I am trying to apply for the short-term is to patch serde_derive/lib.rs to unconditionally include lib_from_source.rs (we don't really care about compilation speed for our non-interactive builds).

I'm wondering, how is the x86_64-unknown-linux-gnu binary actually produced? The workspace layout in this project looks very differently from when I last looked in here ... Would it be possible for us to re-create the binary ourselves so we can actually ship it? Or would it be possible to adapt the serde_derive crate to fall back to the non-precompiled code path if the binary file is missing?

dtolnay commented 1 year ago

the fallback is to patch serde_derive/lib.rs to unconditionally include lib_from_source.rs

This is fine. If it's any simpler, you can even rename lib_from_source.rs to lib.rs.

how is the x86_64-unknown-linux-gnu binary actually produced? Would it be possible for us to re-create the binary ourselves so we can actually ship it?

By https://github.com/serde-rs/serde/blob/v1.0.177/precompiled/build.sh. Yes.

would it be possible to adapt the serde_derive crate to fall back to the non-precompiled code path if the binary file is missing?

No, not when serde_derive is being built by Cargo. The dependency graph (whether or not serde_derive depends on syn, quote, proc-macro2) must be resolved before anything compiles.

cuviper commented 1 year ago

It could be a Cargo feature to force a rebuild, enabling the dependencies as well.

decathorpe commented 1 year ago

Thanks for confirming - I'll apply a downstream patch for now.

I'm looking forward to the compile time savings that would be achievable by changes like this in the future 💪🏼

quentinmit commented 1 year ago

Trying to get this to work with cargo2nix, which has a similar problem, and I found that renaming/patching lib_from_source.rs is insufficient because the Cargo.toml also excludes the dependencies on linux. It's also particularly hard to patch in extra dependencies, so for the moment I'm stuck with pinning the previous version.

Please provide a feature that can be used to disable precompiled binaries so it can be turned off in a cargo-compatible way.

decathorpe commented 1 year ago

@quentinmit these are my current patches for Fedora, maybe this could work to provide a similar workaround for cargo2nix?

pmatouse commented 1 year ago

@dtolnay please consider moving the precompiled serde_derive version to a different crate and default serde_derive to building from source so that users that want the benefit of precompiled binary can opt-in to use it. or vice-versa. or any other solution that allows building from source without having to patch serde_derive.

having a binary shipped as part of the crate, while I understand the build time speed benefits, is for security reasons not a viable solution for some library users.

thank you for your consideration.

nightkr commented 1 year ago

From Nix's perspective the issue is that the dependency's manifest (project) folder is not available when building dependees, only its build output.

In theory this could be worked around by include_bytes!()-ing it instead, but that would require it to be extracted before it could be executed. Personally I'd rather just see a feature to ignore the precompiled binary and revert to the old behaviour.

ericmcbride commented 1 year ago

This broke our Bazel build system as well, pinning to 1.0.171 resolved it

DarkKirb commented 1 year ago

Would it be possible to add a feature which disables the binary? like from-source?

jirutka commented 1 year ago

Would it be possible to add a feature which disables the binary? like from-source?

This wouldn’t be enough, because you can’t simply enable a feature for a transitive dependency. This shouldn’t be done per dependency usage, but globally – either using custom --cfg flag (which can be passed via the RUSTFLAGS environment variable) or an environment variable.

Sorry if this sounds too harsh, but I have to say it: downloading an arbitrary binary code from the internet and running it on the user’s system during the build, moreover without their knowledge and consent, is a despicable approach that is completely against security! It also has a lot of practical problems related to portability to other architectures, systems, build systems, etc. as you have hopefully already learned from other issues related to this.

If you really want to use this approach, at least make it opt-in, not opt-out, via --cfg or an environment variable. Please don't make life harder for package maintainers; we are really tired of dealing with these antipatterns in downstream projects over and over and over again.

nightkr commented 1 year ago

This wouldn’t be enough, because you can’t simply enable a feature for a transitive dependency.

You can, actually. Cargo will collect all enabled features across the whole dependency graph before building the library. For a workspace like this:

# a/Cargo.toml
[package]
name = "a"
[features]
b = []
c = []

# b/Cargo.toml
[package]
name = "b"
[dependencies]
a = { path = "../a", features = ["b"] }

# c/Cargo.toml
[package]
name = "c"
[dependencies]
a = { path = "../a", features = ["c"] }

# d/Cargo.toml
[package]
name = "d"
[dependencies]
b.path = "../b"
c.path = "../c"

Building d would cause a to be built once with the feature set ["b", "c"].

The one exception is that incompatible versions of a would be separated (so you could have a-1.0.0 with ["b"] and a-2.0.0 with ["c"]. But that already kind of breaks for Serde anyway since the whole point is to have a single set of traits that everything interoperates with.

jirutka commented 1 year ago

You can, actually. Cargo will collect all enabled features across the whole dependency graph before building the library

I know how it works, but I was too vague in my comment; I should have written: ...without modifying the project’s Cargo.toml. This could be more complicated in a complex project with many crates and multiple targets, some of which are used as tools to build or test others. Also, if you add a new dependency to Cargo.toml, you also have to modify Cargo.lock – and we always want to build with Cargo.lock, which is distributed with the source, to ensure reproducibility. Do you know how to do this with a single command that does not change any dependency version or add new dependencies to the Cargo.lock (which will happen if the feature flag also enables some optional dependencies)? We can use the good old patch file, but updating it with every package bump is quite annoying. If a global --cfg flag or environment variable is used instead, it’s just a matter of setting an environment variable in the package build script.

Last but not least, if the build fails with a cryptic message like “File not found”, you need to know that the program you are packaging uses a crate, perhaps deep in the dependency graph, that does some nasty thing like downloading and executing some binary from the internet in the background, which may not work on the host system (maybe because you don't provide precompiled binary for architecture XY). And that you have to jump through hoops of patching the project’s files to get rid of it.

That’s why I said it should be opt-in, not opt-out, and in a way that is under the control of the person building the whole project that uses serde somewhere in the dependency graph, not under the control of a developer of some of the project’s dependencies.

And once again, this “feature” shouldn’t be here in the first place, and should be considered as a security issue (as @mulkieran requested in https://github.com/rustsec/advisory-db/issues/1737).

nightkr commented 1 year ago

Yeah that's fair, I guess I was approaching it more from the perspective of an app developer doing their own packaging. I don't think --cfg flags (or env vars) are practical here though, since serde_derive's dependency set depends on whether it is precompiled.

Agreed on everything else, including this being an anti-feature in general.

ThetaSinner commented 1 year ago

Same issue here with using this in a Nix environment, the binary is just not found. Is there any chance of this change being backed out while this gets figured out?

pacak commented 1 year ago

Is there any chance of this change being backed out while this gets figured out?

You can pin serde to 1.0.171 - last version that works normally.

ThetaSinner commented 1 year ago

Yes I can do that in this project and I will do for now but people are building applications against it and we'd have to ask all of them to pin too because like us they are using serde = "1.0" and expecting updates to be safe to take as they publish

pacak commented 1 year ago

Unless I'm mistaken you using it as serde = "=1.0.171" makes it so for users of your crate and serde (with serde = "1.0") cargo would resolve serde to the pinned version.

Mingun commented 1 year ago

Better to pin to serde = "<=1.0.171" of course

dtolnay commented 1 year ago

Thanks for the comments everyone. I'll go ahead and close this. The precompiled implementation is the only supported way to use the macros that are published in serde_derive. If there is implementation work needed in some build tools to accommodate it, someone should feel free to do that work (as I have done for Buck and Bazel, which are tools I use and contribute significantly to) or publish your own fork of the source code under a different name. Separately, regarding the commentary above about security, the best path forward would be for one of the people who cares about this to invest in a Cargo or crates.io RFC around first-class precompiled macros so that there is an approach that would suit your preferences; serde_derive would adopt that when available.

nightkr commented 1 year ago

I'd put it the other way around, until the day that Cargo adds first-class support for opting into precompiled macros, the solution is to not publish precompiled macros.

Did anyone ask for this "feature"? Do you expect to see any non-neglegible real-world improvement in build performance, given that serde-derive is quite unlikely to be the only proc macro someone is using (and all of serde-derive's dependencies are almost universal)? Do you expect, longer term, that all proc macros will use this kind of scheme?

But it sounds like forking is indeed the way to go then.

DarkKirb commented 1 year ago

does that mean that platforms other than x86_64-linux-gnu are unsupported? because, say, aarch64-apple-darwin can’t run the precompiled binary

pacak commented 1 year ago

because, say, aarch64-apple-darwin can’t run the precompiled binary

Check the source code. Script specifically checks if it runs on x86_64-linux-gnu and if so - it uses the precompiled binary, otherwise it uses syn way of compiling as usual. "The only supported way ..." is a polite way of saying "Works for me, WONTFIX".

DarkKirb commented 1 year ago

yes, i do realize it’s doing that, but dtolnay explicitely said only the precompiled binary is officially supported, ergo platforms that can’t run x86_64-linux-gnu binaries aren’t officially supported?

Xe commented 1 year ago

This breaks my use of serde on NixOS.

nightkr commented 1 year ago

Looks like using a git dependency is a usable workaround for now, but it generates a "duplicate package" warning, and obviously needs to be tweaked manually for each serde update:

# in workspace Cargo.toml
[patch.crates-io]
serde_derive = { git = "https://github.com/serde-rs/serde.git", tag = "v1.0.183" }

Republishing a fork to crates.io isn't workable since [patch] clauses require both the source and destination to have the same name, and we need to run this replacement transitively to be of much use.

decathorpe commented 1 year ago

I think you should be able to use this as a global setting by adding the [patch.crates-io] setting to .cargo/config.toml instead of Cargo.toml.

mulkieran commented 1 year ago

What does using the git dependency work around?

nightkr commented 1 year ago

The git dependency bypasses the precompiled binary, rebuilding from source instead.

This works because the repo contains two different packages named serde_derive: serde_derive (built from source), and precompiled/serde_derive (a thin wrapper around a prebuilt binary). precompiled/serde_derive is the one that is published to crates.io, while serde_derive is preferred when using the git dependency.

alice-i-cecile commented 1 year ago

This is likely going to be a show-stopper for using serde in game code on consoles, and possibly mobile. I would really appreciate you reconsidering this.

epage commented 1 year ago

I would ask that people to please not control the upper bound of your dependencies but to instead rely on a lockfile. See https://doc.rust-lang.org/nightly/cargo/reference/specifying-dependencies.html#multiple-requirements for more details.

pacak commented 1 year ago

I would ask that people to please not control the upper bound of your dependencies but to instead rely on a lockfile.

I can add a comment to Cargo.toml and have commit message to refer to this ticket, same won't work with Cargo.lock since it is generated and updated automagically.

The whole point of pinning or otherwise restricting version with Cargo.toml is to avoid known incompatible (with crate2nix) versions, allowing cargo to pick something else defeats the purpose.

other packages in the dependency tree may require a newer version, leading to an unresolvable error

After updating some dependencies I would rather have an error right away in CI rather than later when trying to use new code in prod.

epage commented 1 year ago

After updating some dependencies I would rather have an error right away in CI rather than later when trying to use new code in prod.

I think there is some context to your comment missing because this does not follow to me.

  1. The policy for pinning should belong to each package. Your reason for pinning might not carry forward to your dependents
  2. For the final artifact, it should break in their CI if its going to break in prod
  3. Its not just about upgrading. Say I take two libraries and try to use them in y application. If one restricts the upper bound of serde_derive because the author doesn't like this policy but the other needs a newer serde_derive, I won't be able to use them together but will get an error. A confusing error at that.
pacak commented 1 year ago

Well, I'm talking about my cargo plugins specifically so there are no dependents. And if you look though projects having problems with this change - most if not all of them are applications as well that use strange build systems. There's not much point in compiling just a rust library with nix.

In my case I publish them on crates.io because I think they might be useful to other people. If crate compiles with cargo and is pure Rust - it usually compiles well in nix (unless it is doing some shenanigans) so I don't have nix in CI. Pinning and getting earlier indication about compatibility issues is better outcome for me than updating, publishing and not being able to use it in prod.

No dependents so no conflicts.

because the author doesn't like this policy

It's not that I don't like this policy - it simply won't compile for me.

This breaks my use of serde on NixOS.

epage commented 1 year ago

From those cargo guidelines

In some instances this won’t matter or the benefits might outweigh the cost, including:

  • When no one else depends on your package e.g. it only has a [[bin]]
asonix commented 1 year ago

This breaks building my applications with nix. I think I can fix this for myself by having two Cargo.toml and Cargo.lock files for my applications. One that I can use to publish to crates.io from a non-nix environment, and one that I can have a git-dependency with for bypassing the precompiled binary

After double-checking, it seems like although this affects crate2nix it doesn't affect a "normal" nix derivation for rust packages. Even still, the idea that the act of compiling software downloads and runs blobs feels wrong to me. It makes the produced software inherently harder to trust.

Please reconsider this.

kpcyrd commented 1 year ago

how is the x86_64-unknown-linux-gnu binary actually produced? Would it be possible for us to re-create the binary ourselves so we can actually ship it?

By https://github.com/serde-rs/serde/blob/v1.0.177/precompiled/build.sh. Yes.

The precompiled binary on crates.io does not match the binary this script builds from source.


I had some trouble getting it to work at all, but this is what I ended up with:

Downloading and extracting the pre-compiled binary:

$ wget -O serde_derive-1.0.177.tar.gz https://crates.io/api/v1/crates/serde_derive/1.0.177/download
$ sha256sum serde_derive-1.0.177.tar.gz
401797fe7833d72109fedec6bfcbe67c0eed9b99772f26eb8afd261f0abc6fd3  serde_derive-1.0.177.tar.gz
$ tar xf serde_derive-1.0.177.tar.gz
$ sha256sum serde_derive-1.0.177/serde_derive-x86_64-unknown-linux-gnu
535c81e5e541d3961331a3f5d6a41f87ed5341fbfdc09c4ab3b2140aa6674e93  serde_derive-1.0.177/serde_derive-x86_64-unknown-linux-gnu

Start a container (I'm really just guessing here, I tried rust:latest first but couldn't resolve the linking issues):

$ podman run -it --rm rust:alpine

Run this inside the container:

$ apk add git bash
$ rustup install nightly
$ rustup component add rust-src --toolchain nightly-x86_64-unknown-linux-musl
$ git clone https://github.com/serde-rs/serde
$ cd serde
$ git checkout v1.0.177
$ ./precompiled/build.sh
$ sha256sum ./precompiled/serde_derive/serde_derive-x86_64-unknown-linux-gnu
63dd6c4ff05506f70ba558bac0bf4d69fe115350c2d40e4c756c85c336cf59fb  ./precompiled/serde_derive/serde_derive-x86_64-unknown-linux-gnu

Compare the binary (the container id is going to be different for you):

$ mkdir built/
$ podman cp 3d5fbf678747:/serde/precompiled/serde_derive/serde_derive-x86_64-unknown-linux-gnu built/
$ diffoscope serde_derive-1.0.177/serde_derive-x86_64-unknown-linux-gnu built/serde_derive-x86_64-unknown-linux-gnu

Inside of this (quite large) diff I found this difference:

├── readelf --wide --decompress --string-dump=.comment {}
│ @@ -1,5 +1,5 @@
│
│  String dump of section '.comment':
│    [     0]  GCC: (GNU) 9.4.0
│ -  [    11]  rustc version 1.73.0-nightly (0d95f9132 2023-07-26)
│ +  [    11]  rustc version 1.73.0-nightly (076887268 2023-08-17)

Trying again, manually attempting to match the toolchain version (you need to +1 the date or you get 1.73.0-nightly (864bdf784 2023-07-25) instead):

$ apk add git bash
$ rustup install nightly-2023-07-27
$ rustup component add rust-src --toolchain nightly-2023-07-27-x86_64-unknown-linux-musl
$ git clone https://github.com/serde-rs/serde
$ cd serde
$ git checkout v1.0.177
$ sed -i 's/cargo .nightly/cargo +nightly-2023-07-27/' ./precompiled/build.sh
$ ./precompiled/build.sh
$ sha256sum ./precompiled/serde_derive/serde_derive-x86_64-unknown-linux-gnu
e995fa01511e3d6e7dc90d1fca3753be0ba07d0cb9ba495fdee92f64722bca22  ./precompiled/serde_derive/serde_derive-x86_64-unknown-linux-gnu

Still no luck, trying diffoscope again:

$ diffoscope --html diffoscope-serde_derive-1.0.177.html serde_derive-1.0.177/serde_derive-x86_64-unknown-linux-gnu built/serde_derive-x86_64-unknown-linux-gnu

I've uploaded the results here (the diff along with the official and my own binary):

https://pkgbuild.com/~kpcyrd/2023-08-18-serde-derive-forensics/diffoscope-serde_derive-1.0.177.html

image

conradludgate commented 1 year ago

I would ask that people to please not control the upper bound of your dependencies but to instead rely on a lockfile. See https://doc.rust-lang.org/nightly/cargo/reference/specifying-dependencies.html#multiple-requirements for more details.

This is a double-edged sword. My biggest application distributed on crates.io makes use of cryptographic libraries. cargo install _ --locked has on occasion featured yanked crypto crates due to vulnerabilities. I can't in good faith recommend users use locked to install.

kpcyrd commented 1 year ago

vulnerabilities in your crypto dependencies

You're supposed to make a new upstream release when that happens.

Without a new release, your users would not know their "latest version" is not safe to use anymore, and Linux distributions would keep shipping binaries compiled with vulnerable crypto libraries to your users.

marshrayms commented 1 year ago

Just speaking only for myself:

Unless a straightforward and effective method is found which prevents serde from downloading and running unaudited binaries from the internet at build time, I will be removing serde (and anything that depends on it, transitively) from my project.

I find this new policy mystifying, after all the well-publicized problems with credential-stealing malware in the NPM, PyPi, and Ruby Gems repositories. Imagine discovering the same problem in crates.io, but additionally being unable to reason with certainty about the source code that was actually used to build the binary thing that runs on the build system.

I understand the argument that proc macros work by compiling and running code on the build host, but it goes deeper than that. I may trust $crate_developer to not write malicious code, but not to secure my production build server. Particularly when $crate_developer has expressed that they see no issue with downloading unsigned binaries off the internet and executing them on their own production build server. Once such an ecosystem begins to diverge, how does it ever reconverge to a trustworthy state?

I understand that most developers are happy to accept the risks involved in silently downloading and running arbitrary unsigned binaries on their systems. But some organizations maintain strict controls on their supply chain and build environments. Some do, in fact, perform a series of manual and/or automated auditing steps on the source code before it is allowed into a private cargo registry.

Please reconsider this change in policy as it is incompatible with secure development workflows.

hfiguiere commented 1 year ago

As a flatpak reviewer for Flathub, where we require offline builds, I have concern that this might break builds. I haven't experimented yet, but anything needing network at build time will fail.

demoray commented 1 year ago

Note, this does not break offline builds. The package itself includes the binary, which is executed at runtime. For reference, this is viewable on docs.rs: https://docs.rs/crate/serde_derive/latest/source/

sdgathman commented 1 year ago

@marshrayms For rpm builds, can you just remove the binary in %prep ? If that breaks the build, then the package is FTBFS (Fails To Build From Source). The precompiled binary should just be a build time optimization for the risk tolerant.

Anyone remember the back door built in to the original K&R C compiler? It wasn't in the source, but propagated through the existing cc binaries - even when recompiling cc from source.

GoogleBot42 commented 1 year ago

The precompiled binary should just be a build time optimization for the risk tolerant.

If there is no option to not use the precompiled binary, then the precompiled binary is not an optimization. It is required. Patching the source code to add the option doesn't change that the upstream requires the binary.

Additionally, building the artifact isn't reproducible, see earlier in this thread. You couldn't easily verify it yourself even if you wanted to.

Anyone remember the back door built in to the original K&R C compiler? It wasn't in the source, but propagated through the existing cc binaries - even when recompiling cc from source.

I don't see how that's related at all. With the precompiled binary I have to trust at least the following.

  1. The source code
  2. The host and authenticity of the precompiled binary
  3. The environment that built it
  4. The user that built it

With the source, I need to trust

  1. The source code

With systems that ensure reproducible builds (such as Nix and Guix), you can trust cached builds because you can trivially build the same binary yourself. There's a lot of work and research that goes into this. https://guix.gnu.org/en/blog/2018/bootstrapping-rust/

Edit: this work is particularly amazing https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/

phyber commented 1 year ago

Given all the other drama in the Rust world in recent times, this is profoundly disappointing.

I'm very happy to learn that in the FreeBSD systems that I care about, I get to compile from source.

sdgathman commented 1 year ago

If there is no option to not use the precompiled binary, then the precompiled binary is not an optimization.

I understand that. I am in the same pickle packing cjdns on Fedora which uses serde in new releases. BUT, what about just using

%prep
rm -f path/to/possiblytrojanbinary
GoogleBot42 commented 1 year ago

BUT, what about just using

Well... you seem to think the binary is in the repository. it downloads the binary as a part of it's build step. It's not like you can just delete it before building.

Nope nevermind. Others noted it's located in a cargo package... Which is pretty gross.

sdgathman commented 1 year ago

@demoray claimed:

Note, this does not break offline builds. The package itself includes the binary, which is executed at runtime. For reference, this is viewable on docs.rs: https://docs.rs/crate/serde_derive/latest/source/

https://github.com/serde-rs/serde/issues/2538#issuecomment-1684252595

Ralith commented 1 year ago

it downloads the binary as a part of it's build step

I don't think that's true. cargo can download the package like any other (binary-bearing or otherwise). The precompiled serde package itself contains no network access that I can see.

jhpratt commented 1 year ago

As the maintainer of time, I will be changing the semver requirements to <= 1.0.171 for security reasons. No user should ever have a binary downloaded and executed without their knowledge. It is a significant breach of trust that this was even considered. All of this for 3 milliseconds per invocation? That's absurd.

Will this break builds for some people due to conflicting version requirements? Yes. However, I would rather someone's build fail than be them unknowingly be subjected to an arbitrary executable being executed. The binaries aren't even reproduceable. There is no question whatsoever this is a security issue.

Edit: This has been released as time 0.3.26.

chorman0773 commented 1 year ago

LC is going to follow suit with the above. Not only is this a potential security vulnerabilty in every crate that depends on serde, but this is also a potential compatibility issue with later compiler versions or with alternative (non-rustc) compilers at the very least.