rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0
12.55k stars 2.38k forks source link

DEP_FOO_KEY-like system that can work without "links" #3544

Open dwrensha opened 7 years ago

dwrensha commented 7 years ago

Would it make sense to allow usage of DEP_<name>_<key environment variables in a build.rs for dependencies that do not link to native libraries?

Context: I would like to allow crates like this one to expose Cap'n Proto schema definitions, and in particular to allow Cap'n Proto's import mechanism to work across crate boundaries. (See https://github.com/dwrensha/capnpc-rust/issues/30.)

For example, say I write a my_app crate that depends on the sandstorm crate, and I want my_app to have some capnp schemas that import from the sandstorm schemas. For this to work, the my_app build.rs needs to know where to find the source code of the sandstorm schemas. One way to accomplish that would be for the sandstorm crate's build.rs to write its current working directory to a DEP_SANDSTORM_IMPORT environment variable, through the mechanism described here. However, that only works if the sandstorm crate has a "links" field in its Cargo.toml. Since the sandstorm crate does not actually need to link to a native library, adding such a field feels superfluous and possibly prone to mysterious bugs.

alexcrichton commented 7 years ago

Yeah I've often desired such a feature as well. I've had difficulty rationalizing in the past what <name> would be, however, to disambiguate from other crates. For example if two versions of the same library are linked into a crate how would those keys be passed to the build script? Cargo doesn't support this feature today, but it may wish to one day.

Other than that though it seems like a reasonable feature to me!

sfackler commented 7 years ago

You can't depend directly on two versions of the same crate though, right? Are DEP_* variables from transitive dependencies exposed to build scripts?

alexcrichton commented 7 years ago

@sfackler not currently, no, but I could imagine adding that as a feature to Cargo at some point. I think we'd only want to expose immediate dependencies to build scripts, not transitive, as that'd make the problem much easier (and more practical I think)

alexcrichton commented 7 years ago

Well so we may be able to actually just go ahead and do this.

So Cargo will pass a number of --extern flags with some name. That is the namespace for incoming crates. If Cargo ever supports multiple versions it'll have to support renaming crates anyway (e.g. renamed when passing to --extern. In that sense I think it's safe to say there's a namespace here anyway and it will fit naturally into the env vars.

We'd probably want to spec that the name in the env var though is the name of the library, not the name of the package. That differs in a few rare cases but would future-proof us a bit.

retep998 commented 7 years ago

You currently can depend on two crates that have the same library name, yet different package names, even straight from crates.io.

SimonSapin commented 7 years ago

I fully expected <name> to be the crate/package’s name, was confused for quite some time as to why by dependen’t build script did not get any DEP_* environment variable, and only finally figured it out by reading Cargo’s own test suite.

I’ve ended up adding to a crate a links key with a dummy value.

retep998 commented 7 years ago

As far as I can tell these DEP_* keys are the only way we have to provide information to downstream consumers (which is really important when the functionality that the crate exposes is controlled by the user rather than the highly restrictive cargo features). The fact that links has to be specified to be able to use this feature is a real hassle, especially for crates that don't want cargo to limit them to only a single version (otherwise this happens).

alexcrichton commented 7 years ago

I'll again refer to my previous comments:

@SimonSapin @retep998 if you'd like to send a PR for this it'd definitely be most welcome!

retep998 commented 7 years ago

If we do add DEP_* keys following the crate name, what do we do when there are collisions with the existing DEP_* keys using the links value? Would we effectively declare the old way deprecated and only provide DEP_* keys using links when they don't conflict with a crate name?

alexcrichton commented 7 years ago

We could just use a different namespace, such as CARGO_DEP_FOO_KEY=value

sfackler commented 7 years ago

I think it'd be a good idea in general to start migrating to a CARGO_ prefix on environment variables.

indygreg commented 4 years ago

I have a pretty similar case as the initial reporter for indygreg/PyOxidizer. I have a crate with a build script that needs to generate a library containing a special build of Python. The build process also generates some files which are consumed by callers into that library from a different crate. I'd like for the crate doing all the heavy lifting to expose the paths to these build artifacts to dependent crates so they can pick them up and use them.

I can kinda coerce the existing DEP_* variables to work with links, but it feels brittle since I'm not actually declaring a library dependency in the crate setting that key. That feels like the kind of thing that could break at any time since behavior sounds undefined.

cr1901 commented 1 year ago

I probably need this functionality for a crate that doesn't depend on any native library. So I'm likely to go ahead and set links to name-of-crate-$COOKIE, where $COOKIE was set with mcookie.

This is almost surely not going to conflict with any native library.

It's not great, but I'll put a comment back to this issue explaining the rationale.

gzz2000 commented 1 year ago

I need this, too. I think it can be helpful to just kind of expose CARGO_DEP_MANIFEST_DIR_{dep crate name} and CARGO_DEP_OUT_DIR_{dep crate name} envs to dependents.

abrown7100 commented 1 year ago

My team and I could also use the functionality described in this issue. I just wanted to chime in on it too and voice support. For now, we're utilizing one of a couple possible workarounds, but we're all trying to learn the idioms and best practices of the Rust community.

alex commented 11 months ago

I'm another person with a similar use case (see: https://boringssl-review.googlesource.com/c/boringssl/+/63305).

I'd be more than happy to do some of the leg work to make this happen, what would the next step be there? Does this require an RFC, or just a design + PR?

davidben commented 11 months ago

From the BoringSSL side, lifting the constraint would improve things, but I think a better design would be something that depends less on build.rs in the first place.

In C, it is common and straightforward for packages to communicate to their users by exposing #defines in their macros. The users can then write code like:

#if OPENSSL_VERSION_NUMBER < ...
code for old OpenSSL
#else
code for new OpenSSL
#endif

This way a package can be compatible with a wider range of OpenSSL versions, which in turn makes consuming that package easier for its users.

Rust, from what I can tell, has no good way to do this. Now, Rust is much better at multiple-linking than C, so there's a lot more breathing room when this is missing, but it's still better to avoid multiple-linking when you can. (Binary size, types that leak through the public API, general supply chain security issues around how many packages worth of security bugs you're exposed to.) And so it is useful to be able to adapt your code to what version of the package is ultimately resolved.

Lifting this special-case on -sys crates would make this possible, but very tedious. Every package that does this would suddenly need to write custom imperative code in a build.rs file to translate from environment variables to cfgs. This is particularly tricky for projects with more complex build setups for hermeticity. If cfgs are the Rust story for conditional code, I would suggest a declarative, in-source mechanism would be better here.

For example, perhaps a way for a Rust crate to export cfgs to another crate to be imported? That would also solve the naming problem, if it participates in usual name import.

epage commented 11 months ago

@davidben there is https://github.com/rust-lang/rust/issues/64797 though I think I heard they are wanting to start it with just std. That is considered a blocker before considering adding rustc version cfgs. I think we would likely want to go down the same route. I'm assuming doing so would resolve that use case. If so, I'd recommend creating an issue for that rather than hoping for this unrelated feature being extended to support it.

Arnavion commented 11 months ago

I use the same links hack in k8s-openapi since 2020 to let downstream libraries know which version feature of k8s-openapi was selected by the downstream-most binary crate, so that the libraries can do conditional compilation.

k8s-openapi supports multiple features like v1_22, v1_23, ... v1_28, corresponding to Kubernetes API server version. A binary crate makes the choice of which one of those it wants to enable. Any libraries between k8s-openapi and the binary crate that use k8s-openapi may need to compile differently based on which version feature was selected.

So k8s-openapi's build.rs emits cargo:version=<value that encodes the enabled feature> and downstream libraries can look at DEP_ENV_K8S_OPENAPI_*_VERSION env var to get that value and create their own cfgs, or whatever they want to do. https://arnavion.github.io/k8s-openapi/v0.20.x/k8s_openapi/#conditional-compilation

The alternative is that each such library exports its own version features that map one-to-one with the corresponding k8s-openapi features. This is busywork, especially when the chain of such crates is multiple crates long, or when a crate wants to match on a range of versions (eg all versions below 1.25). The latter would require reimplementing < using cfg(any(feature = "v1_21", feature = "v1_22", ...)) which is also busywork and brittle. Also, if even a single crate doesn't do the re-exported features, or doesn't do it correctly, it ruins the whole thing.

cfg(accessible()) or cfg(version()) would not solve this problem in general. The specific example in my link (whether the PodSpec::host_users field exists or not) might be solved by cfg(accessible()) , but there are other differences between the versions that cannot, like a field changing from type Foo to type Option<Foo>, or vice versa, or to a completely different type Bar, or to a type implementing Default in some versions but not others.

epage commented 11 months ago

I use the same links hack in k8s-openapi since 2020 to let downstream libraries know which version feature of k8s-openapi was selected by the downstream-most binary crate, so that the libraries can do conditional compilation.

I wonder if https://internals.rust-lang.org/t/pre-rfc-mutually-excusive-global-features/19618 could help here.

The specific example in my link (whether the PodSpec::host_users field exists or not) might be solved by cfg(accessible()) , but there are other differences between the versions that cannot, like a field changing from type Foo to type Option, or vice versa, or to a completely different type Bar, or to a type implementing Default in some versions but not others.

Those sound like breaking changes which would imply controlling them with either system isn't a good fit.

Arnavion commented 11 months ago

I wonder if https://internals.rust-lang.org/t/pre-rfc-mutually-excusive-global-features/19618 could help here.

As it is specified, no. Since the global values are limited to enums, it still has the problem of reimplementing < using cfg(any(...)) I mentioned. But if it was extended to also support integers, that would indeed solve the problem.

Those sound like breaking changes which would imply controlling them with either system isn't a good fit.

I don't understand what you mean.

davidben commented 11 months ago

@davidben there is https://github.com/rust-lang/rust/issues/64797 though I think I heard they are wanting to start it with just std. That is considered a blocker before considering adding rustc version cfgs. I think we would likely want to go down the same route. I'm assuming doing so would resolve that use case. If so, I'd recommend creating an issue for that rather than hoping for this unrelated feature being extended to support it.

Oh, interesting. I guess I would call that unrelated mechanism for addressing a very related problem. But as long as it solves the problem, I'm happy. 😄 Tying it to symbol accessibility is interesting, but would work. Crates could define random constants that don't do anything useful, but are meant to be used with #[cfg(accessible(...))] queries. accessible is also a little verbose, but ah well.

I share @Arnavion's concern that cfg currently has no way to handle integers, which dramatically complicates things. In comparison, the C preprocessor, for all its warts, is perfectly capable of doing integer comparisons. Doing so is very, very common for version checks. The rust-openssl folks have a whole mess to work around this limitation.

But even booleans that don't require build.rs would be a dramatic improvement over the status quo. Still well behind C, but if extended to integers in the future, that would probably catch up.

Arnavion commented 11 months ago

Crates could define random constants that don't do anything useful, but are meant to be used with #[cfg(accessible(...))] queries.

Oh, good point. This would also work k8s-openapi's case, eg enabling v1_23 would define pub const V1_23_ENABLED: ();

So the lack of integers would be the only problem.

epage commented 11 months ago

As it is specified, no. Since the global values are limited to enums, it still has the problem of reimplementing < using cfg(any(...)) I mentioned. But if it was extended to also support integers, that would indeed solve the problem.

Integer support is in the future possibilities.

Arnavion commented 11 months ago

I know. I understood "Future possibilities" to mean "not going to be in the RFC and initial implementation", so as it's currently specified it does not support integers.

epage commented 11 months ago

That is correct but the point more is that it has the potential for covering the case and it would be good to give feedback based on that possibility to see if it is going in the right direction or not.