Proc macros: ability to refer to a specific crate/symbol (something similar to `$crate`)

LukasKalbertodt commented 6 years ago

The problem

In macros-by-example we have $crate to refer to the crate the macro is defined in. This is very useful as the library author doesn't have to assume anything about how that crate is used in the user's crate (in particular, the user can rename the crate without breaking the world).
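For reference, here is a minimal sketch of what `$crate` buys a `macro_rules!` author. The names `runtime`, `caller` and `call` are made up for illustration, and everything lives in a single crate, so `$crate` resolves to that crate's root.

```rust
// Stand-in for the library's runtime API, located at the crate root.
pub mod runtime {
    pub fn do_the_thing() -> &'static str {
        "hello from the defining crate"
    }
}

// `$crate` expands to a path to the crate that defines `mac!`,
// regardless of what the caller has in scope.
#[macro_export]
macro_rules! mac {
    () => {
        $crate::runtime::do_the_thing()
    };
}

// A caller whose own scope contains an unrelated `runtime` module.
pub mod caller {
    #[allow(dead_code)]
    pub mod runtime {
        pub fn do_the_thing() -> &'static str {
            "unrelated function"
        }
    }

    pub fn call() -> &'static str {
        // Resolves through `$crate`, so the local `runtime` is ignored.
        mac!()
    }
}
```

Calling `caller::call()` returns the string from the crate-root `runtime` module, not the one from the caller's own `runtime`.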

In the new proc macro system we don't seem to have this ability. It's important to note that just $crate won't be useful most of the time though, because right now most crates using proc macros are structured like this:

foo-{macros/derive/codegen}: this crate is proc-macro = true and defines the actual proc macro.
foo: defines all runtime dependency stuff, has foo-{macros/derive/codegen} as dependency and reexports the proc macro.
The important part: the proc macro emits code that uses stuff from foo

An example:

**`foo-macros/src/lib.rs`**

```rust
#[proc_macro]
pub fn mac(_: TokenStream) -> TokenStream {
    quote! { ::foo::do_the_thing(); }
}
```

**`foo/src/lib.rs`**

```rust
pub fn do_the_thing() {
    println!("hello!");
}
```

When the user uses `mac!()` now, they have to have `do_the_thing` in scope, otherwise an error from inside the macro will occur. Not nice. Even worse: if the user has a `do_the_thing` in scope that is not from `foo`, strange things could happen.

So an equivalent of $crate would refer to the foo-{macros/derive/codegen} crate which is not all that useful, because we mostly want to refer to foo. The best way to solve this right now is to use absolute paths everywhere and hope that the user doesn't rename the crate foo to something else.

The proc macro needs to be defined in a separate crate and the main crate foo wants to reexport the macro. That means that foo-macros doesn't know anything about foo and thus blindly emits code (tokens) hoping that the crate foo is in scope.

But this doesn't sound like a very robust solution.

Furthermore, using the macro in foo itself (usually for testing) is not trivial. The macro assumes foo is an extern crate that can be referred to with ::foo. But that's not the case for foo itself. In one of my codebases I used a hacky solution: when the first token of the macro invocation is *, I emit paths starting with crate:: instead of ::foo::. But again, a better solution would be really appreciated.
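The hack can be modelled roughly as follows. This is a string-based stand-in, since `proc_macro::TokenStream` is only usable inside a proc-macro crate; `choose_prefix` and `emit_call` are hypothetical names, not code from the codebase mentioned above.

```rust
/// Splits off a leading `*` marker and returns the path prefix to emit,
/// together with the remaining macro input.
pub fn choose_prefix(input: &str) -> (&'static str, &str) {
    let trimmed = input.trim_start();
    match trimmed.strip_prefix('*') {
        // Invoked from inside `foo` itself: refer to the current crate.
        Some(rest) => ("crate", rest.trim_start()),
        // Invoked from a downstream crate: assume `foo` is an extern crate.
        None => ("::foo", trimmed),
    }
}

/// Renders the call the macro would emit for the given invocation input.
pub fn emit_call(input: &str) -> String {
    let (prefix, _rest) = choose_prefix(input);
    format!("{prefix}::do_the_thing();")
}
```

The downstream case still hard-codes `::foo`, so this only works around the self-use problem and does nothing for renamed crates.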

How can we do better?

I'm really not sure, but I hope we can use this issue as a place for discussion (I hope I didn't miss any previous discussion on IRLO).

However, I have one idea: declaring dependencies of emitted code. One could add another kind of dependencies (apart from dependencies, dev-dependencies and build-dependencies) that defines what crates the emitted code depends on. (Let's call them emit-dependencies for now, although that name should probably be changed.) So those dependencies wouldn't be checked/downloaded/compiled when the proc macro crate is compiled, but the compiler could make sure that those dependencies are present in the crate using the proc macro.

I guess defining those dependencies globally per crate is not sufficient, since different proc macros could emit code with different dependencies. So maybe we could define the emit-dependencies per proc macro. But I'm not sure if that makes the check too complicated (because then Cargo would have to check which proc macros the user actually uses to collect a set of emit-dependencies).
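To make the per-macro variant concrete, here is a rough sketch of the set Cargo would have to compute. Nothing like this exists in Cargo today; the `EmitDeps` table and `required_emit_deps` are assumptions made up for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-macro declaration: macro name -> crates its output refers to.
pub type EmitDeps = BTreeMap<&'static str, Vec<&'static str>>;

/// Collects the emit-dependencies needed by the macros a user crate actually invokes.
/// Macros that are declared but never used contribute nothing.
pub fn required_emit_deps(declared: &EmitDeps, used_macros: &[&str]) -> BTreeSet<&'static str> {
    used_macros
        .iter()
        .filter_map(|name| declared.get(*name))
        .flatten()
        .copied()
        .collect()
}
```

The hard part is not this union but knowing `used_macros`, which is only available after name resolution in the user's crate.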

That's just one idea I wanted to throw out there.

Obviously this isn't immediately workable, because of the possibility of circular dependencies. So we would need some kind of multi phase compilation. But I think, without much knowledge of the details, this might be viable without being too intrusive to the compiler. Specifically I propose the following:

There are two named phases to compiling a crate: for the sake of an argument, call them macro and final.
There is a phase attribute available to enable conditional compilation. It accepts a comma-separated list of phases and is inherited through scopes unless overridden, with a default of #[phase(final)]. But there can also be a phase key for cfg in order to cover use cases that the new attribute doesn't.
1. Optionally, for ergonomics, items not compiled in the current phase participate in name resolution, but it is an error to refer to them.
Proc macro declarations ignore the inherited phase attribute and can't have an explicit phase marking of their own. They always declare a macro into their immediate scope (in the macro namespace of course), but exact behaviour depends on phase.
1. In the macro phase, the declared macro is simply an error to call, and the definition is used to compile the macro. The placeholder macro exists to produce better error messages than "name doesn't exist" and to avoid accidentally invoking a macro that you didn't realize was imported from a glob import and that would have been shadowed. (Note: this can probably be done with minimal compiler support by having the macro phase version be a normal macro that just expands into a compiler_error! invocation?)
2. In the final phase, the proc macro definition is ignored, and the declared macro name refers to the macro compiled during the macro phase, as by a use. The visibility of the macro name is determined by the visibility of the function defining it.
For hygiene, def-site spans in a proc macro are considered to be in the final version of the crate, and therefore can refer to other names in the crate.
1. For convenience, the proc_macro crate provides a quote_local_path! macro, implemented via intrinsic, that accepts a local path and produces a hygienic, implicitly-delimited TokenStream referring to that path. Possibly the quote! macro could also have a syntax for this.

lovasoa commented 2 years ago

Another problem that arises from this issue that I think hasn't been mentioned here is when the proc-macro is re-exported. If we have crate a which contains a proc-macro, and a crate b that depends on a and re-exports the macro (with pub use a::my_macro), then the code that depends on b will not have the ::a in scope, and this will result in hard-to-troubleshoot issues, since users of b don't even know about crate a.

jhpratt commented 2 years ago

Completely forgot about this issue. Basically there were two reasonably significant issues with the approach I wanted to take that would need to be addressed for any solution.

How are ambiguities between multiple versions resolved? It is perfectly legal to have multiple (incompatible) versions of a crate as dependencies.
Ditto for crates of the same name from different registries? While crates.io is by far the most common, different registries can have crates with identical names and versions but different underlying code.

My personal view is that the former is trivially solvable: allow an optional version to be specified. The latter is quite a bit more difficult, and I have no solutions to propose.

lovasoa commented 2 years ago

@jhpratt I think I don't understand where the problem is. If a proc macro could use $crate, it would refer to its accompanying library crate, which is unique.

SimonSapin commented 2 years ago

Is it unique? As far as I understand "accompanying library crate" is not a concept that really exists in rustc or even Cargo. It’s just a convention.

jhpratt commented 2 years ago

What if the proc macro is re-exported from a third crate? The problem is not as simple as you'd think. What if you want to reference some arbitrary crate? I have a proc macro that needs ::serde, but that's not my crate.

Nemo157 commented 2 years ago

Something I've long thought about is the ability to explicitly declare runtime-dependencies for a proc-macro. That would forward-declare a crate that the macro will generate code to reference (and so avoid dependency loops), and somehow give it some TokenTree through which it can refer to the crate in the generated code.

EDIT: It is possible to support multiple crates even if there is a single $crate though, you just need to have your runtime crate re-export all the other dependencies you need.
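A sketch of that re-export pattern, using `macro_rules!` as a stand-in because `$crate` is only available there today. The module names `third_party` and `__private` and the macro `describe_dep!` are hypothetical; in a real setup `third_party` would be an ordinary dependency of the runtime crate, such as `serde`.

```rust
// Stand-in for a third-party crate that generated code needs to reach.
pub mod third_party {
    pub fn describe() -> &'static str {
        "third-party helper"
    }
}

// Hidden re-export module in the runtime crate: generated code reaches
// every other dependency through this one well-known path.
#[doc(hidden)]
pub mod __private {
    pub use crate::third_party;
}

// The macro only ever names `$crate`; the user's crate never needs a
// direct dependency on `third_party`.
#[macro_export]
macro_rules! describe_dep {
    () => {
        $crate::__private::third_party::describe()
    };
}

pub fn demo() -> &'static str {
    describe_dep!()
}
```

With this layout a single crate token is enough, at the cost of the runtime crate having to depend on everything the macro output mentions.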

alercah commented 2 years ago

All that's needed is hygiene to ensure that crates are looked up in the context of the macro, not the call site.

On Mon., Sep. 5, 2022, 02:40 Jacob Pratt wrote:

What if the proc macro is re-exported from a third crate? The problem is not as simple as you'd think. What if you want to reference some arbitrary crate? I have a proc macro that needs ::serde, but that's not my crate.


Nemo157 commented 2 years ago

It's not just hygiene. The proc-macro is built for a different target, so it will have to do something similar to how a crate depending on a proc-macro crate implicitly creates a different kind of dependency edge. And having it just naïvely depend on the runtime crate creates dependency loops if the runtime crate then depends on the macro crate to re-export it. That's why I think it needs some way to do forward declaration of dependencies between cargo and rustc for these "dependencies, but not really".

alercah commented 2 years ago

Yeah, that's the hard part. The name lookup is the part that is not hard.

WaffleLapkin commented 2 years ago

So, runtime-dependencies would create a pseudo-crate which depends on all the runtime-dependencies and make $crate output by the macro refer to it? That sounds nice, although the pseudo-crate will still need special handling all the way down, since nothing can depend on it (otherwise there will be a cycle if it depends on something).

Another way is to move the macro definition to the normal crate. A hacky idea that comes to mind is something like

reexport_macro_setting_dollar_crate_to_here! { the_crate_macros::the_macro }

i.e. add a built-in macro that creates a new macro setting $crate to the current crate. This also solves

To some extent this can be simulated even on stable, I think:

// main crate, that reexports the macro
macro_rules! the_macro {
    // `$crate` is expanded early it seems, and can be reinterpreted as a path
    // I've tested with `macro_rules!` and this still works if this macro is used from outside
    ($($args:tt)*) => { $crate::the_crate_macros::the_macro!($crate; $($args)*) }
}

// macro crate
#[proc_macro]
pub fn the_macro(ts: TokenStream) -> TokenStream {
    let krate = ts.parse_path(); // pseudo-code
    _ = ts.parse_semicolon();

    // ...
}

But, this is very limited -- for attribute and derive macros this won't work, as there is no syntax to define them in the normal crate.
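The forwarding trick can be tried out without a proc-macro crate. In the sketch below a second `macro_rules!` macro, `the_macro_impl!`, stands in for the proc macro; it and the `runtime::add` function are hypothetical names used only for illustration. The crate token is captured as a `tt` so it can be spliced back into a longer path.

```rust
pub mod runtime {
    pub fn add(a: i32, b: i32) -> i32 {
        a + b
    }
}

// Stand-in for the proc macro: receives the crate token first,
// then `;`, then the user's arguments.
#[doc(hidden)]
#[macro_export]
macro_rules! the_macro_impl {
    ($krate:tt; $a:expr, $b:expr) => {
        // `$krate` carries the `$crate` token passed in by the wrapper.
        $krate::runtime::add($a, $b)
    };
}

// Public wrapper living in the main crate: prepends `$crate` and forwards.
#[macro_export]
macro_rules! the_macro {
    ($($args:tt)*) => {
        $crate::the_macro_impl!($crate; $($args)*)
    };
}

pub fn demo() -> i32 {
    the_macro!(2, 40)
}
```

A real proc macro would parse the leading token and the `;` from its input `TokenStream` instead of pattern matching on them.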

A less hacky way would require defining macro sub-crates in-tree, which I think was discussed on Zulip. But that's a much bigger feature, I think.

Nemo157 commented 2 years ago

> To some extent this can be simulated even on stable, I think

I can confirm this works just fine for expression macros; I use it in stylish to get a re-exportable format_args! proc-macro. (It doesn't actually get expanded early. It gets passed in as a $crate ident, which obeys its hygiene to determine which crate it refers to; you can get arbitrary crate access from one $crate token by changing its span.)

lovasoa commented 2 years ago

Yes, this is a solution for function-like proc macros, but it doesn't work for derive macros, does it?

WaffleLapkin commented 2 years ago

> But, this is very limited -- for attribute and derive macros this won't work, as there is no syntax to define them in the normal crate.

But rustc could do a similar(-ish) trick for them, I think. We "just" need to design and implement it.

dtolnay commented 2 years ago

https://github.com/rust-lang/rust/issues/54363#issuecomment-1236685409 is along the lines of what I would want in pretty much all of my macro libraries. Something like:

# Cargo.toml

[package]
name = "serde_derive"

[lib]
proc-macro = true

[dependencies]
proc-macro2 = "1"
quote = "1"
syn = "1"

[build-dependencies]
autocfg = "1"

[dev-dependencies]
serde = "1"
trybuild = "1"

[macro-dependencies]
serde = "1"

// src/lib.rs

use proc_macro::TokenStream;

#[proc_macro_derive(Serialize)]
pub fn derive_serialize(input: TokenStream) -> TokenStream {
    let serde /*: proc_macro::Ident */ = proc_macro::dependency("serde");
    quote! {
        impl #serde::Serialize for …
    }
}

In terms of the Cargo build graph, this does not say serde needs to finish (or even start) building before serde_derive can start building, unlike ordinary dependencies. It says serde needs to finish building (the rmeta, not necessarily codegen) before anything that depends on serde_derive can begin building, except serde itself:

If serde depends on serde_derive and calls this macro (it doesn't, but let's pretend) then the Ident that gets returned by proc_macro::dependency("serde") inside that expansion needs to behave just like a $crate that came from a macro_rules inside serde would behave.
If some other crate depends on serde_derive (directly or transitively) and calls this macro, the Ident is as though the downstream crate had its own direct dependency on __unnameable = { package = "serde", version = "1" } and obtained a $crate from it.
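One way to read the ordering rule is as a small graph computation. This is only a model under stated assumptions: the `Graph` type and `prerequisites` method are hypothetical, not Cargo internals, and "finished" here means the rmeta is available.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy build graph: ordinary dependencies plus hypothetical macro-dependencies.
#[derive(Default)]
pub struct Graph {
    pub deps: BTreeMap<&'static str, Vec<&'static str>>,
    pub macro_deps: BTreeMap<&'static str, Vec<&'static str>>,
}

impl Graph {
    /// Crates that must be finished before `krate` can begin building.
    pub fn prerequisites(&self, krate: &'static str) -> BTreeSet<&'static str> {
        let mut seen = BTreeSet::new();
        let mut stack = vec![krate];
        while let Some(current) = stack.pop() {
            let ordinary = self.deps.get(current).into_iter().flatten();
            // Macro-dependencies constrain users of `current`, never `current` itself.
            let forwarded = if current == krate {
                None
            } else {
                self.macro_deps.get(current)
            };
            for &next in ordinary.chain(forwarded.into_iter().flatten()) {
                // Skipping `krate` models the "except serde itself" carve-out.
                if next != krate && seen.insert(next) {
                    stack.push(next);
                }
            }
        }
        seen
    }
}
```

With `serde_derive` declaring `serde` as a macro-dependency, `serde` shows up in the prerequisites of a downstream crate that depends on `serde_derive`, but not in those of `serde_derive` or of `serde` itself.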

The discussion above about "what if multiple versions" and "what if different registries" doesn't seem applicable to this solution. A macro-dependencies entry describes a particular version just as a dependency or dev-dependency would, and implicitly or explicitly a registry, and it integrates nicely with Cargo patch. For example, [patch.crates-io] serde = { path = "…" } would apply to that macro-dependency exactly as it would apply to an ordinary dependency.

jhpratt commented 2 years ago

@Nemo157 @dtolnay Love it. Both questions/problems I stated are inherently resolved by using Cargo.toml, which is something I honestly never considered.

blueforesticarus commented 1 year ago

Based on this discussion over def_site in proc macros. https://github.com/rust-lang/rust/issues/54724#issuecomment-867953306

Would it be appropriate for @dtolnay's macro-dependencies suggestion to put the dependencies into the def_site namespace?

Kixunil commented 1 year ago

Note that the same problem occurs in build script dependencies. prost/prost-build, tonic/tonic-build, configure_me/configure_me_codegen... I think both need to be solved and probably doing it the same way is the simplest option.
