rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking issue for RFC 1566: Procedural macros #38356

Closed aturon closed 6 years ago

aturon commented 7 years ago

Current Status

This issue has been closed in favor of more fine-grained tracking issues

~Updated Description~

Next steps:

Possible Stabilization Showstoppers

Original Description

RFC.

This RFC proposes an evolution of Rust's procedural macro system (aka syntax extensions, aka compiler plugins). This RFC specifies syntax for the definition of procedural macros, a high-level view of their implementation in the compiler, and outlines how they interact with the compilation process.

At the highest level, macros are defined by implementing functions marked with a #[macro] attribute. Macros operate on a list of tokens provided by the compiler and return a list of tokens that the macro use is replaced by. We provide low-level facilities for operating on these tokens. Higher level facilities (e.g., for parsing tokens to an AST) should exist as library crates.
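The token-in/token-out contract described above can be illustrated with a toy stand-in (plain strings rather than the real `proc_macro` token types, so the sketch is runnable outside a proc-macro crate):

```rust
// Toy model of the contract: the compiler hands the macro its input as
// tokens, and the returned tokens replace the invocation. A hypothetical
// `double!(x)` macro expanding to `x + x` might look like this.
fn expand_double(input: &[&str]) -> Vec<String> {
    let expr = input.join(" ");
    vec![expr.clone(), "+".to_string(), expr]
}

fn main() {
    let expansion = expand_double(&["x"]);
    println!("{}", expansion.join(" ")); // prints "x + x"
}
```

In the real API the function would take and return `proc_macro::TokenStream` and be marked with the macro-definition attribute, but the shape (tokens in, tokens out, no AST) is the same.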

Roadmap: https://github.com/rust-lang/rust/issues/38356#issuecomment-274377210.


Tasks

aturon commented 7 years ago

cc @nrc @jseyfried

abonander commented 7 years ago

I'd love for #[proc_macro_attribute] to be implemented soon. I already have a prototype and test usage that I banged out before realizing there's no compiler support yet :unamused: :

Prototype: https://github.com/abonander/anterofit/blob/proc_macro/macros/src/lib.rs Example/Test: https://github.com/abonander/anterofit/blob/proc_macro/examples/post_service_proc_macro.rs

jseyfried commented 7 years ago

Tasks

(dtolnay edit: moved the checklist up to the OP)

cc @nrc @petrochenkov @durka @Ralith

abonander commented 7 years ago

@jseyfried I ran into an issue where if a legacy macro and an attribute with the same name are imported into the same scope, trying to use the attribute throws an error that macros cannot be used as attributes. Can we make this work so that both can be in the same scope and can be used as intended?

jseyfried commented 7 years ago

@abonander All macros (bang, attribute, and derive) share the same namespace, so we can't use two different macros with the same name in the same scope. However, we could improve that error message -- could you open an issue?

SimonSapin commented 7 years ago

Sorry I’m late to the party. I’m happy with the direction to expose tokens rather than an AST, but I have some concerns about the specific TokenStream API proposed in the RFC:

pub enum TokenKind {
    Sequence(Delimiter, TokenStream),

    // The content of the comment can be found from the span.
    Comment(CommentKind),

    // `text` is the string contents, not including delimiters. It would be nice
    // to avoid an allocation in the common case that the string is in the
    // source code. We might be able to use `&'codemap str` or something.
    // `raw_markers` is for the count of `#`s if the string is a raw string. If
    // the string is not raw, then it will be `None`.
    String { text: Symbol, raw_markers: Option<usize>, kind: StringKind },

    // char literal, span includes the `'` delimiters.
    Char(char),

    // These tokens are treated specially since they are used for macro
    // expansion or delimiting items.
    Exclamation,  // `!`
    Dollar,       // `$`
    // Not actually sure if we need this or if semicolons can be treated like
    // other punctuation.
    Semicolon,    // `;`
    Eof,          // Do we need this?

    // Word is defined by Unicode Standard Annex 31 -
    // [Unicode Identifier and Pattern Syntax](http://unicode.org/reports/tr31/)
    Word(Symbol),
    Punctuation(char),
}

pub enum StringKind {
    Regular,
    Byte,
}

It’s not clear if this API was intended as a complete plan that was accepted when the RFC was merged, or just an example to be worked out later.

Generally, this seems far from the "normal" Rust syntax accepted by the compiler outside of macros. While some macros will want to parse some ad-hoc domain-specific language, others will want to parse "actual Rust" syntax and make sense of it.

  1. (Minor) I don’t think Eof is necessary. An Iterator will presumably be used to, well, iterate over a TokenStream and Iterator::next already returns None to signal the end of iteration.

  2. (Minor) I don’t think Exclamation, Dollar, or Semicolon are necessary. Matching on Punctuation('!') for example is not more difficult.

  3. (Minor) As others have mentioned in the RFC PR, we might want to omit comments that are not doc-comments. (Any use case that wants to preserve comments likely wants to preserve whitespace too.)

  4. As far as I can tell, what to do with multi-character operators (that probably should be a single token each) is still an open question. A possible solution is discussed in PR comments, but it looks like that didn’t make it into RFC text.

  5. Number literals are missing. Are macros supposed to parse [Punct('1'), Punct('_'), Punct('2'), Punct('3'), Punct('4'), Punct('.'), Punct('5'), Punct('e'), Punct('6')] by themselves to evaluate a literal? They can’t even use str::parse::<f32> to do that, since the syntax it accepts is not the same as Rust literal syntax (which can have _ in the middle, for example).

    I imagine that there’s a stability concern here. Can we introduce new numeric types like u128 / i128 (and possibly in the future f128, u256, …) and their literals, without breaking changes to the tokens API? One way to make this possible might be:

    struct IntegerLiteral { negative: bool, decimal_digits: String, type_suffix: Option<String> }
    impl TryInto<u32> for IntegerLiteral { type Err = OutOfRange; /* … */ }
    // Other impls for integer types supported in this compiler version
    
    // Something similar for floats

    Or maybe something else. But I don’t think "pretend numbers don’t exist" is a good way to do it.

  6. // Word is defined by Unicode Standard Annex 31 -

    This definition needs to be more precise than that. UAX 31 specifies a couple different variations of identifier syntax, and none of them is called "word". But choosing which exact variation we want is why non-ASCII identifiers are feature-gated at the moment.

    Instead, I think this should be defined as "whatever the current compiler accepts as an identifier or keyword" (which can change per #28979). Maybe with a pub fn is_identifier(&str) -> bool public API in libmacro.

  7. Unicode strings and byte string literals share a single token variant, which I think is wrong as the memory representations of their values have different types (str vs [u8]). It’s also not clear if the text: Symbol component is intended to be a literal slice of the source code or the value after resolving backslash escapes. I think it should definitely be the latter. (For comparison, Char(char) has to be the latter since \u{A0} takes more than one char to represent literally.)
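Points 5 and 7 above can be checked with plain std APIs (a small runnable demonstration, not part of any proposed macro API):

```rust
fn main() {
    // Point 5: str::parse does not accept Rust literal syntax --
    // underscores are legal in a Rust float literal but rejected here.
    assert!("1_234.5e6".parse::<f32>().is_err());

    // Stripping the underscores first is exactly the kind of re-lexing
    // every macro would otherwise have to reimplement by hand.
    assert_eq!("1_234.5e6".replace('_', "").parse::<f32>(), Ok(1234.5e6));

    // Point 7: the source text `'\u{A0}'` is several characters long,
    // but the token's *value* is a single char, so Char(char) must hold
    // the resolved value rather than a slice of the source.
    let c: char = '\u{A0}';
    assert_eq!(c as u32, 0xA0);
    println!("ok");
}
```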

porky11 commented 7 years ago

Another way to write high-level macros would be Lisp-style macros, but this would need an S-expression representation of the whole Rust AST.

jan-hudec commented 7 years ago

@SimonSapin,

As others have mentioned in the RFC PR, we might want to omit comments that are not doc-comments. (Any use case that wants to preserve comment likely wants to preserve whitespace too.)

Please don't. I have a use-case where I want to use (though not preserve—they will be written into a separate compilation product instead) comments in the syntax.

Specifically, I want to create translation macros that would load translations of a string from a separate source file(s) and I would like to generate a list of strings to be translated as by-product in debug build. And there needs to be a way to include comments to be emitted into that list (rust-locale/rust-locale#19). So it makes sense to use comment syntax and the macro needs to see them.

I agree with the other points in that post.

jseyfried commented 7 years ago

@jan-hudec Even if we didn't have a TokenKind::Comment, you could still use comments by looking at the contents of the spans between consecutive tokens.

I think we shouldn't have TokenKind::Comment, to encourage procedural macros to ignore comments so that users are free to add comments to macro invocations without worrying about changing semantics.

abonander commented 7 years ago

@jan-hudec Is there a reason attributes won't work with your solution?

jan-hudec commented 7 years ago

@abonander, attributes absolutely don't make sense. Translatable strings act as literals, not as items. But extracting them during compilation would be just for convenience—it can always be done as separate parsing (and in fact, may end up being so, because I need to see all of them in the crate and incremental compilation would break that).

aidanhs commented 7 years ago

I want to make a procedural macro that's based on serde's derive (and so calls the serde tokenstream functions directly) but there's no way to say I want to consume serde derive as a library rather than a procedural macro. This isn't exclusive to derive macros, I can see a similar thing being wanted for 'normal' procedural macros too.

My only solution right now appears to be forking serde_derive.

The problem is this error message from rustc:

error: the `#[proc_macro_derive]` attribute is only usable with crates of the `proc-macro` crate type

It's easy to remove that and make it work, but there is also some complexity that I'm not sure how to resolve - a procedural macro crate could plausibly want to both use the proc-macro derive from another procedural macro crate, as well as calling the functions to generate the derive for a downstream user. What would that look like? Is there anything similar like this around at the moment, where a crate can be linked to in two different ways at the request of the consuming crate?

jseyfried commented 7 years ago

@aidanhs

a procedural macro crate could plausibly want to both use the proc-macro derive from another procedural macro crate, as well as calling the functions to generate the derive for a downstream user. What would that look like?

You can't access the functions (or anything else besides procedural macros) from a proc-macro crate. If you want to use TokenStream -> TokenStream functions and the corresponding procedural macros, you'll need to put the TokenStream -> TokenStream functions in a separate, non-proc-macro crate, and then also have a proc-macro crate that just delegates to those functions.
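A minimal sketch of that split, using hypothetical crate names (`foo_expand` / `foo_derive`; the manifests are illustrative, not taken from any real crate):

```toml
# foo_expand/Cargo.toml -- ordinary rlib exposing the
# TokenStream -> TokenStream expansion functions, callable
# like any other library.
[package]
name = "foo_expand"
version = "0.1.0"

# ---------------------------------------------------------------
# foo_derive/Cargo.toml -- thin proc-macro wrapper that only
# registers the macro and delegates to foo_expand.
[package]
name = "foo_derive"
version = "0.1.0"

[lib]
proc-macro = true

[dependencies]
foo_expand = { path = "../foo_expand" }
```

Downstream macro authors who want to reuse the expansion logic depend on `foo_expand` directly; ordinary users depend on `foo_derive`.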

jseyfried commented 7 years ago

This RFC will be mostly implemented once #40939 lands.

lambda-fairy commented 7 years ago

Provide a way for proc_macro authors to create expansions that use items in a predetermined crate foo without requiring the macro user to include extern crate foo; at the crate root

Suppose that I want to present a single crate, that contains both non-macro items and a procedural macro that refers to said items. When #40939 lands, will this three-crate pattern be the idiomatic way to achieve this goal?

  1. Put all the non-macro items in foo_runtime
  2. Implement the procedural macro in foo_macros, referring to the symbols in foo_runtime as necessary
  3. Add a final "façade" crate foo that pub uses the items from foo_runtime and foo_macros
    • This is the only crate that the user will import directly
    • This works because the hygiene system fixes the macros to point to the right crate

I ask because my use case involves importing two crates, and it would be great for usability if I could get away with just one.

jseyfried commented 7 years ago

@lfairy I think a "two-crate" pattern will be the idiomatic way:

  1. Put all non-macro items in foo
  2. Implement the procedural macro in foo_macros, referring to symbols in foo as necessary, e.g.

    #[proc_macro]
    fn m(_: TokenStream) -> TokenStream {
        quote! {
            extern crate foo; // due to hygiene, this is never a conflict error
            foo::f();
            // --- or just --- (if/when we get the sugar)
            $universe::foo::f();
        }
    }

  3. pub use the items from foo_macros in foo.

This works because the hygiene system fixes the macros to point to the right crate

Reexporting a procedural macro in a different crate does not affect how names from the procedural macro resolve.

colin-kiegel commented 7 years ago

@jseyfried: Do you know whether this re-exportation trick also works with custom derives? Because these crates have exactly the same limitation of not being able to export any items.

jseyfried commented 7 years ago

@colin-kiegel Custom derive crates are just proc macro crates that happen to only have #[proc_macro_derive]s. With #[feature(proc_macro)], you can re-export custom derives in ordinary crates, just like you can re-export other proc macros.

aidanhs commented 7 years ago

@jseyfried I'm aware of the situation as it is at the moment, I posed the question because I don't think it's ideal and hoped to have a discussion about it. In the situation you describe, delegating to or reusing the procedural macros of another crate becomes a matter of convincing the macro author (in this case, serde) to split their procedural macros into two crates. If I could call procedural macros like normal functions, the upstream crate author wouldn't even need to know I'm using their crate.

That said, I recognise the compatibility hazard - the exact token tree generated by a macro becomes part of the stable interface, so if serde changes how they generate the derive in a patch version and I've written a fragile macro, my macro will be broken for every single new user of my crate (as opposed to a fragile macro in the current case, where in the worst case it'll only work for specific inputs, but consistently).

arielb1 commented 7 years ago

@jseyfried

Does this pull foo from the current cargo deplist? That sounds bad (i.e. will it do anything particularly stupid if there are 2 crates named foo linked into the current binary?).

jseyfried commented 7 years ago

@aidanhs That would be a major language change/addition that would warrant its own RFC.

jseyfried commented 7 years ago

@arielb1

Does this pull foo from the current cargo deplist? That sounds bad

Yeah -- sadly, quoted extern crate names aren't hygienic, i.e. the resolution depends on which crate names happen to be in scope where the procedural macro is used. We can mitigate this using the re-export trick (i.e. re-exporting foo_macros in foo so that we know foo will be in scope), but that doesn't protect against ambiguity errors when there are two crates named foo.

I think the best solution here is to add phase 1 (i.e. target w.r.t. host vs target) dependencies to the Cargo.toml for proc-macro crates via a --target-extern command line argument. This would allow us to explicitly list the extern crate names in scope inside quote!.

arielb1 commented 7 years ago

@jseyfried

The idea is that a proc-macro crate would have a dependency in its "target" metadata, right?

jseyfried commented 7 years ago

@arielb1 Yeah, exactly.

bstrie commented 7 years ago

This RFC will be mostly implemented once #40939 lands.

@jseyfried As in, ready to be stabilized when that PR lands? If not, what would remain blocking stabilization? I just don't want this to be yet another feature where it feels like we get 95% of the way towards implementing and people get all excited, and then things peter off anticlimactically.

jseyfried commented 7 years ago

As in, ready to be stabilized when that PR lands?

No, we want to get some experience with the API before stabilizing and perhaps future proof hygiene for extern crate names (i.e. address this issue that @arielb1 pointed out).

We will probably want to make breaking changes to this API; @eddyb has proposed/considered generalizing OpKind to all token trees. Also, we might change how we handle doc comments, floating point literals, etc. Overall, the API in this PR isn't mature enough to consider stabilizing yet.

est31 commented 7 years ago

@bstrie sadly the RFC to fast track proc macro stabilisation (with a limited api where e.g. token streams are only accessible through their string representation) like the derive macro stabilisation has failed: https://github.com/rust-lang/rfcs/pull/1913

jseyfried commented 7 years ago

@est31 Postponed, more like -- after a little experience with this API we might agree on a subset that we can agree to fast-track to stable.

The String-based API interacts badly with declarative macros 2.0 and is already limiting today, even without macros 2.0 and just with #[derive]s. We want to avoid proliferation of the String based API as much as possible to avoid issues as people migrate to macros 2.0.

alexcrichton commented 7 years ago

I've opened an issue for #[proc_macro_attribute] seemingly not getting expanded on trait methods (maybe trait items in general?)

alexcrichton commented 7 years ago

Since this is now the tracking issue for the proc_macro crate and its new APIs I thought I'd write down some thoughts as well. I've published a crate called proc-macro2 which is intended to be the exact same as the proc_macro crate in-tree except that it provides the ability to compile on stable Rust. It then also has the ability to use a feature to compile on nightly Rust to get the benefit of better span information. That library is intended to become the foundation for other libraries like syn, and in the development of syn we found a few shortcomings we may wish to address in proc_macro directly:

I believe all other concerns starting here have since been addressed.

Arnavion commented 7 years ago

I encountered a breakage when testing with #![feature(proc_macro)] that affects custom derives that have #[proc_macro_derive(foo, attributes(foo))]. That is, a custom derive which has the name of an attribute that is the same as the custom derive. One such crate is mine - derive-error-chain, which has #[derive(error_chain)] #[error_chain(...)] struct ErrorKind { ... }. Another is derive-new, which has #[derive(new)] #[new] struct S;. I don't know if there are others.

For code like this, the compiler complains at the second attribute that "foo" is a derive mode. Is this intentional or can it be fixed? If intentional I need to prepare to rename my custom derive to ErrorChain or something.

jseyfried commented 7 years ago

@Arnavion This was intentional in general -- since proc_macro_attributes must be expanded before derives, if new were proc_macro_attribute then the expansion would be ambiguous. It would be possible to specifically allow new to be a proc_macro_derive, but I'm not sure it's worth it (also could be a future-compatibility hazard).

Arnavion commented 7 years ago

This was intentional in general -- since proc_macro_attributes must be expanded before derives, if new were proc_macro_attribute then the expansion would be ambiguous.

Okay, I'll rename #[derive(error_chain)] to #[derive(ErrorChain)].

It would be possible to specifically allow new to be a proc_macro_derive, but I'm not sure it's worth it (also could be a future-compatibility hazard).

Sure, I wasn't asking for new to be special-cased. It was just an example from one of the two proc_macro_derives I know about that are broken by this.

jseyfried commented 7 years ago

@Arnavion Sorry, my last comment wasn't the clearest -- I didn't mean special-case new specifically but to allow #[derive(some_macro)] #[some_attr] struct S; when some_attr resolves to a proc_macro_derive. When some_attr resolves to a proc_macro_attribute, this would need to be an ambiguity error; today, it is an ambiguity error if some_attr resolves to any macro.

Arnavion commented 7 years ago

Yes, I got it.

LukasKalbertodt commented 7 years ago

(I hope this is the right place for a question like this.)

What's the status of this?

  • [ ] Provide a way for proc_macro authors to create expansions that use items in a predetermined crate foo without requiring the macro user to include extern crate foo; at the crate root (PR #40939).

The PR has landed, but the box is still not checked. @jseyfried mentioned something here and it kinda seems to work. However it doesn't seem to work with use at all:

let call_site_self = TokenTree {
    kind: TokenNode::Term(Term::intern("self")),
    span: Span::call_site(),
};
quote! {
    extern crate foo; // due to hygiene, this is never a conflict error

    // Neither of these works
    use foo::f;
    use self::foo::f;
    use $call_site_self::foo::f;
}

Am I missing something? What is the idiomatic way to use symbols from an extern crate imported in the macro?

parched commented 7 years ago

You can't use `use`, see https://github.com/rust-lang/rfcs/issues/959. But for a macro it's not really a disadvantage to use the fully qualified path every time. (Except for traits, I think)

LukasKalbertodt commented 7 years ago

@parched Thanks for linking this other issue. My use case was the following:

In my macro I want to let the user write something similar to a match-matcher. Specifically, the user writes a Term and this can either be a variant of an enum or a simple variable name which binds the match value. To write some pseudo code with macro_rules! syntax:

macro_rules! foo {
    ($matcher:ident) => {
        match something() {
            $matcher => {}
            _ => {}
        }
    }
}

Now I want the user to be able to just specify the variant name without the enum name. Thus I would insert a use my_crate::AnEnum::*; statement in the generated code. But since this is not possible (right now), I need to check for myself whether or not the $matcher is a variant of the enum or not.

I hope my explanation is understandable. I just wanted to give another use case for use in macro-generated code.
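The ambiguity behind this use case can be seen in plain Rust: without the glob import a macro would like to inject, an unqualified name in a match pattern is a fresh binding that matches everything, not the enum variant (a small runnable illustration with made-up names):

```rust
enum AnEnum { Foo, Bar }

// With the glob import, `Foo` and `Bar` in patterns resolve to variants.
fn classify(x: AnEnum) -> &'static str {
    use AnEnum::*; // what the macro would like to inject for the user
    match x {
        Foo => "the Foo variant",
        Bar => "the Bar variant",
    }
}

// Without the import, a bare name in a pattern is just a binding, so a
// single arm matches every value -- the ambiguity the macro must resolve.
fn classify_without_use(x: AnEnum) -> &'static str {
    match x {
        v => { let _ = v; "matched by a plain binding" }
    }
}

fn main() {
    assert_eq!(classify(AnEnum::Foo), "the Foo variant");
    assert_eq!(classify_without_use(AnEnum::Bar), "matched by a plain binding");
    println!("ok");
}
```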

jseyfried commented 7 years ago

@LukasKalbertodt Could you just use my_crate::AnEnum::$matcher => {} in the match? Never mind, I see the issue -- I believe we'll need https://github.com/rust-lang/rfcs/issues/959 for that.

LukasKalbertodt commented 7 years ago

@jseyfried No: $matcher can either be a variant name (in which case your solution would work) or a simple variable name like in match x { simple_var_name => {} }. In the latter case it wouldn't work AFAICT. (btw, I just wanted to mention another use case to show that using use is important)

Arnavion commented 7 years ago

@jseyfried

This was intentional in general -- since proc_macro_attributes must be expanded before derives, if new were proc_macro_attribute then the expansion would be ambiguous.

Okay, I'll rename #[derive(error_chain)] to #[derive(ErrorChain)].

It seems attributes of custom derives also conflict with macro_rules macros, instead of overriding them like custom derives do based on import order. That is, this code compiles:

#![feature(proc_macro)]
#[macro_use] extern crate error_chain; // macro_rules! error_chain
#[macro_use] extern crate derive_error_chain; // #[proc_macro_derive(error_chain, attributes(error_chain))]

#[derive(error_chain)] // No error. Resolves to custom derive
enum ErrorKind {
    /*#[error_chain]*/ // (1) As discussed above, can't use this any more since it conflicts with the name of the custom derive
    Foo,
}

This matches the behavior of current stable Rust, with the exception that (1) does work in stable. I've even explicitly documented that users wishing to use #[macro_use] with the error-chain crate will need to import it before importing derive-error-chain.

But even if I rename the custom derive to ErrorChain to make (1) work with the proc_macro feature (which is already one breaking change for stable code):

#![feature(proc_macro)]
#[macro_use] extern crate error_chain; // macro_rules! error_chain
#[macro_use] extern crate derive_error_chain; // #[proc_macro_derive(ErrorChain, attributes(error_chain))]

#[derive(ErrorChain)] // Unique name, so no error
enum ErrorKind {
    #[error_chain] // (2)
    Foo,
}

it still doesn't compile - the attribute at (2) yields the error: macro `error_chain` may not be used in attributes because the macro_rules macro apparently conflicts with the attribute registered by the custom derive instead of being overridden like in the first case.

So I have to rename both the custom derive and its attribute. The attribute gets used a lot more (one on each variant of the enum) than the custom derive (one on each enum), so this is a bigger breaking change than I expected. I do understand that this is a tricky situation of my own construction (reusing the name of the macro_rules macro for a custom derive and its attribute), but this is also code that has been compiling in stable since custom derives were stabilized, so I had no reason to think it would be a problem six months later.

Can it perhaps be made so that attributes of custom derives override macro_rules macros just like custom derives themselves override macro_rules macros? Actually I don't see how there could be any ambiguity between them, but I assume it's the same reason as when a macro_rules macro is imported after a custom derive of the same name - that all macros are put in the same namespace without considering what kind of macro they are.

LukasKalbertodt commented 7 years ago

Is there some less formal "place" to talk about proc macros? Like a #rust-proc-macro IRC channel? I'd love to ask small questions about the feature from time to time, but it just feels wrong to spam this thread :see_no_evil: And in the #rust channel, most people haven't worked with proc-macros and especially the new proc_macro API (since it's unstable and all). So: any idea where to discuss this topic?

abonander commented 7 years ago

@LukasKalbertodt #rust-internals, maybe, or just start a new thread on /r/rust.

SimonSapin commented 7 years ago

TokenStream::from_str panics when used outside of a procedural macro (for example in a build script):

thread 'main' panicked at 'proc_macro::__internal::with_sess() called before set_parse_sess()!', /checkout/src/libproc_macro/lib.rs:758:8

Would it be possible/desirable to replace this panic with implicitly creating a dummy "session"? Or perhaps add a public API (with a path to stabilization) to create one?

CinchBlue commented 7 years ago

Has anyone looked at literature on macros from other systems? I would like to hear people's thoughts on this. I'll speak about Scheme here, since that's what I'm most familiar with.

I am personally working on implementing syntax-rules for R7RS Scheme on my own project, and I have found that syntax-case can form the basis for supporting both unhygienic and hygienic macro systems (defmacro and syntax-rules). GNU Guile does this. syntax-case also has support for fenders which can perform additional predicate validation on syntax object lists (or, something among the lines of TokenStream in Scheme). I can see that Mark is being worked on, and it looks like it's inspired by Bindings as Sets of Scopes.

Also, should we also discuss whether arbitrary computation at compile-time should be supported? Racket actually takes an entire "phase" approach to things, it appears like, with begin-for-syntax allowing for definitions and computation (?) at the compile-time level during macro expansion..

Control over hygiene is very possible with (datum->syntax <thing-to-copy-scope-from> <thing-to-apply-scope-to>) in Scheme, allowing you to escape the scope of a macro and instead take on a scope of an object outside of the immediate scope.

Take this example from The Scheme Programming Language, 3rd ed. by R. Kent Dybvig (Chez Scheme, now at Cisco Systems): http://www.scheme.com/tspl3/syntax.html. The example shows (include "filename.scm") as a syntax-case macro, and allowing the interpreter to use a macro to setup the runtime to read from a file and continue evaluation. The deeper question here is whether we want a macro-macro system to allow such things to happen at macro-expansion time, and trigger compile-time computations such as triggering a file import (although, this appears to occur in the direct compiler functionality, so maybe we don't want to do this).

What should the limits of macros be? I would imagine that Rust, wanting to cut down on its compilation time, wants to restrict compile-time evaluation (and especially avoid infinite loops). Racket has taken the "tower of preparers and expanders" approach with phases as referenced in Lisp in Small Pieces. Do we want to allow access to a compile-time API to perform file I/O and limited recursive computation? Should we allow things like having procedural macros turn CSV spreadsheet specs into switch statements?

I'd love to hear about other systems! I hear Template Haskell has an interesting approach with well-defined types to represent their AST, and laziness in Haskell can replace many uses of macros for control structures.

Sorry if I'm stepping out of line.

jan-hudec commented 7 years ago

What should the limits of macros be?

For procedural macros, discussed in this issue, none. A procedural macro is a compiler extension. It can take a bit of C++ code, run it through clang and add the resulting object to the compilation. It can take some SQL, query the database to find corresponding result type and generate appropriate result set. Those are actual use-cases people want to do!

Note that Rust has another macro system. Its update was approved as RFC 1584 and its implementation is tracked by https://github.com/rust-lang/rust/issues/39412.

jan-hudec commented 7 years ago

@VermillionAzure, from quick look at the Scheme forms you referenced:

The macro_rules macros, and their update per RFC 1584, are similar to syntax-rules. If you have suggestions for enhancing those, https://github.com/rust-lang/rust/issues/39412 is probably the best place to discuss that.

The proc-macros, which this issue is about, are like the general form of define-syntax. And this RFC (1566) very intentionally does not define anything like syntax-case -- only an interface for calling a function that transforms the token stream.

The interface is defined in such a way that something like syntax-case can be implemented in a separate crate (library) and the intention is to do it that way. If you are so inclined, feel free to play around. Both any prototype and report on how easy or hard to use the API is will certainly be welcome.

porky11 commented 7 years ago

I think the idea was to define macros like functions, as in Lisp, but to have a macro that returns the same macro that macro_rules! defines.

So following would be equivalent:

macro_rules! foo {/*define macro here*/}

#[proc_macro]
pub fn foo(tokens: TokenStream) -> TokenStream {
    macro_case! tokens {/*define macro here*/} // takes `tokens` as first argument, returns a `TokenStream`
}

That's how syntax-rules and syntax-case seem to work in scheme.

@VermillionAzure Is this, what you would like?

CinchBlue commented 6 years ago

@porky11 No, it doesn't seem so. I just wanted to see if Scheme macros would be a relevant idea to add to the discussion -- since procedural macros are intended to be much more powerful than the syntax-case macro system in Scheme, it is trivial to implement all of those macro systems in terms of the arbitrary power provided here.

@jan-hudec Is it wise to allow arbitrary computation as a compiler extension without any sort of security guarantee? I am absolutely floored by the idea that procedural macros are going to be so powerful here, but would potential users of Rust consider this a downside to using packages? I'm not a security expert by any means, but couldn't vulnerabilities in libraries used within compiler extensions easily turn the Rust compiler into an attack vector? Additionally, if a bug occurs in a library used in a procedural macro (e.g. a segfault triggered by bad C library code), does that mean the segfault would trickle up and make the compiler fail without proper error messaging?

Would there be a way to encapsulate errors that occur in procedural macros in a way that it would not affect any others parts of the compiler?

CinchBlue commented 6 years ago

Another idea: when do procedural macros execute? If procedural macros can interact with code that has side effects that could be relevant (e.g. communicating with a stateful external server, mutating an external SQL database, getting a security key to log to an external system), then doesn't that mean that the order in which procedural macros are triggered by the compilation process is important?