Open alexcrichton opened 6 years ago
I wanted to shed some pain this is causing with tarpaulin.
Tarpaulin has worked amazingly well for me and my company as a replacement for kcov which historically been a pain to get accurate, reliable and correct results. At the moment tarpaulin stands as the most promising goto option for codecoverage in rust and just feels more like the only first class tooling option.
Having one of those when choosing to invest in a technology is important for many company's adoption story for checking off code quantity checkmarks. When they see that rust doesn't have a reasonable code quality story that works on stable rust, that's can result in a "pass" rather than "I'll take it". There are currently some work arounds for making this work-ish on stable but it feels very much like the story serde was in a year or so ago when you wanted to show all your friends how amazing serde was but then was embarrassed to show what it took to make work on stable because of a macro stabilization blocker.
With procedural macros having reached a point where they're very useful on stable, I expect many users will find themselves needing access to this information. Would it be reasonable to only stabilize parts of the Span
API that are not too risky? Perhaps exposing a function that optionally returns the path of the file where a macro is invoked if such a file exists?
I have a concern about exposing LineColum
information. It looks like it could play badly with incremental compilation, especially in the IDE context.
My understanding is that, if one adds a blank line to the start of the file, the line_column information of the input spans to proc macro changes. That means that IDE would have to re-expand procedural macros even after insignificant white space changes.
I would feel more confident if proc-macros were strictly a pure function from the input token stream to output token stream. This can be achieved, for example, by making line-column infocation relative to the start of macro invocation (as opposed to relative to the start of the file).
I don't feel that exposing absolute position is necessary the end of the world: IDE can track if a macro actually makes use of the info, or it can supply macro with some fake line/columns. But it's hard to tell if such hacks would work well in practice, and it's hard to experiment with them for the lack of IDE which handled proc-macros incrementally....
If the parser allocated IDs to every AST node, then, and this is the hard part, when an edit was made to the source, the parser tried to keep those IDs the same in all the non-edited code and only allocate new IDs for new nodes, that would allow spans to be kept completely separate from the AST. Those IDs could be passed through macro expansion without causing unnecessary invalidations. If something needed a span later on, it could then go back and ask the parser for the span for that particular AST node ID. I feel like having an incremental parser is important, not because parsing is the bottleneck, but because it underpins everything else.
@davidlattimore this is fascinating, but slightly off-topic for the issue. I've created a thread on internals: https://internals.rust-lang.org/t/macros-vs-incremental-parsing/9323
The column!() macro as well as std::panic::Location::column are returning 1-based columns while the span available from the proc-macro crate is 0-based according to its docs. Is this inconsistency intended?
This thread has more discussion about 1-based columns: https://github.com/rust-lang/rust/pull/46762#issuecomment-352474639
Another open question is how this API relates to https://github.com/rust-lang/rust/issues/47389 which is about minimizing span information throughout the compiler. Should stabilization be blocked until a design for #47389 is found? Is it too late already as we have set_span
functionality? @michaelwoerister what do you think?
But rust-analyzer might one day expand the scope of incremental compilation to the parsing stage, right?
That is already true for rust-analyzer, macro expansion stage runs incrementally. We do’t yet support line/column macro, but I think what we would do is that we’ll just expand them to a dummy value like 42. This probably won’t break macro (It could be this line indeed), but will allow us to avoid re-expanding macro after every keystroke
What we could do is store for each macro invocation whether it has invoked any of the inspection APIs or not and then re-run the macro only if any of the span information it has requested has had any changes. Basically only adding the infos the requested spans to the hash (or only adding span info to the hash if there's been any request of span info). However, as this has to update state, it would make span inspection slower than it could be, especially if something inspects spans routinely. So not sure about the performance impacts of this.
@alexcrichton @matklad Any update on what blockers are for this issue? It doesn't seem like there should be anything holding this back.
@jhpratt my question about 1-based vs 0-based is still not answered: https://github.com/rust-lang/rust/issues/54725#issuecomment-497246753
Hm, that's a fairly simple one. My intuition would certainly lean towards 1-based.
Is this still blocked on questions with regard to the inconsistency between two base indices?
I just thought I'd chime in here, cause I just ran into an issue where I would like to be able to map "place in code" to something else. Basically, I don't care about files or columns or indexes or anything, I just need something that's Eq + Hash
, and that I can get 2 from (start and end) for a Span
(also that it follows that any given token t has the same end as the start of the next token). This could just be a usize that's local to the current macro invocation and doesn't have to mean anything to the broader world, but it gives the ability to sort and map on "code points".
I'd love to see this stabilized too. In particular, assert2
uses Span::source_text()
to show the expression that caused an assertion failure:
On stable it falls back to regular stringification, but that certainly looks less pretty.
Note that Span::parent
allows you to write 'completely unhygenic' macros, which can see identifiers in a way that would otherwise be impossible:
// src/lib.rs
#![feature(proc_macro_span)]
use proc_macro::TokenStream;
#[proc_macro]
pub fn no_hygiene(tokens: TokenStream) -> TokenStream {
tokens.into_iter().map(|mut tree| {
let mut span = tree.span();
while let Some(parent) = span.parent() {
span = parent;
}
tree.set_span(tree.span().resolved_at(span));
tree
}).collect()
}
use no_hygiene::no_hygiene;
macro_rules! weird {
() => {
println!("{}", no_hygiene!(my_ident));
}
}
fn main() {
let my_ident = "Hello";
weird!();
}
Here, weird!()
is able to access my_ident
via no_hygiene!
. This is already possible with just Span::resolved_at
in some circumstances - if a single token has a Span
with the desired hygiene, a proc-macro can construct arbitrary tokens with the desired hygiene. However, Span::parent
allows hygiene to be completely bypassed - a proc-macro can see into a particular scope that it would otherwise be completely isolated from.
Using Span::parent
may be useful for the purposes of emitting error messages, but I'm worried about macros generating working code that intentionally uses this. This seems like an easy way to end up with code that has confusing non-local effects.
We might want to gate 'read-only' APIs (e.g. Span::source_text
, Span::start
) under a separate feature, since they could be stabilized without increasing the power of proc macros w.r.t hygiene.
We might want to gate 'read-only' APIs (e.g.
Span::source_text
,Span::start
) under a separate feature, since they could be stabilized without increasing the power of proc macros w.r.t hygiene.
Came here to advocate for just this. I have a use case where I require access to the start/end line/column information of spans. And it would indeed be nice if they could be decoupled from the wider feature for an easier path towards stabilization.
So I'm looking into what's needed to get the following stabilized in libproc_macro
:
Span::start
Span::end
LineColumn
There are two outlined concerns that I've been able to digest so far. If someone has more, I'd love to hear them.
The corresponding functions and LineColumn
structure in proc_macro2 are already marked as stable with the same 0-column specification. But as pointed out in alexcrichton/proc-macro2#237, while they are available they only supply dummy values on stable right now. But if desired, I'd be happy to put in the work to modify the reported column information to be 1-based instead. I personally don't have a strong opinion either way. The manner in which I've used spans in genco (and why I now have a desire for them to be stabilized :)) only cares about relative spans.
Given that proc-macros today can execute arbitrary code, and this seems like something we just have to live with. I don't know how beneficial limiting the existing proc_macro API surface with the goal of making macro execution as reproducible as possible is at this point. I personally wouldn't want to block on this. But for my specific use-case, changing the spans to be reported in a manner which is local to the macros themselves would also give me what I need. So if desired, I'd be for implementing this change.
To clarify and IIUC, that means this assuming 0-based columns:
/// A comment
macro! { hello // start {column: 8, line: 2}
world // start {column: 4, line: 3}
};
Would instead be reported in the proc-macro as this:
/// A comment
macro! { hello // start {column: 1, line: 1}
world // start {column: 4, line: 2}
};
reported spans affecting the output of proc-macros I don't know how beneficial limiting the existing proc_macro API surface with the goal of making macro execution as reproducible as possible is at this point.
I don't think there are any concerns about reproducibility, there are concerns about incremental compilation which is something completely different. I think its important that we get opinion of some incremental compilation expert from t-compiler about this (cc @wesleywiser? not really sure whom to CC).
Are we ok with exposing absolute position information to macros? AFAIUI, that means that either:
I don't think there are any concerns about reproducibility, there are concerns about incremental compilation which is something completely different.
So when I phrase it as the output of a macro being reproducible, that is what I'm referring to. Given a collection of inputs it would produce an identical output which would allow it to be cached during incremental compilation.
But to my wider point, a proc-macro can generate random numbers, access the filesystem, or connect to a database (the example I linked to). I'd be very curious to hear about plans for incremental compilation, but given these facts I have a hard time envisioning an approach that allows us to make decisions about re-expanding beyond some form of well understood best-effort heuristic.
But to my wider point, a proc-macro can generate random numbers, access the filesystem, or connect to a database (the example I linked to).
That's possible for now, but it's not a documented feature that proc macros are unsandboxed and it's likely that wasm based macros won't get that capability.
To the general question of incremental compilation see also https://github.com/rust-lang/rust/issues/54725#issuecomment-502932263
@matklad While I do know a bit about the incremental system, I wouldn't consider myself an expert. I believe @pnkfelix knows a fair bit more than I.
I'm specifically interested in Span::join
that would allow for painless proc-macro error messages on stable. There's the hack that makes it already possible, but in order to exploit it, one has to carry around two spans: the span to the first token and the span to the last token.
If join
were available, syn
could do the joining in its ast.span()
methods and the two-spans trick would become easier to pull off since the joined span had already covered the whole range.
I'm specifically interested in Span::join that would allow for painless proc-macro error messages on stable
Before Span::join
is stabilized, we should decide how it should handle hygiene. If I write first.join(second)
should the returned Span
use Span::resolved_at(first)
or Span::resolved_at(second)
? Or should we change the return type to Result<Span, JoinErr>
, and have enum JoinError { DifferentFile, DifferentHygiene }
?
@Aaron1011
Note that
Span::parent
allows you to write 'completely unhygenic' macros, which can see identifiers in a way that would otherwise be impossible:
Not to advocate in either direction, but this is already possible today, without Span::parent()
:
// main.rs
mod submod {
macro_rules! make_local_weird {
($local:item) => (
#[macro_export]
macro_rules! weird {
() => (
println!("{}", export! { $local private })
)
}
pub use weird;
)
}
make_local_weird!(fn f() {});
}
pub fn main() {
let private = "hi";
submod::weird!();
}
// lib.rs
use proc_macro::TokenStream;
use syn::spanned::Spanned;
use syn::{parse_macro_input, Result};
use syn::parse::{Parse, ParseStream};
use quote::quote;
struct Export {
item: syn::Item,
ident: syn::Ident,
}
impl Parse for Export {
fn parse(input: ParseStream) -> Result<Self> {
Ok(Export {
item: input.parse()?,
ident: input.parse()?,
})
}
}
#[proc_macro]
pub fn export(input: TokenStream) -> TokenStream {
let input = parse_macro_input!(input as Export);
let mut ident = input.ident;
let span = ident.span().resolved_at(input.item.span());
ident.set_span(span.into());
quote!(#ident).into()
}
Are we ok with exposing absolute position information to macros? AFAIUI, that means that either:
we'll have to re-expand all proc-macros even for trivial changes like addition of blank line.
we'll have to record "has this macro looked at line/column?" bit of info, and still re-expand those specific macros for trivial changes.
A third option would be to provide dummy information to proc macros from rust-analyzer. AFAIK, most proc macros use the data they obtain from these APIs only at runtime, and do not really make any decisions based on it at expansion time (and even then, they should work fine if line/column is always 1, unless they're being actively annoying).
A third option would be to provide dummy information to proc macros from rust-analyzer. AFAIK, most proc macros use the data they obtain from these APIs only at runtime, and do not really make any decisions based on it at expansion time (and even then, they should work fine if line/column is always 1, unless they're being actively annoying).
The macros requiring this information appear to do so for whitespace information (mine does, and so does done of the linked issues above as well). I don't see how this would work without a difference between spans. So being actively annoying might just be a valid use case.
Just to give another example of proc-macros requiring white-space information to function properly: inline-python
Also another use-case I could think of would be generating documentation based on Rust code in the macro input. Without whitespace information, this could look quite bad:
std :: mem :: ManuallyDrop < T > :: drop ( & t ) ;
// vs
std::mem::ManuallyDrop<T>::drop(&t);
call ( & x [ 0 ] ) ;
// vs
call(&x[0]);
Though the second use-case could probably handle invalid line/column data fairly well, if it is only given when executed by an IDE. In the worst case it would possibly just generate unreadable documentation that isn't used by the IDE.
Since these examples only need relative position of the tokens, the fourth option is to provide the information, but count lines from the start of the macro call only (so the first token in the token stream always has line = 1). That way if whitespace changes inside the macro call it needs to be re-expanded, but if anything changes before it it does not.
A third option would be to provide dummy information to proc macros from rust-analyzer. AFAIK, most proc macros use the data they obtain from these APIs only at runtime, and do not really make any decisions based on it at expansion time
ct_python!{}
executes Python code at compile time. Changing the span information could change the meaning of the Python code. Though one could definitely argue that this whole crate is a horrible idea in combination with RLS/rust-analyzer (or just in general ^^'). (With RLS, it executes the Python code while you are typing it. :grimacing:)
Maybe it'd be useful if a proc macro can tell it is being executed by a language server instead of the compiler? Not sure what it could do better in that case though, because the language server still needs the expansion to do the rest of its job.
We had to rely on Span
's Debug
impl because the line/column access methods are still unstable: https://github.com/knurling-rs/defmt/issues/344
So tentatively it seems like the proposed blockers for stabilizing a subset of proc_macro_span
as I detailed in https://github.com/rust-lang/rust/issues/54725#issuecomment-647140141 (i.e. min_proc_macro_span
) have either been discussed or no one has raised much interest in them (e.g. if spans should be reported macro-relative or be 1 or 0-based).
The biggest discussion seems to be about incremental compilation, so I'd like to take as broad a view on the topic as I can here. I hope I'm not missing something important, but if I do, please do tell!
Incremental compilation of macros as they look today based on assumptions of reproducibility (https://github.com/rust-lang/rust/issues/54725#issuecomment-502932263, https://github.com/rust-lang/rust/issues/54725#issuecomment-647142018) do not seem like they're going to be affected either positively or negatively by stabilization of proc-macro APIs which allows for different outputs depending on where the macro is located in source. As mentioned a few times above (https://github.com/rust-lang/rust/issues/54725#issuecomment-647146117, https://github.com/rust-lang/rust/issues/54725#issuecomment-736371755) the stable parts of the proc-macro system is already "contaminated" with other sources of data that can cause different outputs. This is even the case for the currently stable API of libproc_macro since the output of fmt::Debug
prints the absolute byte span of a span in the file.
So it seems like the cat is already out of the bag here. And this might very well be addressed in future proc-macro APIs. If such an API is designed in the future it could also come with a different version of libproc_macro
. But "leaky macros" do regardless seem to have legitimate uses, so it seems unlikely that they would ever be fully deprecated.
So I would like to push towards moving forward with stabilization with a minimal version of proc_macro_span
which features read-only APIs. In particular parts which have no effect on hygiene (as mentioned here https://github.com/rust-lang/rust/issues/54725#issuecomment-640169091).
Having absolute line&column positions will also break autoformatting. While sensible formatting of macros is mostly impossible, since they don't need to follow the Rust syntax, at the very least we can currently assume that whitespace has no semantic meaning. This means that we can add whitespace around operators, format inner blocks if we can prove that they are valid Rust code, or make macro calls properly indented. If a macro knows its absolute position, then simply adding or removing indentation is enough to arbitrarily change its behaviour.
Granted, it's a very minor point compared to a loss of hygiene or incremental compilation, but it's still an issue.
Any kind of tooling which assumes that whitespace has no meaning will be screwed by this feature. Note that the promise of no meaningful whitespace is currently contained in the reference manual. Quote:
A Rust program has identical meaning if each whitespace element is replaced with any other legal whitespace element, such as a single space character.
And according to the reference on macros, the macros receive only a delimited token tree, so the whitespace is also irrelevant within macros.
So stabilizing this API will be strictly a backwards incompatible change w.r.t. current Rust guarantees. While, strictly speaking, the usual guarantees are with respect to code compiling and having the same behaviour, I believe that breaking compatibility with tools isn't a thing which should be taken lightly.
I'll like to point out that procmacros can already parse the source code to find where they might have been invoked, and can do any number of bad things that will cause changes to users code to break under various circumstances. Making it more convenient for them to break tooling does not itself break said tooling. Any user of a proc macro is already relying on the author of that macro to make it do reasonable things.
Getting reasonable debug information in cases where macros fail is a huge win, and I'd hate got that to be lost due to some concern that macro writers will turn evil if we give them more power.
You're stating it as if it's a hypothetical problem, while there are already crates which do exactly the position-dependent processing that I'm talking about. Do remember that this can cause issues for downstream users which may not even know about such features, much less use them directly. I shouldn't know which proc macros are called by my transitive dependencies, but if a macro that I use calls one then my IDE performance will suddenly plummet.
If it's only about debug information, then surely there are better ways to provide it than directly exposing source code layout. Position-independent relative offsets solve many of the problems: if I'm forbidden to offset outside of the containing macro call, then I probably can't violate hygiene, and one doesn't need to reparse the whole crate if some formatting changes. Another option is to make line&column information into opaque objects rather than just integers. You can print them, you can store them, maybe even do some simple calculation like comparison and calculating relative offset, but you can't just make one out of thin air or extract forbidden absolute positions.
Come to think of it, why would it ever be a good idea to allow proc macros to know absolute offset and to create arbitrary spans? Surely a macro has no legal way to know the contents of the surrounding code, so why would it be granted access to it in the first place? The only thing that a user of the macro sees is the code that they directly typed inside of it, and spans in that exact code should be the only thing that a macro should work with.
A different, always available option is to write a separate source-transforming tools, rather than make those capabilities always provided by the compiler. You can use a format which is related to, but different from the Rust sources. You can use the standard syntactic libraries for writing that tool, and since you're doing all the lexing and parsing, you're free to keep any extra information that you want. For a practical example you can look at the camlp5 preprocessor for the OCaml language. But at the same time no one will accidentally depend on your special syntactic features, and the other tools will know that your format is something that they can't process without special support. I don't think pulling the kitchen sink, the fridge and the whole swimming pool into the language is necessarily a great idea.
I shouldn't know which proc macros are called by my transitive dependencies, but if a macro that I use calls one then my IDE performance will suddenly plummet.
IDE performance would only be minimally affected if these Span
APIs are implemented properly. You can already make IDE performance arbitrarily bad by just using a slow proc macro, so I don't really think this is a reason to not provide this API at all.
if I'm forbidden to offset outside of the containing macro call, then I probably can't violate hygiene
Hygiene is independent from positional information. We could remove the problematic Span::parent
and stabilize the LineColumn
-related APIs.
Hygiene is independent from positional information. We could remove the problematic
Span::parent
and stabilize theLineColumn
-related APIs.
This sounds great and would make it possible to reconstruct enough information about whitespace to implement a lot of whitespace-sensitive EDSLs. Any blockers?
Hygiene is independent from positional information. We could remove the problematic Span::parent and stabilize the LineColumn-related APIs.
This sounds good to me. Looks like the reason https://github.com/rust-lang/rust/pull/84331/files was rejected was introducing a new feature flag. I believe if instead those api's had just been made stable, it might have been fine.
Since the information is there, I'm wondering if it's possible to expose the byte offsets of a span within a file.
For example, given this (very simple and dumb) proc macro (using syn
and quote
for convenience)
use quote::quote;
use quote::ToTokens;
use syn::parse_macro_input;
use syn::spanned::Spanned;
use syn::ItemFn;
use syn::Stmt;
#[proc_macro_attribute]
pub fn instrument(
_attrs: proc_macro::TokenStream,
input: proc_macro::TokenStream,
) -> proc_macro::TokenStream {
let mut input = parse_macro_input!(input as ItemFn);
let span = input.span();
let span = format!("{:?}", input.span());
input
.block
.stmts
.push(syn::parse2::<Stmt>(quote! { println!("span is {}", stringify!(#span)); }).unwrap());
input.to_token_stream().into()
}
on the following input:
#[proc_macro::instrument]
fn main() {}
the output of the resulting binary is:
span is "#0 bytes(27..39)"
However, those byte offsets aren't accessible in the API. Is it possible to expose those byte offsets via getters? Maybe some fn byte_offsets(&self) -> Range<usize>
?
Hi all --
We read over this in @rust-lang/lang backlog bonanza today and had a few questions. There seems to be a broad set of APIs covered under this feature, so I think it would be useful to identify what subset we need to enable particular use cases.
Reviewing the thread, the major use case seemed to be reporting diagnostics (e.g., this comment about assert2). If that's the case, it seems like exposing
source_text
would be sufficient to cover these cases. All of these have the property that one must actually edit the contents of the macro to invalidate them -- though source-text is distinct from 'pure tokens'.
Are there use cases that would be not be handled by the APIs above? (And which ones require relative information?)
If not, maybe we should break out this subset and discuss there?
Hi all --
We read over this in @rust-lang/lang backlog bonanza today and had a few questions. There seems to be a broad set of APIs covered under this feature, so I think it would be useful to identify what subset we need to enable particular use cases.
Reviewing the thread, the major use case seemed to be reporting diagnostics (e.g., this comment about assert2). If that's the case, it seems like exposing
source_text
- nicer diagnostic APIs (separate tracking issue, I think)
- and maybe line/column information relative to the start of the macro
would be sufficient to cover these cases. All of these have the property that one must actually edit the contents of the macro to invalidate them -- though source-text is distinct from 'pure tokens'.
Are there use cases that would be not be handled by the APIs above? (And which ones require relative information?)
If not, maybe we should break out this subset and discuss there?
I had a use case a while back that really would have benefitted from knowing the path to the source file where the macro invocation occurred. I was trying to write something like a custom module resolver — basically, I wanted to be able to "import" a source file written in some other language (like a scripting or templating language), and the macro would facilitate transpiling the import to valid Rust code. But being unable to specify relative import paths made it prohibitively inconvenient/unergonomic for my liking.
However, that use case would also require invalidating the macro whenever the contents of the imported file changed, so it sounds like it would be problematic for plenty of other reasons besides.
Hi all -- We read over this in @rust-lang/lang backlog bonanza today and had a few questions. There seems to be a broad set of APIs covered under this feature, so I think it would be useful to identify what subset we need to enable particular use cases. Reviewing the thread, the major use case seemed to be reporting diagnostics (e.g., this comment about assert2). If that's the case, it seems like exposing
source_text
- nicer diagnostic APIs (separate tracking issue, I think)
- and maybe line/column information relative to the start of the macro
would be sufficient to cover these cases. All of these have the property that one must actually edit the contents of the macro to invalidate them -- though source-text is distinct from 'pure tokens'. Are there use cases that would be not be handled by the APIs above? (And which ones require relative information?) If not, maybe we should break out this subset and discuss there?
I had a use case a while back that really would have benefitted from knowing the path to the source file where the macro invocation occurred. I was trying to write something like a custom module resolver — basically, I wanted to be able to "import" a source file written in some other language (like a scripting or templating language), and the macro would facilitate transpiling the import to valid Rust code. But being unable to specify relative import paths made it prohibitively inconvenient/unergonomic for my liking.
Seconded. I was going to write a similar reply myself -- I had a use case that was almost exactly the same, where I wanted to allow something resembling including a module that's written in another language. This isn't the only use case for a feature like this though. Anything that includes things from the file system can benefit, such as include_dir.
See https://github.com/rust-lang/rfcs/pull/3200 for some more APIs for including files (with some minor discussion about "Span
-relative" includes)
stable version required
Another use-case noted above that is quite different from diagnostics is getting a unique identifier to use in link_section
/export_name
s.
I've opened an internals thread for discussing that.
@Aaron1011 closed by mistake? Can't seem to find an alternative tracking issue, a PR stabilizing it or a PR removing the feature.
This issue is intended to track a number of unstable APIs which are used to inspect the contents of a
Span
for information like the file name, byte position, manufacturing new spans, combining them, etc.This issue tracks the
proc_macro_span
unstable feature.Public API
Already stabilized:
To be stabilized, probably in their current form:
To be stabilized after some (re)design or discussion:
Things that require more discussion: