Open RalfJung opened 5 years ago
The failing command seems to be
/home/r/.cargo/bin/cargo-miri rustc --edition=2018 --crate-name core_miri_test core_miri_test/../libcore/lib.rs --error-format=json --json=diagnostic-rendered-ansi --emit=dep-info,link -C debuginfo=2 cargo-miri-marker-begin cargo-miri-marker-end --test -C metadata=c5be0bd0faaf3ad2 -C extra-filename=-c5be0bd0faaf3ad2 --out-dir /home/r/src/rust/miri-test-libstd/target/debug/deps -C incremental=/home/r/src/rust/miri-test-libstd/target/debug/incremental -L dependency=/home/r/src/rust/miri-test-libstd/target/debug/deps --extern rand=/home/r/src/rust/miri-test-libstd/target/debug/deps/librand-bd1cec56ecfccb49.rlib
So this is where it builds libcore itself, not the test crate.
I can get the same errors without Miri being involved by running rustc +nightly lib.rs --test --edition 2018
in src/libcore
.
Cc @alexcrichton and @eddyb who helped me to set this up at the all-hands early this year
cc @Centril maybe there is something weird going on with the feature-gating here.
The only relevant information I can contribute here is that feature gating for this is done pre-expansion today.
libcore has a #![cfg(not(test))]
, so the crate really should be empty. I guess "pre-expansion" means stuff happens before the cfg
is applied?
But then, why does it see the uses of this feature, but does not see the feature flag?
@Centril here's a minimal example:
#![cfg(foobar)]
#![feature(decl_macro)]
pub macro Default($item:item) { }
This gets rejected, which IMO is a bug.
cc @petrochenkov
Here's another variant:
#![cfg_attr(nightly, feature(decl_macro))]
#[cfg(nightly)]
pub macro Default($item:item) { }
This shows that only conditionally using this feature is broken right now.
Regression likely introduced by https://github.com/rust-lang/rust/pull/65742.
Nominating for compiler team discussion. IMO this regression should be fixed or the PR reverted.
That's not a bug.
Pre-expansion gating happens by collecting a set of Span
s before parsing. cfg
-stripping happens after parsing and does not remove those spans, which then later all get feature gate errors if the feature gate is not enabled (and if the feature gate was stripped away during cfg
-stripping then it isn't). The point of pre-expansion gating is that we should be able to remove syntax from the parser and the AST.
If you want to use unstable syntax behind cfg(flag)
you'll need to put the unstable syntax behind a module and #[cfg(flag)] mod the_module_with_unstable_syntax;
.
The original example @RalfJung had involved #![cfg(not(test))]
at the crate level, and somehow that didn't prevent non-inline modules from being loaded. I wonder if that's a deficiency of cfg
-stripping, or maybe it's just inside-out?
So this would always parse bar.rs
and only afterwards remove all of foo
:
#[cfg(whatever)]
mod foo {
mod bar;
}
@RalfJung This would mean you need #[cfg(not(test))]
on every mod xyz;
in libcore for which you get an error in the corresponding xyz.rs
file.
@eddyb I'm being literal here; you need #[cfg(...)]
directly on the mod xyz;
.
@Centril I think it is a bug when cfg
does not behave as expected. I understand it's not a bug in the code but a conceptual bug and you consider this intended behavior -- but your justification for why this is intended is the one of an implementer, ignoring the user perspective. Is there a reasonable way (not referring to compiler implementation details) to describe the end-to-end behavior that we are seeing now?
Taking the perspective of an end-user, I think this is quite clearly a bug: cfg
removes the feature flag but fails to remove the use of the unstable feature. Behavior of cfg
should be consistent, but it is not. Either you should honor cfg
for both feature flag detection and stability checking, or you should ignore cfg
for both feature flag detection and stability detection; the current behavior is applying different rules and that makes no sense.
If you want to use unstable syntax behind cfg(flag) you'll need to put the unstable syntax behind a module and #[cfg(flag)] mod the_module_with_unstable_syntax;.
So there are some obscure rules for cases where the cfg
flag is applied early enough for this to work? This is getting even weirder.^^
One subtle point here is that you kind of have to imagine mod xyz;
expanding to mod xyz { include!("xyz.rs"); }
and if it's cfg-stripped the expansion will never happen, resulting in xyz.rs
never being read or parsed.
I guess that "expansion" could be on the fly, with cfg-stripping happening very early, but regardless, cfg-stripping appears to apply between parsing the semicolon form and reading the associated file.
@RalfJung Sometimes there are architectural reasons for why things are the way they are that may seem weird to an end user. I'm sure you can appreciate that wrt. other parts of the compiler. :)
Is there a reasonable way (not referring to compiler implementation details) to describe the end-to-end behavior that we are seeing now?
Here's the best I can do for now: When #[cfg(stuff)]
is written on a non-inline mod foo;
and stuff
is determined not to be in the configuration, then the stuff in mod foo;
is not parsed. In all other circumstances, mod foo;
will parse foo
within the parser. Feature gating of syntax (for a certain set of syntactic feature gates) happens in the parser. Cfg-stripping happens, except in the case of #[cfg(stuff)] mod foo;
after parsing.
So there are some obscure rules for cases where the
cfg
flag is applied early enough for this to work? This is getting even weirder.^^
Yes, it's weird, but the conceptual bug is not in https://github.com/rust-lang/rust/pull/65742 which merely extended the current pre-expansion infra to a new set of feature gates (as it should!) and you were unlucky to hit a specific feature gate.
I guess that "expansion" could be on the fly, with cfg-stripping happening very early, but regardless, cfg-stripping appears to apply between parsing the semicolon form and reading the associated file.
@eddyb Yes, that's right; the weird behavior comes from https://github.com/rust-lang/rust/blob/0b7e28a1610924a27471ffdb59a2885709b3b415/src/libsyntax/parse/parser/module.rs#L40-L90.
The proper fix is to address https://github.com/rust-lang/rust/issues/64197 but that will require more work.
I will raise my voice here again (similarly to https://github.com/rust-lang/rust/issues/65846, which I'll merge into this shortly, it has some good discussion IMO) in my belief that we're being too aggressive here; we're breaking people in practice and without any warning. We should work to phase this in, not just break folks without warning. In practice the breakage this introduces does not seem entirely worth it -- indeed, I'm not sure that it's correct to do so in general.
I would expect us to gate against e.g. macros matching against such input, but I continue to be skeptical about failing to compile something like if --cfg foo
is not passed. I get that box syntax is unstable, but I feel like #[cfg(foo)]
is specifically intended for this sort of "this code does not exist" type gating. We might want to be careful and decide on a case-by-case basis, e.g., maybe box syntax we're super uncertain will ever be stabilized, so we gate against it. But if we think type ascription is generally plausible to stabilize, then maybe we don't syntax-level gate it.
#[cfg(foo)]
fn foo() {
let f: Box<u32> = box 3;
}
As it stands, if I interpret this change correctly, there's essentially no good way for a crate to have nightly-only usage of syntax-dependent features, right? Or they need to put the cfg's strictly on module-level, which is also painful and unhelpful -- it seems like this is what @RalfJung here is hitting.
We'll have a more accurate picture of who/what we've broken once this change hits beta in a week and we get a crater run going, I think, as on nightly lots of crates are plausibly just enabling the feature gate(s) so we're not seeing the true scope of regression I suspect.
@Centril Ah, I see, I was initially wondering if what https://github.com/rust-lang/rust/issues/64197 proposes already happened, but in this case cfg-stripping during parsing also explains the "inside-out" expansion-like behavior.
I will raise my voice here again (similarly to #65846, which I'll merge into this shortly, it has some good discussion IMO) in my belief that we're being too aggressive here; we're breaking people in practice and without any warning. We should work to phase this in, not just break folks without warning.
As I've noted before, we did a crater run, with sufficiently few regressions per gate to do the pre-expansion gating. It may be that the crater runs on beta will turn up more regressions, it may be that people will catch these problems in their nightly CI and fix them early. At any rate, I don't think we should revert anything until we have specific numbers from runs and when we do they may be partial reverts for specific feature gates.
Also, soft-gating pre-expansion but hard-gating post-expansion won't nice to do; it will most likely require some sort of hacks of the form "are we in expansion?" when adding spans. We then need to avoid post-expansion gating those spans in the AST visitor.
In practice the breakage this introduces does not seem entirely worth it -- indeed, I'm not sure that it's correct to do so in general.
The purpose of feature gating is to avoid leaking experimental things into the stable language. This doesn't just apply to semantics, it also applies to syntax. Like any unstable aspect of the language, we may wish to remove it, or change how the syntax looks. The idea that we would be forced to keep around garbage in the grammar due to a lack of pre-expansion gating is not acceptable to me. Therefore, moving forward, all new syntax will be pre-expansion gated unless it is for some reason exceptionally difficult to do so and the language team has approved of an exception in the specific case.
But if we think type ascription is generally plausible to stabilize, then maybe we don't syntax-level gate it.
I'm not going to speculate how likely it is that we stabilize X or Y feature. Once upon a time we presumably thought that box syntax was likely to be stabilized but then we changed our minds. Also, while I would like to see type ascription on stable, others feel differently. Instead, we ungate features when they are ready.
As it stands, if I interpret this change correctly, there's essentially no good way for a crate to have nightly-only usage of syntax-dependent features, right? Or they need to put the cfg's strictly on module-level, which is also painful and unhelpful -- it seems like this is what @RalfJung here is hitting.
There is a good way: #[cfg($flag)] mod stuff_with_unstable_syntax;
. Once https://github.com/rust-lang/rust/issues/64197 happens it would become easier, but ergonomic difficulties today is not a good reason to leak syntax into stable.
We'll have a more accurate picture of who/what we've broken once this change hits beta in a week and we get a crater run going, I think, as on nightly lots of crates are plausibly just enabling the feature gate(s) so we're not seeing the true scope of regression I suspect.
I think judgement should be withheld until that happens.
Sometimes there are architectural reasons for why things are the way they are that may seem weird to an end user. I'm sure you can appreciate that wrt. other parts of the compiler. :)
I can appreciate that. But that also always requires some consideration of whether the effect is really acceptable, in particular when considering regressions. This change will make some code that compiles on stable today no longer work on stable.
What is happening here is that an important property of cfg
is destroyed: so far, one could reliably use cfg
to hide things from the compiler. This is used by some crates to automatically enable features on nightly (and hide them from stable compilers), and by others to automatically enable the use of new features on recent compiler versions (and hide them from older compilers). That is an extremely useful property to have, and it is a property we have had since 1.0. You are removing this feature without offering a good alternative. (Having to put the "hidden" code into a separate file is not a good solution.)
My personal opinion is we should keep this property even if it makes the compiler a bit more complicated -- but more fundamentally I feel this is something that should be discussed with stakeholders first before we break the world, e.g. through an RFC.
I get that box syntax is unstable, but I feel like #[cfg(foo)] is specifically intended for this sort of "this code does not exist" type gating.
Exactly! That has been another point on the list of things that "Rust got right", but it seems now we decided we'd rather get it wrong. :/
As I've noted before, we did a crater run, with sufficiently few regressions per gate to do the pre-expansion gating. It may be that the crater runs on beta will turn up more regressions, it may be that people will catch these problems in their nightly CI and fix them early.
This is unsurprising. Crater would not have found crates that used either of the two idioms I described above: a crater run with a nightly compiler will enable the unstable features and their flags, so no trouble, and it will also be the latest compiler version so "hiding code from old compilers" does not apply either.
To properly test the impact at this, you need to at least do a crater run with a rustc that does not consider itself a nightly compiler; then you'll test the case of hiding things from stable compilers. The part about hiding things from older versions is much harder to test.
The idea that we would be forced to keep around garbage in the grammar due to a lack of pre-expansion gating is not acceptable to me. Therefore, moving forward, all new syntax will be pre-expansion gated unless it is for some reason exceptionally difficult to do so and the language team has approved of an exception in the specific case.
The idea of making it impossible to use cfg
to hide code from the compiler is also unacceptable to some people.
I understand your opinion, but you finding things unacceptable is not how we decide about trade-offs in Rust. Can you point at where consensus was formed that this is indeed the right way forward for the language, given its cost?
There is a good way: #[cfg($flag)] mod stuff_with_unstable_syntax;. Once #64197 happens it would become easier, but ergonomic difficulties today is not a good reason to leak syntax into stable.
That's not a good way, it forces users to put every use of the experimental feature that they want to hide into a separate file. It is a horrible hack, not a solution.
(Didn't read the thread.)
See https://github.com/rust-lang/rust/pull/46480 for a motivating example for gating unstable syntaxes early.
Long story short: we wanted to remove entirely unstable impl Trait for .. {}
but couldn't, it suddenly became a part of the stable language due to cfg(FALSE)
.
So all new features changing the language grammar are now gated in the parser. Some older features were implemented without following this policy, so @Centril recently changed them to follow it and here is the fallout through which we need to work on case by case basis depending on the amount and importance of the broken code.
But that also always requires some consideration of whether the effect is really acceptable, in particular when considering regressions. This change will make some code that compiles on stable today no longer work on stable.
Consideration was, and is being given.
My personal opinion is we should keep this property even if it makes the compiler a bit more complicated
Why is syntax special here? One could also argue that cfg
-gating alone is too unergonomic and that semantic aspects should also not be gated.
You are removing this feature without offering a good alternative.
You are suggesting that somehow pre-expansion gating started with https://github.com/rust-lang/rust/pull/65742, but in fact that is not the case. You will note that it was @petrochenkov who (rightly) requested this in the first cases. Another example is in https://github.com/rust-lang/rust/pull/64588/files?file-filters%5B%5D=.rs#diff-54f2c35b9f39ddec06693253b803e65eR824.
(Having to put the "hidden" code into a separate file is not a good solution.)
That's not a good way, it forces users to put every use of the experimental feature that they want to hide into a separate file. It is a horrible hack, not a solution.
I disagree. It incentivizes segregating stable and unstable things such that it becomes clearer which is which. It may become more ergonomic in the future, but don't think that should be the first order priority.
What is happening here is that an important property of
cfg
is destroyed: so far, one could reliably usecfg
to hide things from the compiler.
I think you can still do it "reliably" (because you'll get errors otherwise), but I think you dispute that a separate file is reliable.
Generic workaround:
macro_rules! delay_parsing { ($($tt:tt)*) => ($($tt)*) }
#[cfg(nightly)]
delay_parsing! {
macro foo() {}
}
If macro
syntax is changed (likely) or removed (unlikely) then only the nighthly
mode will be broken.
I'm also a bit confused -- if the generic workaround always works (implied by the generic argument), then it seems like we should be able to make the cfg gating work on any item(?)-level boundary (perhaps "clearly delimited by stable syntax" is more precise) reliably, which would essentially eliminate my concerns here.
I feel like this is a clear example of us not having built consensus yet around how to integrate these changes, and prior work (i.e., the PRs that @Centril has listed) having landed in this area without being complained about is not an argument. The point here is that we are gating things in a way that is perhaps non-desirable. We should discuss this, in public, and build consensus. Even if strictly speaking we can land this change, it does not mean we should. This is exactly what RFCs are for: building consensus.
Long story short: we wanted to remove entirely unstable impl Trait for .. {} but couldn't, it suddenly became a part of the stable language due to cfg(FALSE).
Ouch, that's not great. Looks like there is a bunch of history involved in this -- is all that coherently written up somewhere so that one has a chance of understanding the trade-off here?
Why is syntax special here? One could also argue that cfg-gating alone is too unergonomic and that semantic aspects should also not be gated.
Y'all are making syntax special (not just since recently it seems) by making it the only thing that is checked even in cfg
-disabled code. I am arguing against making it special. So I don't get why you want to claim the reverse now?
Given that I can have entirely my own fantasy syntax in a macro (and @petrochenkov bases a work-around for this issue on that), it seems odd that the parser is seemingly able to ignore these problems in macro bodies but not in cfg
-disabled code.
You are suggesting that somehow pre-expansion gating started with #65742, but in fact that is not the case.
Interesting. Does this mean that it is impossible to conditionally use those features behind cfg
?
For &raw
, that will be seriously annoying in memoffset
.
I disagree. It incentivizes segregating stable and unstable things such that it becomes clearer which is which. It may become more ergonomic in the future, but don't think that should be the first order priority.
I cannot see anything clear about having to split every file in my crate into two to reflect this separation -- or having to reorganize my crate (and its module and visibility boundaries!) just to put stuff together that depends on the same feature flag.
This incentivizes organizing your code by Rust features as opposed to its logical structure, which is bad.
I think you can still do it "reliably" (because you'll get errors otherwise)
:rofl: I wouldn't call compile errors as being able to reliably achieve anything.^^ If that counted, Rust would already reliably implement ATCs...
You'll also only get errors on compilers that actually consider the supposedly "hidden" code as unstable. So e.g. if I wait until let
-expressions are stable, and then start using them in my crate but only if a new enough rustc is detected -- then only exactly those versions of rustc that have the feature in experimental will fail to compile (older versions are fine), which I might easily miss if I only test the MSRV and latest stable.
I submitted a PR that works around the problem in libcore: https://github.com/rust-lang/rust/pull/65981
My opinion here:
The change to enforce syntactic feature gating early is addressing a real problem: in the current setup of the compiler (or the setup before https://github.com/rust-lang/rust/pull/65742, at least), we could easily wind up with stable code that was dependent on older syntactic forms parsing. That is a problem.
The proposed fix for this problem -- "pre-expansion gating" -- does indeed avoid the problem, and that is good. But it also does so in a fairly blunt way that is kind of hard to explain to users, as @RalfJung correctly pointed out.
As far as crater goes, it is good that we did a crater run to estimate the impact, but I think that this comment misunderstands how crater ought to be used:
As I've noted before, we did a crater run, with sufficiently few regressions per gate to do the pre-expansion gating
In particular, it suggests that if we do a crater run and see few regressions, that means we are done -- but of course we know that crater has many shortcomings. I see a crater run as a good way to avoid problems, but doing a crater run doesn't mean we don't have to consider problems that we failed to foresee. (@RalfJung also suggested here that the crater run would not have found this breakage in the first place; I'm not sure if that is correct, but if so that's obviously not great.)
As to our immediate action, I'm wary. As a general rule, I tend to favor backing out PRs that are causing unanticipated problems. This doesn't mean we won't add it back again, it just buys us time to have the discussion in a calmer fashion.
I think it's pretty clear that this particular interaction was unforeseen -- we have a vested interest in miri's test suite remaining functional, obviously, so if we knew this would be a problem, we'd have tried to work around it better.
It seems clear that one could imagine other designs that would avoid some of these problems, even if they're not incomplete solutions. For example, as @Mark-Simulacrum points out, we might modify our parser so that #[cfg]
attached to items causes us to skip the function body (which must be a $tt
) and parse it only after cfg-gating is applied. This would allow arbitrary syntax and would be a win for procedural macros. It would also probably be more work. On the other hand, it might be nice to do the easier thing in the meantime, to buy us time for such a design. (And maybe there are other complications I'm not considering.)
One thing that would be useful here is getting a better idea of whether other crates are affected beyond miri... do we expect to have more data on that, and -- if so -- when? Sounds like we would need to complete a crater run on beta?
I'm pondering @Mark-Simulacrum's comment here:
I feel like this is a clear example of us not having built consensus yet around how to integrate these changes, and prior work (i.e., the PRs that @Centril has listed) having landed in this area without being complained about is not an argument. The point here is that we are gating things in a way that is perhaps non-desirable. We should discuss this, in public, and build consensus. Even if strictly speaking we can land this change, it does not mean we should. This is exactly what RFCs are for: building consensus.
In general, I do think we tend to avoid RFCs a bit more than we ought. I can see the case for doing this work without an RFC, but I also think it would be great if we had a document that outlined the alternatives and made the case for this particular approach to the change. It would also give us something to point to when people have questions about why their code is not building -- and document workarounds, etc.
One thing that would be useful here is getting a better idea of whether other crates are affected beyond miri... do we expect to have more data on that, and -- if so -- when? Sounds like we would need to complete a crater run on beta?
I know we hit it at least in script-servo macros in perf.rlo's suite, though that particular case is being fixed as we speak (https://github.com/rust-lang/rust/issues/65846 and fix in https://github.com/rust-lang/rust/pull/65974).
I do suspect a beta crater run might reveal more breakage, thinking along similar lines as @RalfJung I believe -- we can queue one, I suspect, without too much trouble? An easy way might be to just get try commit builds for stable-feeling compilers with and without the PR on master.
@RalfJung
So I don't get why you want to claim the reverse now?
Special in terms of feature gating or not; not special in terms of interactions with cfg
s, which are really macros on ASTs expanding to nothing (so it's not strange that feature gated syntax interacts with it).
Given that I can have entirely my own fantasy syntax in a macro (and @petrochenkov bases a work-around for this issue on that), it seems odd that the parser is seemingly able to ignore these problems in macro bodies but not in
cfg
-disabled code.
This is the case because the macro won't expand at all if it is not included in the configuration. Meanwhile, the parser only performs cfg-stripping in a single place (when placed directly on mod foo;
items) and this is a total hack that has caused problems (cyclical dependencies) wrt. splitting the AST, the parser, and the expansion code apart into different crates.
Interesting. Does this mean that it is impossible to conditionally use those features behind
cfg
? For&raw
, that will be seriously annoying inmemoffset
.
Not impossible; you'll need to use #[cfg(...)] mod foo;
there as well. https://github.com/rust-lang/rust/pull/65742 adds to an existing infrastructure for pre-expansion gating.
This incentivizes organizing your code by Rust features as opposed to its logical structure, which is bad.
I think one can still keep the logical structure of the code by using a sub-module below the thing the module is about. This would involve several modules for unstable syntax rather than one for all unstable syntax in the whole crate. There are advantages to organizing the code by Rust features: it makes it easier to see where they are.
So e.g. if I wait until
let
-expressions are stable, and then start using them in my crate but only if a new enough rustc is detected -- then only exactly those versions of rustc that have the feature in experimental will fail to compile (older versions are fine), which I might easily miss if I only test the MSRV and latest stable.
Again, unless you use e.g. #[cfg(version(1.41.0)] mod foo;
to do so.
@nikomatsakis
As far as crater goes, it is good that we did a crater run to estimate the impact, but I think that this comment misunderstands how crater ought to be used:
I think we disagree re. "ought".
but doing a crater run doesn't mean we don't have to consider problems that we failed to foresee.
And we are dealing with some of the problems; in a better and if I might say so, more and principled and permanent way.
As to our immediate action, I'm wary. As a general rule, I tend to favor backing out PRs that are causing unanticipated problems. This doesn't mean we won't add it back again, it just buys us time to have the discussion in a calmer fashion.
I think we should back out of specific feature gates once the data is in; meanwhile we can deal with the problems as in e.g. https://github.com/rust-lang/rust/pull/65974.
I think it's pretty clear that this particular interaction was unforeseen -- we have a vested interest in miri's test suite remaining functional, obviously, so if we knew this would be a problem, we'd have tried to work around it better.
Well interactions with cfg
was not unforeseen; that's basically the point of pre-expansion gating. As for Miri, I wouldn't ascribe special importance to it here. Miri breaks regularly due to random changes throughout the compiler, sometimes for days.
It seems clear that one could imagine other designs that would avoid some of these problems, even if they're not incomplete solutions. For example, as @Mark-Simulacrum points out, we might modify our parser so that
#[cfg]
attached to items causes us to skip the function body (which must be a$tt
) and parse it only after cfg-gating is applied.
By "must be a $tt
", are you saying that we would bake this laziness into the lexer itself (i.e. TokenCursor::next_token
)? That's the only way I can make sense of this. I'm not even sure that's implementable and the consequences would be quite hard to predict. Also note that it's not just function bodies that we would have to care about; the other item forms have { ... }
and the same parsing methods are used for loops and other forms of blocks. Special casing function bodies seem ad-hoc, and just makes learning the rules more complicated.
My view is that the least hacky way is to fix https://github.com/rust-lang/rust/issues/64197 and then call it a day. mod foo;
provides a natural, and easily implementable boundary to stop parsing. @petrochenkov's delay_parsing!
macro should provide a reasonable mechanism otherwise. If it were included in the standard library that would also make it clearer what mechanism to use.
(And maybe there are other complications I'm not considering.)
Well for one, we might need to look beyond {
if we want to e.g. have structural records.
It would also give us something to point to when people have questions about why their code is not building -- and document workarounds, etc.
Well the current rules are in flux, so we might need to change that documentation, but this seems reasonable, and the current discussion provides good background.
triage: P-high. Leaving nominated because I want this discussed at today's T-compiler meeting.
@nikomatsakis wrote:
As a general rule, I tend to favor backing out PRs that are causing unanticipated problems. This doesn't mean we won't add it back again, it just buys us time to have the discussion in a calmer fashion.
I agree with this philosophy, and think T-compiler, as a team, should codify it as one of our guiding values. (Of course, we should discuss it as a team before doing so.)
In such codification, it may be good to include notes about what criteria should be met for such backouts: e.g. what stakeholders (such as the PR author or original bug filer) to engage.
I've opened #66004 for a revert of the behavior changed here, but I expect it to be temporary, pending further discussion.
(GitHub got the timestamp for this post wrong; it was written before @petrochenkov's post above.)
cfgs, which are really macros on ASTs expanding to nothing (so it's not strange that feature gated syntax interacts with it).
The issue here is that I (and likely others) have formed a different mental model for cfg
's -- that it is a normal macro that expands to nothing. That model likely never was fully correct but "correct enough", and only now is it blatantly broken. Most (all?) of our macros act on token tress, so naturally my assumption was that the same is true for cfg
.
The new behavior is indeed consistent with what you say, but we likely have some work to do then to make sure that people learn the right mental model when learning about cfg
. And on a larger scale of things it seems somewhat inconsistent to have exactly one macro act on ASTs while everything else acts on token trees.
If we really want to have two kinds of macros and make people learn about both, maybe one outcome of this discussion is that attribute-based cfg
is suboptimal for cases like the ones I described above -- and that we ought to have something like a cfg_if!(condition) { tokens }
or so (something like @petrochenkov's work-around but with cfg
integration), a truly token-based macro that avoids even parsing the tokens if the condition is not met. If we make the syntax slightly more annoying, like cfg_if!(condition { tokens })
, this could likely be implemented as a normal macro?
I think one can still keep the logical structure of the code by using a sub-module below the thing the module is about. This would involve several modules for unstable syntax rather than one for all unstable syntax in the whole crate. There are advantages to organizing the code by Rust features: it makes it easier to see where they are.
Very rarely is it a good idea to organize your code by the language features you are using. You don't put all you struct
definitions in one file and all enum
in another file and all impl Trait for
in a third file, either -- that's just a mess. So I strongly disagree with you claiming that splitting things into files for this purpose is anything but an awful hack.
Well interactions with cfg was not unforeseen
Just because you foresaw it and privately decided for you that this is okay, doesn't mean it was "foreseen" as part of the wider consensus building. Could you point to where you/others documented this foreseen interaction with a pattern that is used throughout the ecosystem (cfg
-enabling nightly features) and decided dropping support for that is a reasonable trade-off here? You pointed to some low-level discussion exempting concrete feature gates based on crater results, but what I am missing is a discussion of the high-level concerns here. Given that the move to eager stability checking on syntax started a while ago, that discussion then should have happened whenever that move started. This applies just as well to new feature gates being added with eager checking (but that people will want to control with cfg
) as to old ones that have their checking changed.
(Yes I am aware of the awful hack you propose to work around this. That's still "dropping support" as far as I am concerned.)
@RalfJung
Most (all?) of our macros act on token tress, so naturally my assumption was that the same is true for cfg.
All our macros indeed take token trees as an input, including cfg
(at least in theory).
The difference is that fn-like macros act on unrestricted token trees, but attribute and derive macros act on restricted token trees - their input tokens must be a piece of valid Rust syntax.
The result of the "is this valid Rust syntax?" query can be changed by unstable features, that's why we need to gate early.
So,
#[my_custom_attr_that_works_like_cfg]
macro m() {}
will also produce a feature gate error.
The reason for the token trees being restricted is parsing. When we see
#[attr] t1 t2 t3 t4
we need to parse t1 t2 t3 t4
as Rust syntax to determine which part of it is an input to #[attr]
and which is not, e.g.
#[attr] struct S ; struct Z ;
Fn-like macros have well-delimited input, so they don't have this problem.
I'm pretty sure we can delay parsing of tokens inside delimited groups (()
, []
, {}
) until expansion and still determine attribute inputs correctly, but implementation of this is pretty much https://github.com/rust-lang/rust/issues/43081, so the chance that it's going to be done any time soon is very slim.
@petrochenkov okay, so things are consistent between cfg
and other attribute-based macros then, that makes sense. And given our non-uniform syntax for items indeed it seems hard to find out where the item ends without constructing an AST.
The most stable way forward then seems to be to have a fn-like cfg
macro, and give people some time to move to that before phasing out the ability of attribute-based cfg
to remove the use of a nightly feature?
The most stable way forward then seems to be to have a fn-like
cfg
macro, [...]
Do you mean adding delay_parse
to the standard library or some such? Seems like a great idea to me if so and it is also trivial to do.
and give people some time to move to that before phasing out the ability of attribute-based
cfg
to remove the use of a nightly feature?
It seems to me that would depend entirely on the outcome of the second crater run; for feature gates with largely the same results I don't think it's justified to do a lot of non-trivial work to soft-gate things for just some 1-5 regressions for a feature.
So I strongly disagree with you claiming that splitting things into files for this purpose is anything but an awful hack.
Well, let's agree to disagree.
Could you point to where you/others documented this foreseen interaction with a pattern that is used throughout the ecosystem (
cfg
-enabling nightly features) and decided dropping support for that is a reasonable trade-off here? You pointed to some low-level discussion exempting concrete feature gates based on crater results, but what I am missing is a discussion of the high-level concerns here. Given that the move to eager stability checking on syntax started a while ago, that discussion then should have happened whenever that move started.
I do not accept the assertion that we are "dropping support". This suggests that this was an intentional property of the language that we are now moving away from. History suggests otherwise however. Meanwhile, accidentally stable features are still unstable.
As for the motivation to pre-expansion gate, it is quite straight-forward. It is the same motivation as for feature gating semantic aspects: We might want to change X or remove it. Having garbage parsing code and garbage AST nodes which also must be documented in the spec is not a good idea just as it is not a good idea to have that in the type checker.
This applies just as well to new feature gates being added with eager checking (but that people will want to control with
cfg
) as to old ones that have their checking changed.
If you want to make the case that we should not pre-expansion gate new syntax that is added in the future and move existing pre-expansion gating to post-expansion then you are welcome to make that case with an RFC (although it would take a lot to convince me, but I'll listen with an open mind). In the interim however, new syntactic features, if they want to become implemented, need to use pre-expansion gating.
Do you mean adding delay_parse to the standard library or some such? Seems like a great idea to me if so and it is also trivial to do.
Not literally delay_parse
but one of the ones I suggested -- something with cfg
in the name and that also takes a cfg expression. That should be more discoverable than having to put together two pieces, one of which looks like a NOP.
So I imagine uses like (syntax totally pre-bikeshed, obviously)
cfg_if! { nightly:
macro foo() {}
}
Maybe we should just incorporate https://crates.io/crates/cfg-if into libstd?
If you want to make the case that we should not pre-expansion gate new syntax that is added in the future and move existing pre-expansion gating to post-expansion then you are welcome to make that case with an RFC (although it would take a lot to convince me, but I'll listen with an open mind).
I want to make the case that rustc+libstd should support cfg-gating uses of all nightly features, inline in the same file. This was (accidentally, it seems) supported so far, and it is used in the wild. I do not have a strong preference for how we achieve that support. Given what I learned in this thread, it seems like a fn-like macro is the better choice for this than the attribute-like macro that is cfg
.
I see a parallel here to mem::uninitialized
, where it seemed like we supported using that to gradually initialize elements of any type, but really that always was UB (given what we told LLVM). But in that case, we made sure to land the proper replacement MaybeUninit
before breaking the old, unsupported-but-practically-working variant. Of course, mem::uninitialized
also was/is widely used, so just removing the old option without a replacement readily available was definitely not an option. The planned crater run should help get an idea how widely used cfg-gating of nightly features is.
Maybe we should just incorporate https://crates.io/crates/cfg-if into libstd?
This should be easy. It was sorta proposed in https://github.com/rust-lang/rust/issues/59442 -- you could file another PR for T-libs to consider link back here for rationale.
I see a parallel here to
mem::uninitialized
, where it seemed like we supported using that to gradually initialize elements of any type, but really that always was UB (given what we told LLVM). But in that case, we made sure to land the proper replacementMaybeUninit
before breaking the old, unsupported-but-practically-working variant. Of course,mem::uninitialized
also was/is widely used, so just removing the old option without a replacement readily available was definitely not an option. The planned crater run should help get an idea how widely used cfg-gating of nightly features is.
I think there are problems with this parallel:
mem::uninitialized
is one function which got a replacement. In this case we are talking about several different feature gates added at different times. Moreover, if we post-expansion gate new syntax, we are making the problem worse. It would be as if we continued adding mistakes of the mem::uninitialized
variety again and again to the standard library.
There are three existing alternatives: delay_parse
(a one-liner), cfg_if
(a popular crate), using #[cfg(foo)] mod bar;
(sure, you don't consider it an alternative, but there are two others...). In the case of MaybeUninit<T>
, it was completely unrepresentable in many cases.
This should be easy. It was sorta proposed in #59442 -- you could file another PR for T-libs to consider link back here for rationale.
I don't have time to push through a PR for this, just leaving my thoughts for when T-compiler, T-lang and maybe T-libs discuss how to resolve this feature gating issue.
So we've concluded a crater run (results) that was (hopefully) successful in testing stable/stable so as to not report false negatives due to crates detecting a nightly compiler and enabling their features.
The results are quite positive -- only a few crates regressed. That leads us to believe that we can plausibly "get away" with this change from at least the perspective of not really breaking too many people in practice. My explanation for this is that we only gate syntax in this manner, which is relatively uncommon (almost all features are not syntax-based).
Given that information, I feel more comfortable landing this change if we fix all known cases uncovered by crater, so I won't attempt to block it. However, I still feel that the appropriate path for making this change is to have some discussion around it. At least we should definitely make sure release notes include the macro "hack" that permits gating successfully in the same file.
I would also personally feel much better about this feature if the scope of gating was only stuff that is in the top-level gated TokenStream. i.e., I believe @eddyb has said that TokenStreams are inherently nested today, so I would expect the following to not have any gating errors (syntax is wrong in some cases, but I would actually also expect that to not be a problem personally, given the approach of cfg being token-level, not AST-level.
If that approach requires an RFC to do fully, that's fine, I think that's reasonable, given the implications are pretty broad. I still lean then towards not gating old syntax (and maybe new syntax) until we have such an RFC.
#[cfg(foo)]
mod foo {
arbitrary tokens here if not --cfg foo, as we're delimited by '{' and '}'
box foo; // gated syntax
}
#[cfg(foo)]
fn foo() {
arbitrary tokens here if not --cfg foo, as we're delimited by '{' and '}'
box foo; // gated syntax
}
fn foo() {
#[cfg(foo)]
{
arbitrary tokens here if not --cfg foo, as we're delimited by '{' and '}'
box foo; // gated syntax,
}
}
One thought I've had while thinking about this is that syntax is for the most part, not something you put behind cfg or so, as you don't want to write two code paths. However, I would be interested in seeing if we can draw upon the JS community's success with Babel and similer "transpilers" that lower modern syntax into older syntax to enable backwards-compatible experimentation in any library with e.g. try
and other "pure" syntaxes, that would lower via proc macro expansion into forms that work on stable today. That's a bit of a tangent though :)
@Mark-Simulacrum thanks! Looks like this pattern is less widely used than I expected then, plus as you mentioned new syntax is fairly rare.
The release notes could also point to cfg_if
.
One thought I've had while thinking about this is that syntax is for the most part, not something you put behind cfg or so, as you don't want to write two code paths. However, I would be interested in seeing if we can draw upon the JS community's success with Babel and similer "transpilers" that lower modern syntax into older syntax to enable backwards-compatible experimentation in any library with e.g. try and other "pure" syntaxes, that would lower via proc macro expansion into forms that work on stable today. That's a bit of a tangent though :)
My impression is that in Rust this niche is already covered by the various nanocrates that offer macros for faking not-yet-stable features, like async-trait
, because the nature of proc macros and an AOT-compiled language means those are already guaranteed to do all their work at "build time" just like Babel is for JS. Perhaps someone could write a canonical list of links to all such crates, but other than that I don't think there's anything left for a general transpilation tool to solve.
(on topic, I'm also convinced by the crater results that this change is fine as long as we do the usual work of sending PRs for the crater regressions and documenting it clearly in release notes)
For the record, I don't think we should land this even if crater passes, and especially if crater doesn't pass (whether or not we submit patches to crates it breaks).
removing nomination label; since the most problematic parts of this issue have been reverted (PR #66004), the T-compiler team does not need to discuss this, at least not until follow-up work has been done.
What is the status of this and what are the plans? Does the "regression" label still apply?
As far as I know, we never had the follow-up discussion around this. I would still like to have it, though!
I'm not totally sure on what parts were reverted and what remains.
As far as I know, the de facto standard is that one must still isolate "nightly syntax" into separate files in order to avoid feature gates. This is certainly true of new syntax that is being added.
It's certainly the easiest implementation strategy, but I would still like to talk out whether we could do something at a different granularity (e.g., ignoring the contents of function bodies that are cfg'd out), which I think would match user's expectations.
That said, I could be persuaded that the implementation effort of supporting that is not worth it at this time.
@Centril says reverting the revert hasn't happened yet.
(apparently didn't record it in this thread)
The gates for unstable syntax which were reverted by #66004 were turned into future-compatibility warnings in
I tried to make pre_configure_attrs
change krate.attrs
in place, but it seems to cause the future compat error here to turn into a hard error. I'm leaving my patch here in case someone wants to make the same change after this gets turned into a hard error.
https://github.com/rust-lang/rust/pull/66004 fixed the original issue; now this tracks re-landing those checks (possibly after some transition period).
Original issue
At https://github.com/RalfJung/miri-test-libstd/, I have set things up such that one can run the libcore and liballoc test suite against Miri. The basic idea is to have a
Cargo.toml
file like this (very close to the normal one for libcore):Then I make
../libcore
a symlink to the actual libcore sources, cd into the dir with the aboveCargo.toml
, and runcargo miri test
. This worked fine until recently (until nightly-2019-10-17, to be precise).But since then something broke, and now I get lots of build failures:
I tried adding these feature gates to every crate root I could think of (the
lib.rs
of both libcore and its test crate), but the errors are sticking. Unfortunately the error doesn't tell me which file it thinks is the crate root.Any idea what this could be caused by? Until this is fixed, we won't have Miri coverage of the libcore and liballoc test suite.