rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0
5.93k stars 1.57k forks source link

Prohibit implementations of outer traits and types in blocks (function bodies, etc) #1355

Open petrochenkov opened 8 years ago

petrochenkov commented 8 years ago

Consider the next code:

struct S;
trait A {}
trait B {}
trait C {}

fn check<T: A + B + C>(_: T) {}

fn block1() {
    impl A for S {} 
}
const BLOCK2: u8 = { impl B for S {} 0 };

const BLOCK3: [u8; { impl C for S {} 0 }] = [];

fn main() {
    check(S);
}

We see here that some things internal to function bodies or other blocks can leak outside and affect compilation of remote pieces of code. This doesn't normally happen (with exception of consts and const functions, which leak their bodies through their results available at compile time, but this is by design).

In the example above to find out if check(S) compiles or not compiler or other tools have to traverse the whole AST looking into every function, expression or type, because blocks can live pretty much anywhere and blocks can contain impls. Without impls in blocks algorithms traversing AST can take major shortcuts and scan only the surface, ignoring internals of function bodies, types, fields, etc. which is especially important for tools (cc @nrc) or things like incremental compilation (cc @nikomatsakis), I suppose.

I've seen several visitors in the compiler that already take these shortcuts, leading to errors in corner cases. I propose to make it official and prohibit implementing traits and types not belonging to a block in this block. I wouldn't expect it to break anything in existing code, but to bring more speed to some visitors and more convenience to people writing them.

Detailed design:

nagisa commented 8 years ago

Would you consider two different modules (or their files) to belong into same block or different blocks?

i.e. something like this is used throughout MIR trans and probably would break a lot of non-our codebases:

mod rvalue { // rvalue.rs
    impl MIR {
        …
    }
}
mod lvalue { // lvalue.rs
    impl MIR {
        …
    }
}
petrochenkov commented 8 years ago

@nagisa By blocks I mean ast::Block - function bodies, block expressions.

petrochenkov commented 8 years ago

This idea is partially inspired by this post from @phildawes's blog. He writes:

Rust has the nice charactistic that a function signature is enough to type-check a caller without the analysing the body, so the first obvious transformation is to remove function bodies except for the one you want to type check.

impls in blocks break this nice charactistic, removing other function's bodies can affect compilation of unrelated functions.

petrochenkov commented 8 years ago

Another alternative is to make HIR less tree-like, remove impls from it somewhere after resolve and gather them in one crate-scoped set. It's much more intrusive change and the simple prohibition can always serve as a temporary measure until it's implemented.

nikomatsakis commented 8 years ago

On Sun, Nov 08, 2015 at 10:24:33AM -0800, Vadim Petrochenkov wrote:

Without impls in blocks algorithms traversing AST can take major shortcuts and scan only the surface, ignoring internals of function bodies, types, fields, etc. which is especially important for tools (cc @nrc) or things like incremental compilation (cc @nikomatsakis), I suppose.

I don't think this will help much for incremental compilation. We have to parse the code anyway, and scanning the AST to find impls is fairly cheap.

petrochenkov commented 8 years ago

These impls have some other interesting consequences, for example types defined in blocks can be pulled outside:

trait Exporter {
    type Output;
}
struct Helper;

fn block() -> <Helper as Exporter>::Output {
    struct Inner;
    impl Inner {
        fn poke(&self) { println!("Hello!"); }
    }

    impl Exporter for Helper {
        type Output = Inner;
    }

    Inner
}

fn main() {
    block().poke();
}
petrochenkov commented 8 years ago

Resolved by https://github.com/rust-lang/rust/pull/29903 I suppose.

matklad commented 5 years ago

I would like to reopen this: prohibiting "leaking" impls inside expressions on the language level will be a great help for the IDEs.

Specifically, this will unlock lazy (as opposed to incremental) compilation where, to compute completions, we wouldn't have to look at the bodies of functions in extern-crates at all.

Relevant discussion in Zulip: https://rust-lang.zulipchat.com/#narrow/stream/185405-t-compiler.2Fwg-rls-2.2E0/topic/Why.20lazy.20parsing.20won't.20work.20for.20rust-analyzer

comex commented 5 years ago

It does seem logical that things inside blocks shouldn’t leak out of the block. On the other hand, it’s a breaking change, and even changing it as part of an edition would be tricky since there wouldn’t be any straightforward automatic migration path.

Hmm... IDEs already have to deal with the notion of code changing over time. Couldn’t you start by optimistically assuming there aren’t any weird impls, defer parsing function bodies until later, then once you do, if the assumption turns out to be false, recalculate things in a similar way as if the user had actually edited the dependency to add a new impl? The user would see incorrect information in the meantime, but simply parsing a crate shouldn’t take very long, so it would be corrected quickly.

That is, unless the crate uses a procedural macro, which might take a long time to execute and certainly will take some time to compile. But that’s arguably another reason for IDEs to use a speculative approach. Procedural macros are more likely to appear at the global level anyway, where they certainly could add impls or many other things. But you could produce an initial parse that just ignores them, compile and run them in the background, and revise things once they’re done…

Centril commented 5 years ago

@matklad It seems to me that reopening this, aside from @comex's notes re. breakage, would immediately brick the use case in https://github.com/rust-lang/rust/issues/54912.

matklad commented 5 years ago

Hmm... IDEs already have to deal with the notion of code changing over time.

@comex I don't think that would work nicely: lazyness and incrementality are orhogonal, and even perfect incremental can not compensate for the absence of lazyness.

another reason for IDEs to use a speculative approach.

This might seem a bit idealistic, but I don't think that speculative approaches should be used in IDEs. Eg, speculatively not seeing an impl in multi-million line code base seems like a recipe for incorrect refactoring. I would definitely trade performance (avoid parsing bodies) for correctness (seeing all impls) in this case

On the other hand, it’s a breaking change

I am 100% ok with perma-deprication. The fun thing is, if the user requiers #[allow(leaking_impls)] to squash the lint, that would serve as marker for IDE that the function body needs examination :D

would immediately brick the use case

Heh, that's an interesting language-design problem to solve: both motivations, "shallow reasoning" and "compile-time expressions without binding random names" seem sound. The simplest solution which would fall out naturally out of implementation is that ones need to annotate a const with #[allow(leaking_impls)]. However, from reading the RFC, it's not clear if there's a need to put global impls inside the expressions... It seems like impls could be placed in enclosing module just fine, b/c they don't introduce any names?

Centril commented 5 years ago

This might seem a bit idealistic, but I don't think that speculative approaches should be used in IDEs.

I think you should make every effort to try this out before even thinking about changing the semantics of the language. That should be a last resort and you should try to live with the language we've got.

I am 100% ok with perma-deprication.

Even so, it seems like a disruptive change. I would encourage you to test this out by turning it into a hard error and checking with crater how many crates are broken by the rules described in the issue description.

There's also a matter of language specifiability. The rules aforementioned feel ad-hoc and something I would not want to have in a reference unless it is very well motivated.

The fun thing is, if the user requiers #[allow(leaking_impls)] to squash the lint, that would serve as marker for IDE that the function body needs examination :D

How do you deal with --cap-lints implying #[allow(leaking_impls)] or #![allow(warnings)]? Moreover, how do you deal with a macro expanding to a struct, an impl, and #[allow(leaking_impls)] on said impl..?

It seems like impls could be placed in enclosing module just fine, b/c they don't introduce any names?

You are placing the impl inside a const because you want to introduce names locally that the impl can rely on. If you place the impl outside of the const then you no longer have access to those names.

matklad commented 5 years ago

That should be a last resort and you should try to live with the language we've got.

Pretty strongly disagree with this point: language design, informed by the tooling needs, leads to better overall product and happier users.

The rules aforementioned feel ad-hoc and something I would not want to have in a reference

To me, the fact that usual expressions can affect type-checking unrelated code in other modules/crates without explicit opt-in seems to be a surprising behavior and modularity violation. I would love to remove it just to make the language less surprising. This is not unlike coherence, and, especially, public-in-private rules. But I am not a language designer, so I don't have a strong opinion here besides "the trade off is that the IDE will have to process significantly more code"

How do you deal with --cap-lints implying #[allow(leaking_impls)] or #![allow(warnings)]?

We'll have to treat those specific crates in the crate graph specially, by eagerly processing item bodies, making IDE slower.

how do you deal with a macro expanding to a struct, an impl, and #[allow(leaking_impls)] on said impl..?

Could you give a specific example of what you are thinking about? I don't get why struct and impl are important here. Or is it just the general "how to deal with macros" question? If the latter, then we just eagerly expand top-level macros, and see the #{allow(leaking_impl)] attributes. And we can avoid expanding macros inside expressions, because we know they can't have global effects.

You are placing the impl inside a const because you want to introduce names locally that the impl can rely on

Yeah, that could be the case... I've missed it originally because RFC only included example of a trait being implemented for an expression-local type, which is OK since it is non-leaking.

matklad commented 5 years ago

I've found another case where we leak info from blocks into the outside world: #[macro_export] macros:

fn f() {
    #[macro_export] 
    macro_rules! foo { () => ( fn bar() {} )}
}

foo!();

fn main() {
    bar()
}

playground