Tracking Issue for RFC 3373: Avoid non-local definitions in functions

traviscross commented 7 months ago

This is a tracking issue for the RFC 3373: Avoid non-local definitions in functions (rust-lang/rfcs#3373).

Summary

Add a warn-by-default lint for items inside functions or expressions that implement methods or traits that are visible outside the function or expression. Consider ramping that lint to deny-by-default for Rust 2024, and evaluating a hard error for 2027.

warning: non-local `impl` definition, they should be avoided as they go against expectation
  --> src/de/impls.rs:21:9
   |
21 | /         impl Visitor for Place<()> {
22 | |             fn null(&mut self) -> Result<()> {
23 | |                 self.out = Some(());
24 | |                 Ok(())
25 | |             }
26 | |         }
   | |_________^
   |
   = help: move this `impl` block outside the of the current associated function `begin`
   = note: an `impl` definition is non-local if it is nested inside an item and neither the type nor the trait are at the same nesting level as the `impl` block
   = note: one exception to the rule are anon-const (`const _: () = { ... }`) at top-level module and anon-const at the same nesting as the trait or type
   = note: this lint may become deny-by-default in the edition 2024 and higher, see the tracking issue <https://github.com/rust-lang/rust/issues/120363>
   = note: `#[warn(non_local_definitions)]` on by default

About tracking issues

Tracking issues are used to record the overall progress of implementation. They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions. A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature. Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

[x] Implement the RFC: https://github.com/rust-lang/rust/pull/120393
[ ] Adjust documentation (see instructions on rustc-dev-guide)
[ ] Stabilization PR (see instructions on rustc-dev-guide)

Unresolved questions

[ ] We need a crater run to look at how widespread this pattern is in existing code.
- https://github.com/rust-lang/rust/pull/120393#issuecomment-1925359993
[ ] Should we flag these definitions in anonymous const items as well, or would that produce unwanted warnings?
- We do not currently warn on const _: () = { /* .. */ } since the pattern is widely used in the ecosystem, including serde_derive.

In this case I would say it is "part of the development and stabilization process", because I'm pretty sure your interpretation of how this is supposed to work with parameterized types/traits already matches what the lang team thought they were agreeing to (but cc: @rust-lang/lang ), in which case we just need to fix the lint before stabilization. This was brought up during the RFC period, although without a clear resolution. The responses of @lnicola ([1] [2]) seem to suggest this interpretation and appeared to go uncontested before the end of the RFC, so I'm inclined to believe the authors also agree with it and the lint is wrong.

dhardy commented 6 months ago

My 2 cents: a specification which requires reading 50+ comments and understanding which are applicable is rather hard to use (especially for anyone not closely following). And, yes, I did notice that this issue had already been brought up (multiple times), but did not see any prior attempts to codify a solution which felt adequate. Having a codified goal gives a development target (which may or may not prove viable) and a way to document to others who show up (like myself) the possible development direction.

GuillaumeGomez commented 6 months ago

This is currently being triggered on doc tests, for example:

/// struct MyF32 {
///     x: f32,
/// }
///
/// impl MyF32 {
///     fn sample(&self) {}
/// }
/// ```

I think it should not take doctests into account.

Ddystopia commented 4 months ago

Hello, that RFC disallows code like that, and it is non-local by it's nature

#[diagnostic::on_unimplemented(message = "Type was not produced", label = "Type to be produced")]
#[allow(dead_code)]
trait Produced {}
#[allow(dead_code)]
trait AssertProcuded: Produced {}

trait ProducedExactlyOnce {
    type Args;
    fn produce(args: Self::Args) -> Self;
}

macro_rules! assert_produced {
    ($generic:ty) => {
        impl AssertProcuded for $generic {}
    };
}

macro_rules! produce_exactly_once {
    ($name:ty, $value:expr) => {{
        impl Produced for $name {}
        <$name as ProducedExactlyOnce>::produce($value)
    }};
}

struct Foo(u32);
struct Bar(i32);

impl ProducedExactlyOnce for Foo {
    type Args = u32;
    fn produce(args: Self::Args) -> Self {
        Self(args)
    }
}

impl ProducedExactlyOnce for Bar {
    type Args = i32;
    fn produce(args: Self::Args) -> Self {
        Self(args)
    }
}

assert_produced!(Foo);
assert_produced!(Bar);

fn main() {
    let Foo(foo) = produce_exactly_once!(Foo, 42);
    let Bar(bar) = produce_exactly_once!(Bar, 42);

    println!("{foo:?} {bar:?}");
}

Manishearth commented 2 months ago

This breaks displaydoc.

A lot of macros use this trick to create local scopes. I don't think this should become a hard error unless there are well supported ways to do this.

Manishearth commented 2 months ago

Actually no, this isn't displaydoc, this is doctests breaking

https://github.com/unicode-org/icu4x/actions/runs/9549144605/job/26318035304

Manishearth commented 2 months ago

Nope, this also occurs for displaydoc without any doctest shenanigans: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=c9584f9ecbd510b358e6be29abc7ea39

This kind of const-wrapping is a common macro trick, we should try to support that somehow

https://github.com/yaahc/displaydoc/issues/46

bjorn3 commented 2 months ago

We do allow it for const _, just not for named constants like displaydoc uses.

Manishearth commented 2 months ago

@bjorn3 Yeah, but it seems like const _ automatically hoists the internal items to the outer scope, defeating the purpose?

Or at least that's what the error message says:

  = note: items in an anonymous const item (`const _: () = { ... }`) are treated as in the same scope as the anonymous const's declaration

bjorn3 commented 2 months ago

They are treated the same for the purpose of this lint. They still encapsulate name resolution.

Manishearth commented 2 months ago

@bjorn3 Hmm, in that case the current error message appears misleading. To me, it more or less explicitly says that they don't encapsulate name resolution.

jstarks commented 2 months ago

Is it intended that this lint fires in cases like these?

trait Tr {}
struct Foo<T>(T);
fn foo() {
    struct Bar;
    impl Tr for Foo<Bar> {}
}

Currently it fires.

This seems somewhat against the spirit of the lint. Foo<Bar> is cannot be named outside of foo(), so the reasoning behind the lint doesn't to apply. The impl isn't at the same level as its item, but it is at the outermost level it can be without making Bar visible to surrounding functions.

The lint also fires on fundamental types like Box<T>, which is even more unexpected:

trait Tr{}
fn foo() {
    struct Bar;
    impl Tr for Box<Bar> {}
}

This seems like a bug or at least a wart.

traviscross commented 2 months ago

@jstarks: That is worth filing a dedicated issue about, if you would.

jstarks commented 2 months ago

OK, #126768. Feel free to clean up the title and such.

chinedufn commented 2 months ago

I quickly scanned though the RFC https://github.com/rust-lang/rfcs/pull/3373 comments and didn't see anything about macro generated runtime tables.

Here's a minimal example of code that should not become a hard error:

struct MyType;

// Heavily simplified version of a macro in my codebase. 
macro_rules! my_macro {
    ($($fn_name:ident),*) => {
        impl MyType {
        pub fn $fn_name(&self) {
                SomeHandler::$fn_name()
        }
        }

            do_stuff(&some_type, stringify!($fn_name));
    }
}

fn initialize(some_type: &MyType) {
    my_macro!(
        method1,
        method2,
        method_foo,
        method_bar,
    );
}

fn do_stuff(some_type: &MyType, method_name: &str) {
    // Oversimplified, in real code the key and value might be different here.
    some_type.insert_into_runtime_generated_table(method_name, method_name);
}

In the above example we initialize function takes in MyType.

It calls the macro my_macro! with names of methods to implement on MyType.

my_macro! also generates a runtime call to fn do_stuff that adds the stringified method name to some table.

To satisfy the lint the above could would need to be changed to:

// BAD: Now we're using two macro invocations instead of one.

struct MyType;

macro_rules! implement {
    ($($fn_name:ident),*) => {
        impl MyType {
            pub fn $fn_name(&self) {
                SomeHandler::$fn_name()
            }
        }
    }

}

macro_rules! bind {
    ($($fn_name:ident),*) => {
        do_stuff(&some_type, stringify!($fn_name));
    }
}

implement! {
    method1,
    method2,
    method_foo,
    method_bar,
};

fn initialize(some_type: MyType) {
    bind!(
        method1,
        method2,
        method_foo,
        method_bar,
    );
}

fn do_stuff(some_type: &MyType, method_name: &str) {
    // ...
}

Pattern

The pattern here is when you need to implement a method on a type AND also register that method name in some runtime generated table.

Real World

This section is included to help prove that the above problem can be encountered in real-world code. It does not need to be read to understand the problem. For this reason I did not take the time to make this real-world example easy to understand / illustrate with simple code examples.

A real world example of when something like this comes up is when using wasm-bindgen to instantiate a WebAssembly module instance at runtime.

To expose a Rust type's methods to the wasm module you need to write code that essentially maps NAME OF WASM IMPORT -> NAME OF HOST METHOD.

This way when the wasm module calls my_wasm_module_import the host_method gets called.

Closing Thoughts

RFC 3373 https://github.com/rust-lang/rfcs/pull/3373 states the following as the motivation for the change:

Currently, tools cross-referencing uses and definitions (such as IDEs) must search inside all function bodies to find potential definitions corresponding to uses within a function.

Humans cross-referencing such uses and definitions may find themselves similarly baffled.

In my case the human would be more confused if they had to check two macros to understand this pattern instead of one.

My alternative would be to write a build script or some other form of codegen to generate this, but that would be more difficult for the human to find and maintain than a simple inlined macro.

bjorn3 commented 2 months ago

Can you generate the initialize function directly from your macro?

chinedufn commented 2 months ago

My real initialize method does a little bit of setup before invoking the macro.

My development environment's syntax highlighting / other tools work better outside of macros than inside, so I'd prefer to avoid moving code into a macro to satisfy a lint.

That said, I could have my macro generate the initialize and then have initialize call some other initialize_inner_setup() function, then move my other setup code into fn initialize_inner_setup. Then my macro could remain fairly lean.

my_macro! {
    // generated impl blocks here

    fn initialize() {
        initialize_inner_setup();

        // ... generated code that inserts into a runtime table here ...
    }
}

my_macro!();

#[inline]
fn initialize_inner_setup() {
    // ... setup code that does not need to be in the macro ...
}

The downside of extracting out the initialize_inner_setup is that my initialization routine no longer reads top to bottom. You need to go look at initialize_inner_setup, which should only be a few lines that would've been better off at the top of the fn initialize.

That said, when given a better name this hypothetical initialize_inner_setup would be named clearly enough that you wouldn't need to go read it to it to understand. Something like fn specific_thing_that_is_happening() {}

None of this is a big deal for my own codebase since this pattern only occurs once, just illustrating the trade-off.

In practice it would be fine for me to generate the entire initialize function in a single macro invocation since it barely has any code in it. And if it had a lot of code I would want to move parts of it into other sub-functions like the initialize_inner_setup example.

So generating everything in a single macro invocation would be okay in my case.

rust-lang / rust