rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.46k stars 12.6k forks source link

Tracking issue for lifetime elision for impl headers (feature impl_header_lifetime_elision) #15872

Closed aturon closed 5 years ago

aturon commented 10 years ago

The lifetime elision RFC included rules for eliding lifetimes in impl headers, but these rules were not included in the implementation.

Mentoring instructions can be found here.

pnkfelix commented 9 years ago

triage: P-low

nikomatsakis commented 8 years ago

See here for a summary of the current status.

sophiajt commented 8 years ago

Should this be P-low since @brson notes it as a critical part of the implementation?

aturon commented 8 years ago

@jonathandturner I think there's some confusion -- @brson was referring to error messages, which I believe at this point are largely as specified (but would be worth checking). This issue, OTOH, is about adding an additional point of elision.

I'm nominating this for lang team discussion -- the RFC for this feature is quite old, and at this point I think we should revisit whether we even want to do it.

nikomatsakis commented 8 years ago

We discussed in the @rust-lang/lang meeting today and decided that there is no reason to expire an RFC just because it's been a while. Basically, the motivation of the RFC still seems fairly relevant. That said, we would expect that if these changes were implemented, they would go through a stabilization period, and so if people felt like they were now bad, we could address that then.

It's worth noting here for the record a few random thoughts of my own. First, I feel like elision has been wonderful except for one case: fn foo(&self) -> Bar where Bar has lifetime parameters (struct Bar<'x>). In this case, I feel like the fact that Bar "extends" the borrow of self is subtle. This becomes worse when there are other named lifetime parameters that appear in the return type, for example fn foo(&self) -> &'a Bar, which would expand to fn foo<'b>(&'b self) -> &'a Foo<'b>. I'd love if we could find some way to lint against harmful patterns and provide a lightweight way to indicate that there are lifetimes in Bar without having to name them.

djc commented 7 years ago

I might be interested in working on this. Can someone outline some steps to get going? What's the perceived complexity factor here? (I have contributed small things before, but no actual compiler work.)

eddyb commented 7 years ago

@djc You might want to start by taking a look at #39265, which stabilized 'static elision in const & static (by removing the special casing it needed for feature-gating), and which touched the new version of the elision system in resolve_lifetime, hopefully the site of most of your changes.

I worry that impl headers would want early-bound lifetime parameters so you'd need to synthetize those in a way that librustc_typeck/collect.rs understands and keeps track of in the impl's ty::Generics.

It still wouldn't be that hard, as impl lifetime/type parameters are never exposed to the user (instead, they're inferred from Self and trait parameters, where applicable), but it's unlike other forms of elision.

nikomatsakis commented 7 years ago

I'm definitely keen on seeing this done. @djc if you want I can try to draw up somewhat more detailed instructions that what @eddyb provided, but let me know.

djc commented 7 years ago

@nikomatsakis that would be quite welcome. I did look at lifetime_resolvers and typeck::collect a little, but a strategy to get started did not jump out at me. One thing I was wondering is if I could somehow print HIR before and after the pass that should the elision, to give me some way to debug.

eddyb commented 7 years ago

@djc the HIR is not mutated, extra information (associated by NodeIds from the HIR) is collected instead.

djc commented 7 years ago

@nikomatsakis ping, would you still have a chance to draw up more detailed instructions?

nikomatsakis commented 7 years ago

@djc I haven't. :( I'll do some poking about now and see how easy I think it would be.

nikomatsakis commented 7 years ago

@aturon @wycats

I have been looking into how to implicit impl elision, and I wanted to clarify one part of the RFC. The text says this (emphasis mine):

For impl headers, input refers to the lifetimes appears in the type receiving the impl, while output refers to the trait, if any. So impl<'a> Foo<'a> has 'a in input position, while impl<'a, 'b, 'c> SomeTrait<'b, 'c> for Foo<'a, 'c> has 'a in input position, 'b in output position, and 'c in both input and output positions.

In general, though, there can be more than one type in an impl header, and the text doesn't seem to clearly address this case. (Neither do the examples.) This is how I would interpret the rules:

One question is whether to extend the "self is privileged" rule to impls. I'd be inclined to say no, even though I think it might make sense, in the absence of data, simply because it's the conservative choice. However, there is an even more conservative choice, which is to require explicit lifetimes everywhere but Self. I don't see much of a point to that.

Therefore:

impl Eq<&str> for &str // becomes:
impl<'a, 'b> Eq<&'a str> for &'b str

trait Foo<'x, T> { ... }

impl Foo<&str> for &str // error: output position `'x` has multiple possibilities

impl Foo<u8> for &str // becomes:
impl<'a> Foo<'a, u8> for &'a str

impl Foo<&str> for u8 // becomes:
impl<'a> Foo<'a, &'a str> for u8
nikomatsakis commented 7 years ago

Another question: in the "Elision 2.0 RFC" draft that @solson and I have been (slowly) working on, we are planning to deprecate "implicit" lifetimes (e.g., Ref<T> being expanded to Ref<'a ,T>), at least in some scenarios (I rather liked @aturon's proposal of deprecating an implicit lifetime only if it is used in an output position as well, and thus encouraging one to signal it by writing Ref<', T> in that case, though I also might not object to just deprecating them altogether). Should we omit such cases from impl headers and just focus on &T? Or implement the rules more evenly and then deprecate evenly?

nikomatsakis commented 7 years ago

OK @djc, so I did some research into how to implement this. As you can see from my previous comment, there are some vaguaries in the RFC text that I think we should clear up, but it doesn't really affect the implementation very much. The good news is that the heroic @eddyb has done some nice refactoring here that I think makes this job much nicer than it once was.

A quick outline of how the various bits in the compiler work that are relevant here. First off, let's take two example impls to show what I'm talking about:

impl Foo for &str { .. }
impl Foo for Ref<u8> { .. } // Ref is defined as `struct Ref<'a, T>`

OK, so, the general flow of the compiler is like this:

So, to make impl elision work, we are going to have to modify a little bit of each of these places. In particular, we are going to be inserted new "automatic" lifetime parameters, so those will have to appear in the Generics structure. The types which reference those automatic parameters will need to be correct, though probably we'll achieve that mostly by making sure that the resolve_lifetimes pass links up the lifetimes correctly (we may not need to modify the code for types in astconv at all).

OK, let's dive into a bit more detail. Let's walk through how elided lifetimes work today. In the AST, elided lifetimes are simply not present. So if you check out the enum variant for &ptr, called Rptr, you see that it has an Option for a lifetime. In the case of Ref, they are "even more" elided, in that there will simply be a path with zero lifetime parameters. However, in the HIR, all lifetime parameters are always present. So, the corresponding enum variant in the HIR does not contain an option:

 TyRptr(Lifetime, MutTy),

Similarly, although you can't see this in their definition, paths like Ref<T> where no lifetime was provided will have the elided lifetimes present in the HIR. The way this works is that, during lowering, we insert a sentinel lifetime indicating that the lifetime was elided by the user. For example, here is the code that converts &T types from the AST to the HIR: you can see that if the AST has None for the lifetime, the HIR gets the sentinel value.

This is handy because every hir::Lifetime struct has a unique id. This id will be used by resolve_lifetimes to record, for each hir::Lifetime, what that lifetime actually refers to (this is not done on the AST, like paths, but rather on the HIR). These are stored in a map called the NamedRegionMap, in the defs field. This is a NodeMap, which is a hashmap keyed by NodeId (the type of the id field from Lifetime). The value is a resolve_lifetime::Region struct -- somewhat confusingly, the resolve_lifetime code has its own Region struct (there is another in ty).

OK, let's take a break and walk through one of our examples now. We'll look at the first impl, the one for &str. Actually, we'll look at a variant, with a named lifetime parameter (this is kind of what we want to desugar to):

impl<'a> Foo for &'a str { .. }

What happens to this impl as we proceed through the passes? In the AST, the type of &str would be a Rptr with Some('a) for the lifetime. During lowering, this 'a gets converted into a hir::Lifetime struct with basically the same information. In the resolve-lifetimes code, we will walk down the HIR. As we get to the impl, we will bring the lifetimes declared on it into scope. In this case, that is the lifetime 'a. We map 'a to the resolve_lifetime::Region::Early variant (see the note below about the name "early"). Then later, when resolve_lifetime reaches the type &'a str, it will search these scopes and store the result into the map. This basically just stores the index of this lifetime parameter in the list of generics (in this case, I think it will be 0, as its the only parameter on the impl).

Later on, during collect, we have to create the generics for the impl. This will read the fields from the HIR, iterate over each lifetime in there, and create a RegionParameterDef for each one. This eventually gets stored in the Generics structure.

When we convert the &'a str type into the compiler's internal representation, meanwhile, it will lookup the result from the NamedRegionMap that resolve_lifetime created. It will find the Early variant and create a ty::Region with the given index.

OK, that's all I have time for for now. I haven't actually gotten into how to DO the change, but hopefully this will help you get acquanted with how the relevant code works now and maybe gives you an inkling how we will update it.

eddyb commented 7 years ago

If you treat it like argument elision in rustc::middle::resolve_lifetime, but just slightly different so you can tell between the lifetimes added in an impl header (always "early") and a fn signature (always "late"), then you can use a map (or just traverse the HIR) to add those lifetimes in typeck::collect to ty::Generics.

Most of the code out there shouldn't mind, i.e. it should assume it can't know what ty::Generics looks like for a given HIR without asking for ty::Generics specifically.

djc commented 7 years ago

@nikomatsakis thanks a lot for the detailed explanation. I'll be diving in soon to see if I can make sense of it all.

nikomatsakis commented 7 years ago

@djc I'll try to write up the next few steps soon -- but I figured I'd given you enough to chew on to start. =)

qmx commented 6 years ago

heads up, I'm working on this one

nikomatsakis commented 6 years ago

OK, I've written up a fairly comprehensive set of examples and test cases in this gist. The rules I settled on are as follows:

nikomatsakis commented 6 years ago

OK, let me make another effort at writing up some mentoring instructions. I think there are going to be a few big steps. I think the best place to get started is to focus on the case of elided lifetimes that appear in the types of the impl header. For example, impl Foo for &str or impl Foo for Bar<'_>. Just to make things even simpler, let's do it only for impls with no items -- or at least for impls with no methods. This will allow us to skip over some annoying details at first (see below).

I'm a bit torn on the best way to do all of this. Here is one possible path, it may not be the absolute best. Originally, I was going to recommend that we handle a lot of this in the resolve-lifetimes code, building on the existing mechanisms around elided lifetimes. But now I am not sure if that's the best path. In particular, we are going to want to be synthesizing early-bound region parameters, and that will require creating def-ids. It seems like all of this will work more smoothly if we handle it during HIR lowering.

The idea would be to follow a not dissimilar path from how we are handling impl Trait in argument position. We'll need to thread around (perhaps using a field, in part to avoid conflicts with that branch, and also because this is not going to "switch back and forth" very often) a bit of context I would call the "lifetime elision mode". This roughly indicates "what to do when you find a lifetime that is missing". Right now we invoke self.elided_lifetime() and it generates a hir::Lifetime with a special name. Later on, during lifetime resolution, we may report an error if an elided lifetime appears in the wrong place.

What I think we want to do is to add a field to the resolver that tracks the "lifetime elision mode". This mode could be one of two values (for now, anyway):

We will normally have the mode set to Defer. But when we are lowering the trait ref in the impl header and the self type in the impl header, we will set it to ImplHeader with an (initially) empty vector.

This mode will cause elided_lifetime() to behave differently: it will create a new DefId for this region, create a RegionParameterDef, and store RegionParameterDef into the vector. (The indices should take into account the existing parameters declared on the impl, both region and type). It will then generate a new kind of hir::Lifetime that references this RegionParameterDef. I imagine we can add a new variant to hir::LifetimeName, something like Anonymous(usize), where usize is the index we assigned to the region. This will be understood by the resolve_lifetimes code and mapped to an early-bound region.

When we are done with that part of lowering, we can extract the vector and then have a list of the synthetic region parameters that we need, which can then be added to the Generics struct.

OK, that's the high-level idea, but I just realized in writing it that there are some complications. For one thing, I think that some amount of the code today assumes that the all lifetime parameters in a particular scope come before the type parameters, but I think we are going to need to have [L,T,L] for the impl. One option is to refactor ty::Generics so that, instead of having two lists of generics and types, it has one unified list of ParameterDef which is something like:

enum ParameterDef<'tcx> {
  Type(TypeParameterDef), Region(RegionParameterDef)
}

We could then simplify the various bits of logic that are adjusting the indices into these vectors (e.g., this).

Another option is that the generics_of for an impl would always yield a "two-step", where it has

Currently, the second step doesn't exist. This would preserve the current structure, but it would mess up the fact that -- right now -- the parent of a generics is stored as a DefId, and we would have to gin up a false def-id to represent the actual types and lifetimes from the impl.

To me, the first path seems a bit better, but this seems like a try-it-and-see sort of thing.

nikomatsakis commented 6 years ago

One thing to be careful of: when making a trait object (here and here), I think we want to use the existing elision code. Object lifetime default elision works somewhat differently and we don't want to disturb that. So it probably wants to "flip" the mode back to Defer temporarily or something.

nikomatsakis commented 6 years ago

Update: @qmx and I chatted. I realized that I was mistaken about needing to refactor Generics to support more flexible lifetime patterns. In particular, it seems that hir::Generics do not assign indices, that only happens during the query step, so it's fine if we just insert the synthetic lifetime parameters into the list at arbitrary places.

As a very first step, @qmx is going to work on instrumenting the HIR lowering just to detect when '_ is used in impl headers and report different errors based on whether it is used in a place where it will eventually be supported.

nikomatsakis commented 6 years ago

Update: I've started hacking on this myself. Building on the existing in-band lifetimes work that @cramertj did, it's not very hard. I think I'm close to a PR.

mbrubeck commented 6 years ago

Note for anyone trying to use this feature: It's currently available on the nightly channel with #![feature(in_band_lifetimes)]. (And #![feature(underscore_lifetimes)] is also needed for some use cases.)

The tracking issue for stabilization is #44524.

scottmcm commented 6 years ago

Reactivating to serve as a tracking issue, now that this has been split out into its own feature gate.

scottmcm commented 6 years ago

@rfcbot fcp merge

This is a seemed-well-received subset of the P-FCP in https://github.com/rust-lang/rust/issues/44524#issuecomment-405763630. :bell: If you haven't tried out this feature and left feedback, now is the time to do so :bell: (It will take some time for the team members to review the motion to FCP, and then there will be some time for the comment period to complete.)

Two simple examples where this avoids needing named lifetimes:

trait MyIterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

impl<I: MyIterator> MyIterator for &mut I { // No 'a
    type Item = I::Item;
    fn next(&mut self) -> Option<Self::Item> {
        (**self).next()
    }
}
struct SetOnDrop<'a, T: 'a> {
    borrow: &'a mut T,
    value: Option<T>,
}

impl<T> Drop for SetOnDrop<'_, T> { // No 'a
    fn drop(&mut self) {
        if let Some(x) = self.value.take() {
            *self.borrow = x;
        }
    }
}

This is on-by-default in the 2018 edition, and for 2015 is available as of the 2018-08-13 nightly under

#![feature(impl_header_lifetime_elision)]

Edit: Additional details below in https://github.com/rust-lang/rust/issues/15872#issuecomment-413080939

rfcbot commented 6 years ago

Team member @scottmcm has proposed to merge this. The next step is review by the rest of the tagged teams:

Concerns:

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

nrc commented 6 years ago

This is a seemed-well-received subset of the P-FCP in #44524 (comment)

@scottmcm could you specify exactly what is being FCP'ed here please?

Centril commented 6 years ago

@nrc AIUI just implicit universal quantification of lifetimes for elided lifetime positions in impl headers (&Type, Type<'_>) but not in-band ones in headers and functions, so no (&'tcx Type, Type<'tcx>).

scottmcm commented 6 years ago

@nrc The parts of https://github.com/rust-lang/rfcs/pull/141 that relate to impl headers, adapted to current use of '_.

So, specifically, these examples:

impl Reader for BufReader<'_> { ... }                   // elided
impl<'a> Reader for BufReader<'a> { .. }                // expanded

impl Reader for (&str, &str) { ... }                    // elided
impl<'a, 'b> Reader for (&'a str, &'b str) { ... }      // expanded

The rules being:

AKA all the lifetimes in the impl header are "input lifetimes". (IMHO the place in impls for output lifetimes will be in associated type elision, but that's not part of the RFC nor the FCP.)

Note, thus, that the following example from the RFC is not included:

impl StrSlice for &str { ... }                          // elided
impl<'a> StrSlice<'a> for &'a str { ... }               // expanded

As it would need to be StrSlice<'_>, and the expansion would actually be

impl StrSlice<'_> for &str { ... }                      // elided
impl<'a, 'b> StrSlice<'a> for &'b str { ... }           // expanded
aturon commented 6 years ago

@rfcbot concert input-output

@scottmcm can you say more about the rationale for making all positions input positions? IIRC with the original RFC we did some amount of usage analysis to inform the input/output distinction, and the feeling was that lifetime parameters in traits were most often connected (as outputs) to those in types. At an intuitive level, this makes sense to be because if a trait is tied to some lifetime, it generally implies the presence of borrowing within the structures that implement the trait.

Centril commented 6 years ago

@aturon that usage analysis seems reasonable to me. However, while this is probably vague... I would not expect traits to count as output positions because they do not feel like outputs while associated types do.

At an intuitive level, this makes sense to be because if a trait is tied to some lifetime, it generally implies the presence of borrowing within the structures that implement the trait.

I think that makes sense.

scottmcm commented 6 years ago

@nikomatsakis As the author of https://github.com/rust-lang/rust/pull/49251, can you confirm that my description of the rules match what's implemented? Though I'm pretty sure it does based on https://play.rust-lang.org/?gist=de64967030a2a17a21e58b6209d1b834&version=nightly&mode=debug&edition=2018

@aturon Yes, this is what I was asking about on Discord -- I hadn't actually looked at what the RFC said it should be doing until writing https://github.com/rust-lang/rust/issues/15872#issuecomment-413080939, as it wasn't needed to do the feature gate PR :slightly_smiling_face:

I think the important part is that it's not "all positions [are] input positions", but that parameters are input position. This is the same as functions, where all parameters are input positions, but not all positions are input positions: the return type, which is not a parameter, is output position. "Parameters are controlled by the caller, so are input position" is true for both functions and impls. I think the correct intuition is that impl Foo<A> for B is not like fn Foo(a: A)->B, but like fn Foo(self: B, a: A).

But that leaves an obvious question: What is output position for impls? Answer: Associated types.

This is most blatant for the operator traits. That this, for example:

impl Add<&i32> for &i32 {
    type Output = i32;
    ...
}

How do we want that elided? Exactly like fn(&i32, &i32)->i32, and not like fn(&i32) -> (&i32, i32). This also holds true for the other two impls using &s:

impl Add<i32> for &i32 { // clearly input position, fine under either model
impl Add<&i32> for i32 { // cannot elide if trait arguments are output position

The RHS is a type parameter because we want it to be an input when it comes to trait resolution; that's why it's a type parameter and not an associated type. The same is true for elision.

But of course that's an associated type with no lifetime. What it the associated type needs one? Well, there are a bunch of those because of the normal pattern with IntoIterator:

https://github.com/rust-lang/rust/blob/1558ae7cfd5e1190d3388dcc6f0f734589e4e478/src/liballoc/vec.rs#L1818-L1820

Saying that associated types are output position (in a future RFC) would simplify that pattern:

impl<T> IntoIterator for &Vec<T> { 
    type Item = &T; 
    type IntoIter = slice::Iter<'_, T>; 

That continues to fit the analogy to fn: that impl would be fn(&Vec<T>) -> (&T, Iter<'_, T>).

But are trait parameters being input helpful for this, you ask? Well, see things like

https://github.com/rust-lang/rust/blob/1558ae7cfd5e1190d3388dcc6f0f734589e4e478/src/libcore/str/pattern.rs#L431-L433

That's definitely an input-ish lifetime on the trait that could be elided for the associated type, as it's like fn(char, X<'_>) -> Searcher<'_>: one input with a lifetime, so the output gets it via elision. (Disclaimer: this example isn't perfect, since the lifetime is also needed in methods, unlike IntoIterator.)

So while there are certainly situations where trait parameters being output position in the context of elision would be helpful, I think it's inconsistent with how we treat trait parameters in things like resolution, it's surprising or maybe harmful when the parameter come from &T (instead of the lifetime being directly on the trait), and associated types being output position is a better choice overall.

rfcbot commented 6 years ago

:bell: This is now entering its final comment period, as per the review above. :bell:

nikomatsakis commented 6 years ago

@aturon

can you say more about the rationale for making all positions input positions?

I can confirm @scottmcm's interpretation. @aturon, I did indeed take a different approach than what the RFC seemed to suggest (though the RFC text, iirc, was also sort of ambiguous). I left a few comments along the way (e.g., a and b), but let me summarize:

So, those two things combined led me to pursue the path as currently implemented. I would be very interested however to hear if people have found this behavior surprising or found that -- in their libraries -- an alternative behavior would be preferable.

Centril commented 6 years ago

To make sure that @aturon's concern doesn't go unregistered...

@rfcbot concern input-output

scottmcm commented 6 years ago

I believe the RFC may have pre-dated associated types?

Good catch, @nikomatsakis! The RFC was merged 2014-07-09; in 2014-09 Add was still

https://github.com/rust-lang/rust/blob/98fff20a7c95bdbb32652a95fd3211f743218b46/src/libcore/ops.rs#L114-L118

Another thing I noticed: eliding a lifetime parameter to a trait is often useless, since it needs to be mentioned later in a method signature. For example, rustc::mir::visit::Visitor<'tcx> could sometimes get its lifetime from the Self type (though even here there are more places where it couldn't, as the types have multiple or zero lifetime parameters), but even where that could work, it wouldn't be used as the fns need it:

https://github.com/rust-lang/rust/blob/9f9f2c0095cd683b94adca133f2733aa1f88bb19/src/librustc_mir/transform/promote_consts.rs#L86-L90

(Unless, like was discussed for in-band, there was some "higher-level" convention for lifetimes so the fn could reference PlaceContext<''_> or something, but that doesn't sound great to me.)

Centril commented 6 years ago

On behalf of @aturon:

@rfcbot resolve input-output

rfcbot commented 6 years ago

:bell: This is now entering its final comment period, as per the review above. :bell:

nikomatsakis commented 6 years ago

One more point that I forgot to add. I thought that the elision rules as originally written would require you to sometimes use disjoint names just to force disjointness, which is weird. e.g. you might need to write this:

impl<'a, 'b> PartialEq<&'a Foo> for &'b Foo { }

to permit two references with distinct lifetimes to be used.

scottmcm commented 6 years ago

To add to niko's comment: ...and we want to lint on single-use lifetimes like that.

rfcbot commented 6 years ago

The final comment period, with a disposition to merge, as per the review above, is now complete.

steveklabnik commented 6 years ago

What's the current status here? @rfcbot says we're good to go. It seems this is already stable as part of edition 2018. What's left to do?

Centril commented 6 years ago

@steveklabnik this needs to be stabilized by someone and then I think the reference and docs need to be updated.

steveklabnik commented 6 years ago

What does that stabilization effect, given that it’s already stable in 2018? For 2015?

On Sep 20, 2018, at 2:09 PM, Mazdak Farrokhzad notifications@github.com wrote:

@steveklabnik this needs to be stabilized by someone and then I think the reference and docs need to be updated.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

Centril commented 6 years ago

@steveklabnik yes, there's no reason AFAIK that this should not be stable in 2015 because there is no breakage.

steveklabnik commented 6 years ago

Great, just trying to understand exactly what's up.

Centril commented 6 years ago

@steveklabnik you and me both ;)

scottmcm commented 6 years ago

I have a PR up for the issue I found with this (https://github.com/rust-lang/rust/pull/54458); should be good for a stabilization PR.

Edit: Started one at https://github.com/rust-lang/rust/pull/54778