RFC: Limited return type inference constrained to a trait

glaebhoerl commented 10 years ago

Problem

Right now if you want to write a function that transforms an iterator you have to specify the type explicitly:

(1.)

struct MyTransformer<Iter> { ... }
impl<T, Iter: Iterator<T>> Iterator<T> for MyTransformer<Iter> { ... }
fn my_transform<T, Iter: Iterator<T>>(iter: Iter) -> MyTransformer<Iter> { ... }

If you want to hide the type of the iterator, you can use a trait object:

(2.)

fn my_transform<T, Iter: Iterator<T>>(iter: Iter) -> ~Iterator<T> { ... }

but this has a performance cost: heap allocation and indirection.

One solution is return type inference:

(3.)

fn my_transform<T, Iter: Iterator<T>>(iter: Iter) -> _ { ... }

This is what C++14 adopts. It has several drawbacks:

It breaks encapsulation. The signature of the function (the interface) now depends on the body of the function (the implementation).
If you're a client of this function and want to know what you can do with the return type, you either have to rely on external documentation (comments) or look at the implementation.
In Rust's case, it significantly complicates the type checking algorithm.

Rust's policy of no type inference at the API level is a sound one, in my opinion.

The same problem will recur when we add unboxed closures.

Proposed solution

Allow writing:

(4.)

fn my_transform<T, Iter: Iterator<T>>(iter: Iter) -> Iterator<T> { ... }

This syntax, a trait as the return type of a function, indicates a limited form of return type inference. Clients of the function may only use the return type according to the trait. The return type of the function must infer to a type that implements the trait.

This is sort of like an existential, but not quite, which is why I think thinking of it as return type inference is more accurate. If it were a true existential, the function could have several branches and return a different type, each implementing Iterator, from each branch. This is OK with trait objects, but clearly not representable in an unboxed form. Similarly, from a caller's perspective, if it were a true existential, two calls to the function could not be assumed to return the same type. But here it's quite reasonable to assume that they do.

In the interest of least surprise, the return type of such a function with an inferred return type should be assumed to be equal only to itself:

trait IsThing { }
struct Thing;
impl IsThing for Thing { }

fn some_thing() -> IsThing { Thing }
fn some_other_thing() -> IsThing { Thing }

let mut my_thing = some_thing();

// we expect this to work
my_thing = some_thing();

// we expect this *not* to work
my_thing = some_other_thing();
// because it would be surprising if the compiler "knew" that
// `some_thing` and `some_other_thing` return the same type
// when this isn't present in their signatures

In other words, when typechecking the rest of the program, for each function with such an inferred return type, the compiler would conjure a new anonymous type which it would pretend is the return type of the function, knowing only that it implements the trait. The true identity of this type (and the impl on it) would be determined when typechecking the function itself. In this way it should be possible to typecheck the body of the function and the rest of the program separately. (Edit: I think we would have to ban using this feature on trait methods though.)

This solution seems superior to the other three (explicit return type, trait object, or unlimited return type inference), especially if you also consider wanting to return a lambda when we have unboxed closures.

emberian commented 10 years ago

I like this proposal but I think I would want more syntax for the return type. It's impossible to tell a trait apart from any other type just by looking at it, so it might get confusing. I don't have any particularly good ideas around this. The first that comes to mind is fn foo() => Iterator<T>

glaebhoerl commented 10 years ago

Yeah, that occurred to me too. There's no other legal meaning a trait in that position could possibly have, so it's not ambiguous in that sense, and it is very similar in terms of behaviour to a trait object (even if it's not quite the same thing), where the same concern applies and we don't seem to mind so much, so I'm not sure it's that bad.

I could also see having different syntax though. My usual tactic in these cases is to take a look at the list of Rust keywords: in this case the two that look like they might have any kind of potential are as and trait. => isn't bad either, though we don't use that syntax for function-like things anywhere else. Another option I thought of which gets the meaning across but I think is a bit ugly is -> _: Trait.

So the options (so far) are:

fn my_iter() -> Iterator<T> { ... }
fn my_iter() => Iterator<T> { ... }
fn my_iter() -> as Iterator<T> { ... }
fn my_iter() -> trait Iterator<T> { ... }
fn my_iter() -> _: Iterator<T> { ... }

I still like 1., but I'm not overly attached to it. Edit: actually, assuming of course that we implement this at all, maybe we should do 5., which is more precise and explicit, and reserve the option to introduce 1. as sugar for it at some later point. Edit again: yeah, I guess I'm basically in favor of 5. now. Conveying the meaning accurately is more important than appealing syntax.

emberian commented 10 years ago

(It's not ambiguous, but if we have separate typing rules for the return value of one of these functions, I don't really want it using the same syntax.)

On Sat, Jan 11, 2014 at 11:45 AM, Gábor Lehel notifications@github.comwrote:

Yeah, that occurred to me too. There's no other legal meaning a trait in that position could possibly have, so it's not ambiguous in that sense, and it is very similar in terms of behaviour to a trait object (even if it's not quite the same thing), where the same concern applies and we don't seem to mind so much, so I'm not sure it's that bad.

I could also see having different syntax though. My usual tactic in these cases is to take a look at the list of Rust keywords: in this case the two that look like they might have any kind of potential are as and trait. =>isn't bad either, though we don't use that syntax for function-like things anywhere else. Another option I thought of which gets the meaning across but I think is a bit ugly is -> _: Trait.

So the options (so far) are:

fn my_iter() -> Iterator { ... }

fn my_iter() => Iterator { ... }

fn my_iter() -> as Iterator { ... }

fn my_iter() -> trait Iterator { ... }

fn myiter() -> : Iterator { ... }

I still like 1., but I'm not overly attached to it.

— Reply to this email directly or view it on GitHubhttps://github.com/mozilla/rust/issues/11455#issuecomment-32100571 .

emberian commented 10 years ago

cc @nikomatsakis @pcwalton @brson

alexcrichton commented 10 years ago

I would personally not be in favor of this. I find it incredibly important that all typechecking and borrowchecking related problems are all completely local to a function. I as a programmer can very easily know exactly what's going in because I have a very small world that I need to analyze (just this function).

If you don't explicitly name the return type, what if it has lifetime parameters that I need to know about? What if I then need to pass it to another function? I agree that it's really annoying trying to type out all the iterator nested traits when returning one, but I really don't want to break the mental model of completely local inference.

emberian commented 10 years ago

That's a good point, I hadn't considered lifetime parameters.

On Sat, Jan 11, 2014 at 1:11 PM, Alex Crichton notifications@github.comwrote:

I would personally not be in favor of this. I find it incredibly important that all typechecking and borrowchecking related problems are all completely local to a function. I as a programmer can very easily know exactly what's going in because I have a very small world that I need to analyze (just this function).

If you don't explicitly name the return type, what if it has lifetime parameters that I need to know about? What if I then need to pass it to another function? I agree that it's really annoying trying to type out all the iterator nested traits when returning one, but I really don't want to break the mental model of completely local inference.

— Reply to this email directly or view it on GitHubhttps://github.com/mozilla/rust/issues/11455#issuecomment-32102799 .

glaebhoerl commented 10 years ago

@alexcrichton I think you're partly misunderstanding how it would work (but the issue you raise wrt lifetimes is still very valid). The caller of a function with an inferred return type would not know anything about the type that is returned (and hence would not be able to depend on it), except that it impls the trait. They would only see it as "some anonymous type".

Therefore

What if I then need to pass it to another function?

you would only be able to pass it to another function if it's a generic one that accepts any I: Iterator<T> (or of course a completely generic function without even the Iterator bound).

If you don't explicitly name the return type, what if it has lifetime parameters that I need to know about?

This one is hurting my brain. If I'm thinking straight, substituting a dummy/placeholder/anonymous type for a type that has lifetime parameters would break type safety. Consider for example:

fn this_is_evil<T, Iter: Iterator<T>>(iter: Iter) -> _: Iterator<T> {
    iter
}

fn this_is_bad() -> ~Iterator<int>{
    // lifetime is this function
    let a = &[1, 2, 3];

    // the return type is an unknown type that impls `Iterator`,
    // so this is accepted, and now the trait object we return
    // points into invalid memory
    ~this_is_evil(a.iter().map(|i| *i))
    // at entry to `this_is_evil` the lifetime information is present,
    // but when it comes back out it is lost
}

Just as with trait objects, we would have to forbid erasing lifetime information in this way. I'm not sure how significantly this would impact the usefulness of the feature.

@alexcrichton Does this address your concerns? Do you see any other problems?

alexcrichton commented 10 years ago

This is still very different from returning a trait. For example if I return a PeekIterator then my code will compile but should the compiler allow me to call peek? The return type said nothing about a peek iterator, so are we also going to erase all extra methods that aren't on the trait? It seems to me like both decisions here (yes/no) are unexpected.

I'm still very concerned about this and its effects on a programmer's mental model. I can come up with a few cases here and there about where I personally would be confused, but this will always come back to the basic problem that reasoning about a function is no longer local to the function itself.

I don't think that a feature like this is impossible, but we would need to very carefully approach it. Another idea off the top of my head is fn foo<T: Iterator<int>>() -> T (I don't think that's allowed today), but I'm still afraid about the cognitive burden of not understanding the return value.

glaebhoerl commented 10 years ago

With regards to the lifetimes issue, trying out my previous example, except using a trait object:

fn evil<T, Iter: Iterator<T>>(iter: Iter) -> ~Iterator:<T> {
    ~iter as ~Iterator:<T>
}

I get the error: error: value may contain borrowed pointers; add 'static bound, so I think this is the same thing that we would have to do for this proposal.

For example if I return a PeekIterator then my code will compile but should the compiler allow me to call peek?

No.

The return type said nothing about a peek iterator, so are we also going to erase all extra methods that aren't on the trait?

Yes. Any information you want to convey to the outside world has to be in the signature. If you want the rest of the program to know that it's a PeekIterator, then you need to write -> PeekIterator.

It seems to me like both decisions here (yes/no) are unexpected.

Not to me. The same thing happens with trait objects. Why is this more surprising?

I'm still very concerned about this and its effects on a programmer's mental model.

It shouldn't have any more effect than trait objects do. Just like with trait objects, you get an unknown type that you can call trait methods on. The only differences from a type system perspective are that (a) the caller can assume that the same type is returned every time (while not knowing anything about the type) (whereas with trait objects, the type "inside" can be different each time), and correspondingly (b) all returns from the function have to infer to the same type.

Another idea off the top of my head is fn foo<T: Iterator<int>>() -> T (I don't think that's allowed today)

You can write that today, but the meaning is very different. That type says that you have to be able to return any iterator type of the caller's choice. In other words, you won't be able to implement it. What we want is a return type chosen by the callee, with the details of what type that is (apart from what is present in the signature, i.e. the trait) being hidden from the caller.

EDIT: Basically these concerns about non-local reasoning are exactly what this proposal was designed to address.

alexcrichton commented 10 years ago

Cool, so from what you said this sounds more palatable to me. I remain concerned about the idea, however, because this sounds exactly like a trait object, except that you're able to return one by value. From a compiler standpoint I guess this will infer what the return value actually is, but why won't we start inferring these arguments everywhere else? If you can do this for return values, why not for argument values or struct values?

I fear that this is addressing a very specific problem without addressing the problem at large. I don't think that DST will fix this, but perhaps it will?

I'm sorry if I sound so negative here, I don't want to deter an awesome feature while it's brewing! I'm just concerned about adding a very specific feature to the language instead of "getting our bang for our buck" from some other feature.

glaebhoerl commented 10 years ago

If you can do this for return values, why not for argument values or struct values?

Well, let's try looking at it case by case. For functions we can break it down on two axes: argument vs. return, in other words caller provides value vs. callee provides value, and caller chooses type vs. callee chooses type.

Caller provides value, caller chooses type: this is generics as we know it (or parametric polymorphism, universal quantification in other contexts). The caller provides a value of any type, which is abstract to the callee except for the traits that it implements.
Caller provides value, callee chooses type: in other words, the caller passes in a polymorphic value which the callee may instantiate at any types they choose. This is what's known as higher-rank types, or what would be "pointer to a function template" in C++ (and isn't possible). It's not currently possible in Rust either. Let's try applying the same trick as with return type inference: given a function fn foo(iter: Iterator<int>), the callee may instantiate iter at any single type that implements Iterator<int>. This type is inferred, and substituted back into the signature (but unknown to the caller). The caller will only be able to provide an argument for this parameter, where they are expected to provide a value of a type which is unknown to them, if they have access to a polymorphic value, or a function with a completely polymorphic output (as in the bullet point below this one). These are few and far between, but if we add polymorphic numeric literals like Haskell has, that would be one example. So I guess this isn't a totally unreasonable idea, but I don't see much utility in it right now (what problem it would solve).
Callee provides value, caller chooses type: this is generics again, except this time applied to the return type (the fn foo<T: Iterator<int>>() -> T example from earlier).
Callee provides value, callee chooses type: this is the proposal under discussion. The callee provides a value of any type, which is abstract to the caller except for the traits that it implements (so the inverse of the first point).

Struct values: in other words, this would be struct Foo { a: int, b: _ } where the person creating the struct chooses the type for b. But then Foo doesn't have a fixed representation. Applying our "usual trick" would imply something like all sites which create a Foo having to infer to the same type for b, but that's pretty crazy. So I don't think this makes any sense. The case where Foo can be represented because it's hidden behind a pointer is the sort of thing that DST handles.

I fear that this is addressing a very specific problem without addressing the problem at large.

What would you say is "the problem at large"?

I don't think that DST will fix this, but perhaps it will?

When I first started thinking about this, I thought it might be closely related to DST as well (after all it's sort of like an unboxed existential, while DST deals with boxed existentials), but after thinking about it more, I reached the conclusion that the two don't actually have much to do with each other, and that this is better thought of as (restricted) return type inference (for instance, because unboxed existentials are impossible, and return type inference is possible).

I'm sorry if I sound so negative here, I don't want to deter an awesome feature while it's brewing! I'm just concerned about adding a very specific feature to the language instead of "getting our bang for our buck" from some other feature.

No problem at all, or rather: thank you. You've already pointed out one very important issue (lifetimes), and made me consider another possibility I hadn't before (the analogous case for function arguments, from above), even if I don't think there's much of a use case for it at present. And "generalize, generalize, generalize" is basically my mantra as well, so we're in agreement there. I don't see any general thing at the moment that this feature could be a specific case of. The only axis I can think of to generalize it on would be the potential to return something where an inner type is inferred, e.g. foo(...) -> Option<Iterator<T>>, and I did think of that, but the bang-for-buck ratio seems considerably lower, and I'm not sure if it doesn't raise other issues (and in any case, the mechanism could be extended later if there's demand).

I think we might be on the same page in the end? That trait-restricted return type inference is preferable to having to use trait objects (because performance), and preferable to unrestricted return type inference (because local reasoning), and preferable to always having to write out the type (because unboxed closures won't have one). And that if there's some more general mechanism, then that would be even more preferable: I just don't happen to know of any.

(And apologies if I might be a little too comprehensive in my responses!)

alexcrichton commented 10 years ago

So as I read this it's getting more and more palatable to me. I'd certainly want to discuss this in at least a meeting if not a larger forum. It doesn't make sense to me to store a value like this in structs or things like that, but your Option<Iterator<T>> is a good example of a snag we could hit.

glaebhoerl commented 10 years ago

Sure, there's no rush. It's a backwards compatible feature and we don't even have unboxed closures yet (and it's not like I have a PR ready to go).

eddyb commented 10 years ago

Maybe we should merge (or at least link) #11196 and this issue, they both deal with using traits as types.

And I found an example where you can't just use the trait name as the return type, but maybe it's a bit convoluted:

impl<A, B, C, F: |A| -> B, G: |B| -> C, FG: |A| -> C> Mul<G, FG> for F {
    fn mul(&self, g: &G) -> FG {
        |a| g(self(a))
    }
}

glaebhoerl commented 10 years ago

@eddyb where would you want to use a trait as return type in that example?

eddyb commented 10 years ago

@glaebhoerl sorry if it wasn't obvious, FG is the return type, an unboxed closure, but it needs to show up in the trait's type parameters - Mul<G, FG>. This is how it could look outside an impl (assuming only unboxed closures, as in #11196):

fn compose<A, B, C>(f: |A| -> B, g: |B| -> C) -> |A| -> C {
    |a| g(f(a))
}

glaebhoerl commented 10 years ago

@eddyb I realized that it's function composition :), what I don't see is what you would want to write. One of the main motivations for this feature is that you can't write the type of an unboxed closure. But in that example you don't have to -- it's already generic.

eddyb commented 10 years ago

It's generic over a constant inner anonymous type - maybe I should try to make this clearer. Suppose we have an Iterable<T, Iter: Iterator<T>> trait with a fn iter(&self) -> Iter; - how do you implement that for an Iter type you can't name? With your proposal:

impl Iterable<int, ???> for int { // Always returns the same number.
    fn iter(&self) -> Iterator<int> {
        // Pretend this is an anonymous type that can't be constructed
        // outside of the iter method.
        struct Foo {x: int};
        impl Iterator<int> for Foo {
            fn next(&self) -> Option<int> {
                Some(self.x)
            }
        }
        Foo {x: *self}
    }
}

glaebhoerl commented 10 years ago

Ah, I see what you're saying finally. Well, I don't have any ideas. Do you?

eddyb commented 10 years ago

@glaebhoerl well, the implementation looks complicated enough to warrant using an explicit type and impl Fn on that type. I'd be happy even if only the small function was allowed.

glaebhoerl commented 10 years ago

That's pretty much what I was going to suggest as well.

eddyb commented 10 years ago

I've sketched out the details of implementing this, if anyone wants to try it. There could be some issues with variance, but nothing we can't solve. Should this move to the dedicated RFC process?

flaper87 commented 10 years ago

@eddyb yup, it should. @glaebhoerl Are you planning to write one? or @eddyb ? If so, pls link it here too.

aturon commented 10 years ago

I definitely think there's a gap in the language here, and something like this proposal could be a good starting point.

For the moment, though, I want to mention that there's a reasonable workaround using newtypes. Where you'd like to write something like

fn my_transform<T, Iter: Iterator<T>>(iter: Iter) -> Iterator<T> { ... }

in your proposal, you can instead introduce a newtype with a private field:

struct MyTransformResult<T>(SomeConcreteType<T>);
impl<T> Iterator<T> for MyTransformResult<T> { ... }

fn my_transform<T, Iter: Iterator<T>>(iter: Iter) -> MyTransformResult<T> { ... }

By following this pattern, you hide the concrete type SomeConcreteType<T> of the result, revealing only the trait bound Iterator<T>, while avoiding trait objects. Although you've given a name, MyTransformResult, to the return type, you can freely change the underlying representation (e.g., choice of iterator) without breaking client code.

That said, I do think in the long run we'll want a more convenient way to hide concrete return types without using trait objects.

cc @nikomatsakis

eddyb commented 10 years ago

@aturon sure, but this proposal is mainly useful for returning types unspeakable outside the function body (or not even within the function body). Right now the only such types would be the ones declared within the function body. However, if we get unboxed closures, each closure expression would have its own anonymous type, and returning it would require a feature such as the one proposed here.

I should mention that the ty_anon implementation detail I have in mind could work for any node, be it a function declaration (specifying its return type. or maybe it could be attached to the return ast::Type itself?) or an unboxed closure expression, so it might be beneficial to implement it as part of unboxed closures (which do need some way of storing such an anonymous type). cc @thestinger

lilyball commented 10 years ago

@aturon This proposal would also serve to insulate the function from the details of the various iterators it's employing. It's extremely awkward to try to return a Chain<Map<'a, (int, u8), u16, Enumerate<Filter<'a, u8, vec::MoveItems<u8>>>>, SkipWhile<'a, u16, Map<'a, &u16, u16, slice::Items<u16>>>>. And of course any change to how the implementation works requires updating this ridiculous type.

nrc commented 10 years ago

How would you know the size of such a type? Or would you monomorphise the caller or something?

aturon commented 10 years ago

@nick29581 As far as I understand the proposal, the idea is to monomorphize the callers, as you say.

@glaebhoerl's comment above makes clear that return types are missing the abilities we have with argument types. We can take trait-bounded arguments via generics (static dispatch) or as objects (dynamic dispatch), in either case keeping the actual type abstract. But there's no similar choice for return types: you either return a concrete type or a trait object (forcing the client into dynamic dispatch).

This is a real problem for things like iterators, where static dispatch is essential for performance, but you'd rather not reveal exactly how the iterator is built.

As things currently stand, you can always use newtypes to return a "concrete" type while hiding everything but its trait bounds, as I proposed above. This is a pain. But I think it will work even for at least some proposed versions of unboxed closures, provided that you can implement the proposed closure trait directly -- at the cost of not being able to use closure sugar when you want to return a closure.

emberian commented 10 years ago

To my understanding, it requires no monomorphization. The caller doesn't choose the type. The function returns exactly one type, but that type is opaque except for the trait that it's marked as returning.

On Mon, May 19, 2014 at 8:49 PM, Aaron Turon notifications@github.comwrote:

@nick29581 https://github.com/nick29581 As far as I understand the proposal, the idea is to monomorphize the callers, as you say.

@glaebhoerl's comment abovehttps://github.com/mozilla/rust/issues/11455#issuecomment-32131739makes clear that return types are missing the abilities we have with argument types. We can take trait-bounded arguments via generics (static dispatch) or as objects (dynamic dispatch), in either case keeping the actual type abstract. But there's no similar choice for return types: you either return a concrete type or a trait object (forcing the client into dynamic dispatch).

This is a real problem for things like iterators, where static dispatch is essential for performance.

As things currently stand, you can always use newtypes to return a "concrete" type while hiding everything but its trait bounds, as I proposed abovehttps://github.com/mozilla/rust/issues/11455#issuecomment-43356327. This is a pain. But I think it will work even for at least some proposed versions of unboxed closures, provided that you can implement the proposed closure trait directly -- at the cost of not being able to use closure sugar when you want to return a closure.

— Reply to this email directly or view it on GitHubhttps://github.com/mozilla/rust/issues/11455#issuecomment-43583876 .

http://octayn.net/

aturon commented 10 years ago

@cmr Agreed, the function chooses the type -- it's the caller that's monomorphized. The caller expects some type T: Trait (e.g., T: Iterator<A>). You can think of this as the caller being generic over T. The compiler would specialize the caller to whatever the actual type is, just as it specializes the callee when generics are instantiated.

aturon commented 10 years ago

I've now written a formal RFC on something like this proposal: https://github.com/rust-lang/rfcs/pull/105

thestinger commented 10 years ago

Keeping around an issue here isn't necessary because this isn't backwards incompatible or actionable. A proposal needs to be accepted through the RFC process.

rust-lang / rust

RFC: Limited return type inference constrained to a trait #11455

Problem

Proposed solution