rust-lang / futures-rs

Zero-cost asynchronous programming in Rust
https://rust-lang.github.io/futures-rs/
Apache License 2.0
5.41k stars 626 forks source link

Type-safe futures and streams #458

Closed Kixunil closed 6 years ago

Kixunil commented 7 years ago

As discussed in #206, current poll API of futures and streams takes them by mutable pointer. That would suggest those are always valid after call to poll(), which is not the case, since poll() shouldn't be called after future resolves.

Consuming self and returning it optionally would make more sense from type-safety perspective. It would also simplify implementations, especially when Future::Item is not Clone and must be stored in the future itself before resolving. (The current solution is to have it inside Option and unwrap it each time.)

I've made a demonstration example of some concepts.

Pros of type-safe futures/streams

Cons of type-safe futures/streams

Motivation

I believe the type-safe futures might be useful in cases where safety is more valued. E.g. financial applications, mission-critical software, etc.

As demonstrated in my example, the type safe and type unsafe futures/streams can co-exist using Glue wrapper. So maybe the best solution is to have both version and use the best tool for the job. I'd like to explore this topic and know what do you think about it.

Kixunil commented 7 years ago

Note: Just found that #462 would be probably avoided if the Sink was type safe from the beginning.

alexcrichton commented 7 years ago

FWIW I feel "type safe" is the wrong name for this. That typically invokes thinking along the lines of "if you're not type-safe then you can store a String inside &mut i32" which clearly futures do not allow.

I personally feel the cons outweigh the pros, and have yet to be convinced of this change.

Kixunil commented 7 years ago

Maybe I have a different understanding of what "type safe" means. From what I understand, it also means "you can't store &[u8] inside &str even though they have same representation". In the same spirit you should't store resolved future inside unresolved future. Anyway, I don't want to argue about names right now and I don't have an idea for a better name meaning "type system-enforced invariants". If you have some, I'd appreciate it.

Well, I'm not convinced about "one perfect way of doing it" either. And I certainly don't want to throw away all the existing code. I believe both approaches could co-exist with their pros and cons. What I want is to explore possibilities of statically checked code.

So to address your concerns, you mentioned three:

Could you, please, describe whether you see some problems with my suggestions for solutions? I'd love to solve it. :)

alexcrichton commented 7 years ago

Sure yeah, I'll dig into some more details.

So it sounds like you've got object safety taken care of, but this definitely comes at a cost. There's an extra trait and more concepts to understand (just to be clear), but at a technical level it's solved. For ergonomics much of what I'm thinking is somewhat anectodal. We prototyped a system exactly like this (without object safety) where futures were all moved by value, and that's only one incarnation of the system!

I find it difficult to articulate as to why it was so unergonomic, but in general Rust, while moving by default, typically favors borrowing in terms of ergonomics. Almost all idioms of the standard library heavily rely on borrowing to make methods, helpers, etc work out nicely. Tons of code in the ecosystem relies on borrowing in very colorful ways as well. In general it means that idioms are typically not geared towards lots of movement of ownership and ergonomics tend to suffer once movement is forced on everyone. Movement has also been explored a lot in other systems like rotor. I did not personally use rotor, however, so I can't comment myself.

In general I'm all for type systems upholding invariants where possible, but I personally at least favor ergonomics over the "type system getting in my way". For example the net2 crate does not use type state to track the state a socket is in, so you can make invalid calls at runtime. In practice, however, this rarely happens. Similarly with futures in practice you never actually poll a future after it's done. The executor (e.g. tokio-core) is the one typically managing this, so it's rarely something you need to worry about. As a result with the ergonomics/simplicity not on par with &mut it's not worth it to me.

For performance I'd recommend profiling pieces of code here and there written both ways. IIRC Hyper received a double-digit percent boost in performance moving from rotor to futures. I don't know whether that was movement related or runtime related, but it's at least one data point.

Kixunil commented 7 years ago

Sorry, I was unable to reply sooner, I had a lot to do. Thank you for deep explanation!

You made very good points. I'm still dreaming about solving everything and having a perfect code. :) But it seems like it makes sense to prefer borrowing at least for now. I've been thinking about other ways to solve this but they seem ergonomically worse.

That being said, I'd very much like to improve implementation of Future trait. One thing I had in mind was that since only reactors actually call poll it could be unsafe to allow move-out but that has many side-effects. Maybe if it would be possible to allow correct move-out only that would be solution.

Another thing that might be useful for some cases is equivalent of T! types in Swift. (Basically it's like Option with Deref(Mut) which panics on None.) If you know of existing crate implementing it, I'm all ears. If not, this is probably the approach I could take in my project.

Of course for performance I'd definitely use profiling. I'm not concerned with performance too much, because it is important but not critical for me yet. But I can understand other people caring about it a lot.

I think I'm gonna skip worrying about this too much right now but I'd like to revisit this in the future (no pun intended), when I'll have more time.

Edit, one more thing: I feel a little bit sad every time I see (panicking) checks in code just to prevent memory bugs if programmer screwed up. So that's also one of the reasons I was exploring this. That is probably just my perfectionism...

jimmycuadra commented 7 years ago

This reminds me a bit of what @dtolnay did with Serde. I believe it was in 0.9 when the serialization system changed to move a sentinel value to indicate when serialization was done. Personally, I really like the idea of the type system being used to ensure panic-free code. "In practice, this rarely happens" is the kind of thing people would say back in the dynamic programming world, and it usually ended up meaning you'd get screwed eventually. Of course if the performance cost of that approach is too great, maybe it's not feasible.

alexcrichton commented 7 years ago

FWIW I'd be very curious to hear about scenarios where this actually caused problems. I don't think in practice I've ever seen panics or behavior due to poll-after-future-is-done except maybe once or twice. I'm probably not the best data point though because I'm familiar with all the internals!

An interesting data point is the proposed coroutine language feature, which uses &mut self and panics unconditionally if you resume a finished generator. (basically the same as how Future works today)

Arnavion commented 7 years ago

It hasn't caused "problems" for me.

Receiving self instead of &mut self would simplify Futures and Streams implemented as state machine enums (I assume the coroutines proposal is the same), since they won't need to do the std::mem::replace dance while computing a state transition, and won't need an explicit Invalid variant just to std::mem::replace the initial state with. I've seen existing implementations reuse the terminal Ended state (and even did it myself) instead of an explicit Invalid state that maps to unreachable!() in the state transition (helps catch bugs where a state transition never happened before poll() returned). But I'm not very keen about that, since forgetting to do a state transition may get masked by an early-terminating Future / Stream.

This isn't a particularly big deal for me though. The more common mistake I make is to accidentally not call the inner Future / Stream's poll() during a state transition and thus end up with a stuck Future / Stream. Whether poll() takes self or &mut self doesn't help with that.

Kixunil commented 7 years ago

I had an idea how to get performance of &mut self with safety of self but at the cost of ergonomics. If someone wants fast && safe at all costs, I'm willing to write it down.

AljoschaMeyer commented 7 years ago

Here's a scenario I recently encountered where linear typing would have helped: Implementing a handshake protocol.

This is what I wanted to implement: There's a Handshaker struct, which contains a field stream of some type S: AsyncRead + AsyncWrite. Handshaker::new(stream: S, cryptostuff: &Foo) takes ownership of a stream: S. Taking ownership ensures that it is impossible for any other code to read or write on the stream during the handshake.

The actual handshake is performed via Handshaker::shake_hands(self) -> Result<(NegotiatedSecrets, S), (HandshakeError, S)>. So whether the handshake succeeds or fails, ownership of the stream is returned. The API is useable, ensures that no incorrect reads/writes can mess with the handshake, all is well. (Actually the error type is somewhat simplified, you'd want to handle non-fatal errors without returning ownership of the stream. But these types suffice for the example.)

How should the corresponding async API look like? Instead of synchronously returning a Result<(NegotiatedSecrets, S), (HandshakeError, S)>, just implement Future<Item = (NegotiatedSecrets, S), Error = (HandshakeError, S)>. But poll only takes a &mut self, so you can not move self.stream.

So in my current implementation, the Handshaker struct has a method into_inner(self) -> S and the future implementation uses only Item = NegotiatedSecrets. The API consumer has to consume the future and then use handshaker.into_inner() to retrieve the stream for further usage. Not only is this needlessly error-prone, but it also hinders composition:

Say there is a read/write wrapper type CryptoWrapper for S: AsyncRead + AsyncWrite. Its constructor takes an S and a NegotiatedSecrets. It behaves just like the wrapped stream, but it encrypts all writes and decrypts all reads. It's a oneliner to implement a convenient, synchronous method Handshaker::negotiate_secure_stream(self) -> Result<CryptoWrapper<S>, (HandshakeError, S)>. But its not directly possible to implement Future<Item = CryptoWrapper<S>, (HandshakeError, S)>. You can't call self.into_inner() on a &mut self. Ownership gets in the way again, even though I know that the future will yield a value at most once.

When I saw sink.send() today, I got excited because it manages to provide this sort of API. So I took a look at the implementation and was disappointed, since it needs runtime checks and panics. Which lead me to this issue.


As for the performance cost: poll(self) provides more information to the compiler than poll(&mut self) // by the way, this should never be called twice. So in theory, shouldn't the more accurate typing allow better reasoning and thus better optimizations? Even if there would be a performance loss with the current compiler, it should eventually become the more efficient solution.

And if "need to rewrite stuff" is an actual blocker, I'd happily contribute some of my time.

alexcrichton commented 7 years ago

I'm personally pretty tempted to close this issue. I think we've got lots of data which points in favor of where "in the small" a by-self API can provide an ergonomic and statically checked win. Once futures start scaling up, however, this becomes much less clear and I think by-value self becomes much more unergonomic. I have a related comment about API conventions about close for this.

@AljoschaMeyer it's true that &mut self requires panicking in some situations, but I've never actually seen a panic in practice and I'm personally more worried about the scalability of the Future trait rather than maximal ergnomics in the "simple cases", where I think &mut self clearly shows benefits over by-value self. For example I can't really imagine writing this function with by-value-self and threading that through everywhere.

Arnavion commented 7 years ago

For example I can't really imagine writing this function with by-value-self and threading that through everywhere.

That function can accept mut self, use it as &mut self internally and return Async::NotReady(self) at the end.

The reverse (using owned self in a function that receives &mut self) is less straightforward and requires tricks like the take_mut crate.

alexcrichton commented 7 years ago

That's not really the point, though. The Future trait is the absolute core of the async ecosystem, and the idioms that it sets forth will end up getting used almost everywhere. A by-value self method is a strong message that you should also be using by-value self everywhere. If all "real use cases" of a Future would immediately rather switch to &mut self that's a clear signal to me at least that it's the wrong default.

Arnavion commented 7 years ago

Well, I personally can't imagine either of those - that implementations of poll would have some compulsion to use self in every function they call from poll, or that there would be no (non-trivial) implementation of poll that would benefit from moving owned self over &mut self. But you've been involved with Rust and futures much more so you have more experience about this.

Kixunil commented 7 years ago

I'm going to share my idea, so it won't be lost, although I doubt someone will want to actually implement it (but who knows?).

So the idea was to take slot: S where S: Slot<Item=Self> instead of self or &mut self.

trait Slot {
    type Item;

    fn get_mut(&mut self) -> &mut Self::Item;
    fn take(self) -> Self::Item;
}

struct OptionSlot<'a, T: 'a> {
    // This always point at Option containing Some
    option: &'a mut Option<T>
}

impl<'a, T: 'a> OptionSlot<'a, T> {
    fn new(option: &mut Option) -> Self {
        match *option {
            Some(_) => OptionSlot { option },
            None => None,
        }
    }
}

The impl<'a, T: 'a> Slot for OptionSlot would simply unsafely unwrap the option (because it must always be Some)

Then the caller is responsible for checking invalid states using OptionSlot::new(). I could imagine implementing Slot for some cursor over vector of futures, so when one calls take() that future is removed and thus can't be polled later. This would make the API panic-free without the performance cost of moving self.

The obvious downside would be seemingly complicated API and lack of object safety (It may be possible to restore it somehow, but I didn't figure it out yet.)

I'd be very happy to see someone figure it out. I don't have much time for this. :(

tailhook commented 7 years ago

This inspired me to do a valuable_futures crate (I think the name is funny, no?).

What it does is it has another Future trait that receives self by value. You can't box this thing but you can convert it to a normal future via into_future. Also since I think it's mostly useful for writing futures by hand it doesn't have combinators (but we may add them in future).

There is also a wrapper that passes both mutable pointer and by-value state. I have a lot of code should be simpler if designed this way. (If you have some suggestion for naming this thing, please open an issue in that repository)

I'll try it on some real futures soon, but here are examples:

While the code is approximately of the same length and complexity, there are few tiny but imporant things:

  1. It's easy to forget some returns statements or mutations. While writing the example I had this line missing and got an infinite loop. And yes I've done similar errors in real applications. They are very annoying and hard to debug unless are captured by unit tests.
  2. It's easy to miss repeating the whole operation on timeout poll (it's needed AFAIU because timeout can return Ready on the first poll too). It might also be some other future that's easy to forget to poll. In the state machine approach you basically return new state and always poll it again to activate futures encompassed by it.

(you might question why Print is a state, while it never returns NotReady, the point here is to show a simple example, usually there are multiple states with some embedded futures, still such no-op steps do improve code readability sometimes)

It's just quick prototype to try and gather some comments. Docs and implementation for Stream will be done later.

Kixunil commented 7 years ago

@tailhook Do I understand correctly that you reinvented the wheel or am I missing something?

tailhook commented 7 years ago

@tailhook Do I understand correctly that you reinvented the wheel or am I missing something?

Didn't see that. Have you given a link? (also not at crates.io)

As I see now:

  1. The result of poll() in your implementation isn't Result so I can't use ? operator (or is it already possible in stable rust?)
  2. There is some unsafe code, that I have hard time to follow. I think take().unwrap() and converting to futures::Future (including for creating a Box<Future>) do the work good enough.
  3. Also I'm not sure rust optimizes Glue as well as Option<X> (i.e. the NonZero magic)
  4. And no Supply primitive (which by the way was idea flying around since rotor times)

(well, also usage examples would be useful)

Kixunil commented 7 years ago

Have you given a link?

In the top post in this thread :rofl:. It's PoC code.

  1. It was specifically designed to use Try trait in the future. Result as designed in futures crate seems weird to me anyway, because Async::Ready is usually the thing one is interested in.
  2. unsafe isn't something to be feared of. There will always be Some. The only case when there's None is if the poll() method panics and the only thing that is called later is destructor, which does check. I already expleined this in another comment.
  3. This is PoC. If you'd like to improve it send a PR. If you want to use it, I can publish it.
Arnavion commented 7 years ago

I added most of the existing Future combinators to @tailhook's code to get a feel for it, and the results were pretty expected:

zakarumych commented 7 years ago

I've spent some time writing Future implementations and I think it would be easier to write them with by-value poll. Calling poll not from another Future wrapper is much more rare case. Wild idea: having poll accept special trait Stealer<Self> that can be used to get a reference or take an ownership. Stealer<T> shall be implemented for T, Option<T>, Vec<T> combined with index, OccupiedEntry and other containers from which T may be taken by value.

o01eg commented 7 years ago

I think something like linear types https://github.com/rust-lang/rfcs/issues/814 would be helpful.

Kixunil commented 7 years ago

@omni-viral Stealer<T> sounds much like my idea with Slot trait.

zakarumych commented 7 years ago

@Kixunil exactly the same 😄 I skipped your comment somehow

aturon commented 6 years ago

I'm going to go ahead and close this out, given that we've recently completed the 0.2 revamp and it's pretty clear that this change is not in the cards.