rust-lang / rfcs

RFCs for changes to Rust
https://rust-lang.github.io/rfcs/
Apache License 2.0

Async IO #1081

Closed steveklabnik closed 6 years ago

steveklabnik commented 9 years ago

Rust currently only includes synchronous IO in the standard library. Should we also have async IO as well, or leave that to an external crate? What would the design look like?

Moved from https://github.com/rust-lang/rust/issues/6842

Related: https://github.com/rust-lang/rfcs/issues/388

Valloric commented 9 years ago

I don't see why this couldn't just remain in third-party external crates.

steveklabnik commented 9 years ago

It can. Hence the 'community library' and 'libs' tags both.

Anachron commented 9 years ago

Depends on who is actually maintaining the crate.

In my opinion it should be maintained at least in part by core team members, because once the async and sync versions are no longer on the same page, it can lead to confusion or, even worse, broken projects.

What I mean is this: once the community writes async versions of the std APIs, the community will make its own decisions about whether something should work one way or another.

It will be hard to keep up to date, and collaborators may not have dug into the core of Rust, so it will either drift in a different direction or need someone to keep it in sync with the synchronous version in core.

josephglanville commented 9 years ago

The problem with "leaving it up to third party crates" is not having a blessed implementation that all libraries can interoperate with.

This problem has already played out a few times: Ruby and Python both have several competing and incompatible asynchronous IO libraries (twisted/gevent in Python, eventmachine/celluloid in Ruby).

In and of itself that doesn't sound so bad until you realise the huge amount of stuff that gets built on top of said libraries. When you then can't use powerful libraries together because they belong to different "async" camps, things get sad pretty quickly.

Contrast this with C#, a language with built-in async primitives that also ships most of the higher-level async-integrated code (HTTP client etc.). There is a single blessed solution, every library builds on top of it, they all interoperate, and everyone is happy.

I think it's super important to have async IO in core or blessed in some way to avoid fragmentation.

m13253 commented 9 years ago

I don't see why this couldn't just remain in third-party external crates.

But I think there should be language-level support for some key features that are important to async programming. That includes:

  • Python-styled yield statement (though it originally makes an iterator, in async programming it is used to save current execution point and get back later)
  • Or C#-styled async/await statement instead

Yes we can already build a Node.JS-styled callback based async library -- that makes no sense. We need language-level support to build a modern one.

flaub commented 9 years ago

I think it's important to distinguish traditional async I/O from use cases requiring the standard C select() API. I think that the existing synchronous I/O API is incomplete without the ability to cancel blocking calls. This doesn't seem to be robustly implementable without using select().

I'd like to see the current synchronous I/O API extended to support cancellation without necessarily exposing the underlying select. The goal here is not to provide a highly scalable or particularly efficient way of scheduling I/O requests, but merely a way to cancel blocking calls.
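
As a minimal sketch of the workaround this implies today, using only the standard library: spawn the blocking read on its own thread and "cancel" it from outside by shutting the socket down. This only works for sockets (plain files have no such hook), and the exact result the blocked reader sees is platform-dependent, which is precisely why a real cancellation API would be nicer.

use std::io::Read;
use std::net::{Shutdown, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    // A local peer that accepts the connection and then stays silent,
    // so the read below blocks indefinitely.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        let _conn = listener.accept();
        thread::sleep(Duration::from_secs(60));
    });

    let stream = TcpStream::connect(addr)?;
    let mut reader = stream.try_clone()?;

    // The reader thread blocks in read() with no way to be interrupted directly.
    let handle = thread::spawn(move || {
        let mut buf = [0u8; 1024];
        let n = reader.read(&mut buf);
        println!("read returned: {:?}", n); // typically Ok(0) or an error after shutdown
    });

    // "Cancel" the blocked read from outside by shutting the socket down.
    thread::sleep(Duration::from_millis(100));
    stream.shutdown(Shutdown::Both)?;
    handle.join().unwrap();
    Ok(())
}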

gotgenes commented 9 years ago
  • Python-styled yield statement (though it originally makes an iterator, in async programming it is used to save current execution point and get back later)
  • Or C#-styled async/await statement instead

A minor correction to the "Python-styled" comment: it's technically yield from – syntax introduced in Python 3.3. (But yes, it is still based on generators).

Regarding async/await, it's worth noting that this syntax was proposed for Python 3, and has been provisionally accepted for Python 3.5. PEP 492 lists a fair number of languages that have adopted or have proposed the async/await keywords. PEP 492 also does a fair job of describing the weaknesses of the yield from approach, as well as the advantages of async/await.

Not that bandwagons are always the best reason to choose a direction, but async/await will become a very widespread idiom, and supporting it in Rust would provide great accessibility to those of us coming from other languages.

Yes we can already build a Node.JS-styled callback based async library -- that makes no sense. We need language-level support to build a modern one.

Hear, hear!

ryanhiebert commented 9 years ago

As somebody familiar with Python and its async story, but not as familiar with compiled static languages, I (and perhaps others) would find it helpful if somebody could comment on something that I'm familiar with from Python.

In Python 3.5, async and await will be based on yield and yield from coroutines based on generators under the hood. This seems like a pretty elegant design to me, but I'd love to hear if there are any problems with that kind of approach.

gotgenes commented 9 years ago

In Python 3.5, async and await will be based on yield and yield from coroutines based on generators under the hood. This seems like a pretty elegant design to me, but I'd love to hear if there are any problems with that kind of approach.

From @nathanaeljones's comment on rust-lang/rust#6842:

One of the earliest comments asked about .NET's async system. It's a callback system, like most, but there is a lot of syntactic sugar, and the 3 different systems that are exposed have different levels of overhead.

The most popular is C# async/await, which generates a closure and state machine to provide a continuation callback. Stack and thread context restoration is expensive, but you can opt out per-task.

Now, what is "expensive"? I'm not sure. (I'm in the same boat as @ryanhiebert. I program mostly in Python. I was made aware of Rust by @mitsuhiko's blog posts.)
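
To make the "closure and state machine" idea concrete in Rust terms, here is a rough sketch of what an async function might desugar to, written against today's std::future API (which did not exist when this thread was active); the LoadStuff future and my_func names are made up for illustration, and LoadStuff completes immediately so no waker bookkeeping is shown.

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};

// Hypothetical inner future standing in for "loadStuffAsync()".
struct LoadStuff;

impl Future for LoadStuff {
    type Output = (i32, i32);
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        Poll::Ready((1, 2)) // pretend the IO already completed
    }
}

// Roughly what `async fn my_func() -> i32 { let (a, b) = LoadStuff.await; a + b }`
// desugars to: an enum state machine that is advanced by poll().
enum MyFunc {
    Start,
    Waiting(LoadStuff),
    Done,
}

impl Future for MyFunc {
    type Output = i32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32> {
        let this = self.get_mut(); // fine because MyFunc is Unpin
        loop {
            match this {
                MyFunc::Start => *this = MyFunc::Waiting(LoadStuff),
                MyFunc::Waiting(inner) => {
                    let (a, b) = match Pin::new(inner).poll(cx) {
                        Poll::Ready(v) => v,
                        Poll::Pending => return Poll::Pending, // suspend; resume at this state
                    };
                    *this = MyFunc::Done;
                    return Poll::Ready(a + b);
                }
                MyFunc::Done => panic!("polled after completion"),
            }
        }
    }
}

Any executor (for example futures::executor::block_on) drives MyFunc::Start to completion and gets 3; the per-call cost is one such enum rather than a chain of heap-allocated callbacks, which is the main reason this style can be cheap.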

lilith commented 9 years ago

@gotgenes The expense depends on how much thread-local storage you're using. I know that it's low enough now that new APIs are async-only. This really depends on the language runtime (and the operating system); I don't think much can be learned about the performance implications by looking at other languages.

ghost commented 9 years ago

From @flaub:

I think it's important to distinguish traditional async I/O from use cases requiring the standard C select() API. I think that the existing synchronous I/O API is incomplete without the ability to cancel blocking calls. This doesn't seem to be robustly implementable without using select().

I'd like to see the current synchronous I/O API extended to support cancellation without necessarily exposing the underlying select. The goal here is not to provide a highly scalable or particularly efficient way of scheduling I/O requests, but merely a way to cancel blocking calls.

I agree with this. Even trivial programs might suffer from hacks due to the inability to interrupt synchronous calls. (See rust-lang/rust#26446.) I think the language should support asynchronous IO, personally, but if it were really too complex to add to the API, synchronous IO should be made programmatic.

phaux commented 9 years ago

Would be cool to have something like GJ in the core.

jimrandomh commented 9 years ago

Please first put in the straightforward wrapper around select/pselect. I understand you also want to build something better, but my first experience with Rust was hearing that it had hit 1.0 and trying to do a project that involved using pseudoterminals, very close to something I'd already done in C, which involves waiting for input from the user and from a subprocess simultaneously. It immediately got bogged down in a rabbit hole of reverse-engineering macros from Linux system headers and corner cases of FFI, ending up much, much more difficult than it had any right to be.

retep998 commented 9 years ago

@jimrandomh Keep in mind that select on Windows only works on sockets, and you can't really wait on groups of files/terminals/pipes. If someone can create a crate with async that works well on both Windows and non-Windows, then I'm sure a lot more attention will be paid to getting async in Rust.

gotgenes commented 9 years ago

Keep in mind that select on Windows only works on sockets, and you can't really wait on groups of files/terminals/pipes.

Lack of complete support in Windows didn't stop Python from using it.

michallepicki commented 9 years ago

I guess this belongs here: https://medium.com/@paulcolomiets/asynchronous-io-in-rust-36b623e7b965

boazsegev commented 8 years ago

:+1:

I doubt we could standardize an Async IO API for all Rust applications without making it part of the Core library... and Async IO seems (to me) to be super important - even if it's just a fallback to a select call (i.e. returning a Result::Err("would block") instead of blocking).

I believe that a blessed / standard Async IO API is essential in order to promote Rust as a network programming alternative to Java, C, and other network languages (even, perhaps, Python or Ruby).

Also, considering that async IO would probably benefit browser development, this would help us keep Mozilla invested.

...

Then again, I'm new to Rust, so I might have missed an existing solution (and no, mio isn't an existing solution, it's a wholesale IO framework).
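
On the "return an error instead of blocking" point: std's sockets do (at least today) expose exactly this escape hatch via set_nonblocking plus ErrorKind::WouldBlock. A minimal sketch (the address is made up and something must be listening on it):

use std::io::{ErrorKind, Read};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Hypothetical peer; replace with a real address.
    let mut stream = TcpStream::connect("127.0.0.1:8080")?;
    stream.set_nonblocking(true)?;

    let mut buf = [0u8; 1024];
    match stream.read(&mut buf) {
        Ok(n) => println!("read {} bytes", n),
        // The "would block" result instead of blocking:
        Err(ref e) if e.kind() == ErrorKind::WouldBlock => println!("not ready yet"),
        Err(e) => return Err(e),
    }
    Ok(())
}

What std does not give you is a way to wait for readiness across many such sockets, which is where epoll/kqueue/select (or a crate like mio) come in.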

chpio commented 8 years ago

We could standardize the interface in core and let library developers do the implementations. That way everyone would use the one "blessed" interface, but we could have multiple competing implementations (it may be a good idea, I don't know :)).

There could also be multiple async API abstraction levels, just like in JS:

async-await:

async function myFunc() {
  const data = await loadStuffAsync();
  return data.a + data.b;
}

promises:

function myFunc() {
  return loadStuffAsync().then(data => data.a + data.b);
}

callbacks (i hate them ;)):

function myFunc(cb) {
  loadStuffAsync((err, data) => {
    if (err) {
      return cb(err);
    }

    cb(null, data.a + data.b);
  });
}

streams:

tcpSocket
  .pipe(decodeRequest) // byte stream ->[decodeRequest]-> Request object stream
  .pipe(handleRequest) // Request object stream ->[handleRequest]-> Response object stream
  .pipe(encodeResponse) // Response object stream ->[encodeResponse]-> byte stream
  .pipe(tcpSocket)
  .on('error', handleErrors);

Or is there already a nice stream implementation in Rust? Capable of...
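
For comparison with the JS flavours above, both the combinator style and async/await eventually became available in Rust. A rough sketch of the promise-style example, assuming the third-party futures crate (0.3-era API) and with loadStuffAsync stubbed out as an already-resolved future:

use futures::executor::block_on;
use futures::future::{self, FutureExt};

struct Data { a: i32, b: i32 }

// Stand-in for loadStuffAsync(): an already-resolved future.
fn load_stuff_async() -> impl std::future::Future<Output = Data> {
    future::ready(Data { a: 1, b: 2 })
}

fn main() {
    // Promise-style chaining with a combinator...
    let combined = load_stuff_async().map(|data| data.a + data.b);
    // ...and the equivalent async/await form.
    let awaited = async {
        let data = load_stuff_async().await;
        data.a + data.b
    };

    assert_eq!(block_on(combined), 3);
    assert_eq!(block_on(awaited), 3);
}

Stream pipelines in the style of the last JS example are covered by the same crate's Stream trait and its combinators.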

boazsegev commented 8 years ago

Looking over the code for the mio crate, I noticed that some unsafe code was required to implement the epoll/kqueue system calls that allow for evented IO (mio isn't purely Async IO, as it still uses blocking IO methods)...

It seems to me that unsafe code should be limited, as much as possible, to Rust's core and FFI implementations.

The "trust me" paradigm is different when developers are asked to trust Rust's core team vs. when they are asked to trust in third parties.

I doubt that competing implementations, as suggested by @chpio, would do a better job of promoting a performant solution... although they could, possibly, be used to select the most performant solution for the underlying core library.

Ruby on Rails is a good example of how a less performant solution (although more comfortably designed) could win in a competitive environment.

seanmonstar commented 8 years ago

There's nothing wrong with unsafe code. A crate that uses it shouldn't be discouraged. Unsafe is required whenever memory safety is sufficiently complicated that the compiler cannot reason about it.

In this specific case though, unsafe is used because Rust demands all ffi code be marked unsafe. The compiler cannot reason about functions defined by other languages. You will never have code that uses epoll without unsafe (even if that unsafety were eventually tucked into a module in libstd).

boazsegev commented 8 years ago

@seanmonstar - On the main part, I agree with your assessment.

However... Rust's main selling point is safety, and I do believe that forcing third parties to write unsafe code hurts the sales pitch. Also, unsafe code written by third parties isn't perceived as being as trustworthy as unsafe code within the core library.

I'm aware that it's impossible to use the epoll and kqueue API without using unsafe code and this is part of the reason I believe that writing an Async IO library would help utilize low level system calls while promoting Rust's main feature (safety).

Having said that, I'm just one voice. Both opinions have their pros and cons, and both are legitimate.

ahicks92 commented 8 years ago

I'm not sure this is the place and maybe I need to open a separate issue somewhere, but since we don't seem to have it, I'd rate some sort of abstraction over at least socket select as super important. I got to this issue by looking for that and finding other issues that linked here indirectly from 2014; since I see at least one other comment here saying the same thing, I figure I'd add my two cents.

Before I go on, I should admit that I'm still on the outside looking in; I really, really want to use Rust and plan to do so in the immediate future, but haven't yet. My primary language is C++ and my secondary is Python.

While an async I/O library is really a very good idea and definitely gets a +1 from me if only because Python proves that not putting it in the language/standard library will cause epic-level fragmentation, the lack of a standard library select means that I have to essentially opt into some sort of third party crate or write my own abstraction over the calls. I'm considering Rust for the development of a network protocol as a learning project and can spend whatever time I choose, so I have some flexibility in this regard. But the inability to easily find a platform-neutral select in the standard library is leaving a bad taste in my mouth right now.

I'd say that getting select in at least for sockets as soon as possible would close a rather big and critical hole, as the only alternatives that don't involve third-party libraries seem to involve either a thread for every connection or fiddling around with timeouts. In the latter case, the documentation says the error depends on the platform--I get to write yet another abstraction layer! I'll probably opt into Mio for my current project, but I still consider this a shortcoming because all I really need is select.

Speaking more generally, having used both Twisted and Gevent some (though admittedly not enough to be called an expert), I like the look of Asyncio and think copying/borrowing from it might be a good starting point. Twisted always degenerated to inlineCallbacks, and gevent always became all the difficulties of threads but with the "advantage" that it lies about this, offering mostly false hope. Since other languages seem to be converging on Asyncio, any solution that looks like it would get my admittedly far-from-expert vote. I'd go so far as to say that we should do it by implementing Python-style generators/coroutines, but that's probably a separate issue and I'm certainly nowhere near thinking about starting any RFCs at the moment.

tailhook commented 8 years ago

While an async I/O library is really a very good idea and definitely gets a +1 from me if only because Python proves that not putting it in the language/standard library will cause epic-level fragmentation

It turns out that Python proves quite the contrary. There was asyncore in Python, but it was quickly obsoleted by twisted (and tornado/gevent... much later). And now there is asyncio, which may be obsoleted by curio, as the latter looks much nicer by design (but it's still too early to reason about its success).

At the end of the day, implementing a yield-like construct looks promising. But it's too early to put any async IO library code into the stdlib. There are virtually no downsides to using external crates for the task.

gotgenes commented 8 years ago

At the end of the day, implementing a yield-like construct looks promising.

As mentioned earlier, Python has already moved on from yield from to async/await as the syntax of choice for asynchronous operations, for reasons outlined in PEP-492.

I'd like to second pointing out curio as an interesting new model for async in Python.

ahicks92 commented 8 years ago

This is interesting. I've never heard of asyncore before now, but it looks like a very complicated way to use a select call. I'm not surprised that it didn't become popular, especially given the 1999 release date (I found one source placing it at Python 1.5.2, but can't find official confirmation). I'm not very convinced that it's good evidence that I'm wrong about Python proving the fragmentation point. I'm not saying that I'm right, just that I need more convincing before dropping my viewpoint as incorrect. In my opinion, something okay with many protocols is better than 5 or 6 options, each more amazing than the last, but each supporting different protocols.

I still think my point about select stands and that it should be put into the standard library as soon as possible. It or something like it is the first step to an async framework. The disadvantage of opting into an external crate for async I/O when all you need is select is that everyone needs to learn the external crate; by contrast, select is a very simple call and can be explained in a few paragraphs. Even if a crate containing only select exists, though, I fail to see any disadvantage to adding it to std::net in some form.

grigio commented 8 years ago

It seems that the async/await style is popular as a non-blocking pattern.

szagi3891 commented 8 years ago

I think that is too much combining of conventions. It would suffice for async IO to return a channel.

boazsegev commented 8 years ago

I think we're overthinking it... For now, how about having non-blocking sockets return with an EWOULDBLOCK and having core wrappers for kqueue, epoll and select (system dependent, of course), with a preprocessor able to inform us which system is available and which isn't... At least let Rust be independent of C know-how as much as possible.

P.S.

I would probably wrap epoll and kqueue in a single wrapper and select in another.
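
A rough sketch of what such a thin, Unix-only readiness wrapper might look like, going through poll(2) via the libc crate (an assumption, libc = "0.2"; std exposes no such call), with the unavoidable unsafe confined to the FFI call:

use std::io;
use std::os::unix::io::AsRawFd;
use std::time::Duration;

/// Block until `sock` is readable or `timeout` expires.
/// Returns Ok(true) if readable, Ok(false) on timeout.
fn wait_readable<T: AsRawFd>(sock: &T, timeout: Duration) -> io::Result<bool> {
    let mut fds = [libc::pollfd {
        fd: sock.as_raw_fd(),
        events: libc::POLLIN,
        revents: 0,
    }];
    // unsafe is unavoidable here: poll() is a foreign function.
    let rc = unsafe { libc::poll(fds.as_mut_ptr(), 1, timeout.as_millis() as i32) };
    match rc {
        -1 => Err(io::Error::last_os_error()),
        0 => Ok(false),                                // timed out
        _ => Ok((fds[0].revents & libc::POLLIN) != 0), // readable (or hung up)
    }
}

A Windows counterpart would have to be built on WSAPoll or on completion-based APIs, which is exactly the portability problem discussed elsewhere in this thread.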

sconger commented 8 years ago

I feel it's worth pointing out that some operating systems have started moving away from the poll/select model. There has been a trend toward the system managing IO and threads together. Windows has tied its newer asynchronous IO to Windows thread pools, and Darwin has done similarly with Grand Central Dispatch. Instead of waiting for an event, you ask the OS to do something, and it triggers a callback on a worker thread when the work is done.

I don't see Rust being able to support those APIs without something like async/await. They don't fit in with the current safety mechanisms.

retep998 commented 8 years ago

Pretty much all async on Windows is completion-based in that you ask it to do something, and it does it, and then gets back to you. The only options you really have are how it gets back to you: whether you use the wait functions on overlapped events, have an APC fire, get a callback in a thread pool, or receive a completion packet to an IOCP.

tikue commented 8 years ago

How would the borrow checker interact with async/await? If I borrow self mutably and then await some computation whose result will be combined with self, presumably other async states won't be able to borrow self? Would you need to drop any borrows before awaiting?

l0calh05t commented 8 years ago

If I borrow self mutably and then await some computation whose result will be combined with self, presumably other async states won't be able to borrow self?

That's what I would expect.
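
That is how it eventually worked out: a borrow held across an .await is stored in the compiler-generated state machine and enforced like any other borrow for as long as the future is alive. A small sketch in today's syntax (async/await did not exist when this was asked; the names are illustrative):

async fn get_delta() -> i32 { 1 }

struct Counter { n: i32 }

impl Counter {
    async fn bump(&mut self) {
        // `self` stays mutably borrowed across the await point below; the
        // borrow lives inside the generated future, so nothing else can
        // borrow this Counter until that future completes or is dropped.
        let d = get_delta().await;
        self.n += d;
    }
}

So to let other code touch `self` again, you either let the future finish (or drop it), or scope the conflicting borrow so it ends before the await.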

akcom commented 8 years ago

@retep998 I have to disagree. While IO completion ports are certainly a high throughput asynchronous method, event polling is well-supported by both files and sockets.

retep998 commented 8 years ago

@akcom Really? Windows provides event polling that isn't completely awful (select doesn't count)? Can you provide some examples of this?

akcom commented 8 years ago

@retep998 WaitForMultipleObjectsEx can be used for polling but also for overlapped operations.

retep998 commented 8 years ago

@akcom Ah, for a moment I thought you meant Windows provided readiness-based async. What you're actually saying is that Windows provides ways to receive notification of completed overlapped operations other than IOCPs, which is what I said in my other message, so I'm not entirely sure what your point is.

The only options you really have are how it gets back to you: whether you use the wait functions on overlapped events, have an APC fire, get a callback in a thread pool, or receive a completion packet to an IOCP.

Note that you cannot use the wait functions to determine when you can read/write a file without blocking, you have to first start an operation asynchronously and then use some method to be notified of when the operation is done.

akcom commented 8 years ago

@retep998 I can understand how you may have missed my point since I mentioned the overlapped operations in passing. If you do a bit of reading about WaitForMultipleObjects (you may have better luck finding an example which utilizes WSAWaitForMultipleEvents which is a thin wrapper for WaitForMultipleObjects), you'll see that when bAlertable=FALSE, you can use it as a simple polling mechanism which will signal when an object is readable or writeable. This does not require you to start an operation asynchronously.

retep998 commented 8 years ago

@akcom Can you link me to such an example which demonstrates it being used to signal when a file handle or socket is readable or writeable?

akcom commented 8 years ago

@retep998 Straight from the source, MSDN.

dgrunwald commented 8 years ago

Note that WaitForMultipleObjects is limited to MAXIMUM_WAIT_OBJECTS, which is 64. It's easy to run into that limit, and requires complex logic to deal with (e.g. a thread pool that uses one thread for each group of 64 sockets).

It's usually a good idea to avoid WaitForMultipleObjects and use completion based async IO instead.

retep998 commented 8 years ago

@akcom Ah, you're referring to WSAEventSelect which provides readiness notifications in the form of an event that can be waited on. However that API is limited exclusively to sockets and is unusable for file handles, so if an async library wants to support files or pipes it would have to support completion based async regardless. And, as @dgrunwald mentioned, the wait API doesn't really scale up too well.

NeoLegends commented 8 years ago

Please excuse my naivety, but what about libuv?

retep998 commented 8 years ago

@NeoLegends Rust used to use libuv years ago when it still had libgreen. That wasn't really async so much as "green" threads: operations were still fundamentally synchronous, and it was abandoned for performance reasons and due to the complications that arose (spinning on a green thread could cause deadlock, for example). I have heard of some attempts since then to use libuv as a third-party crate for Rust, although note that libuv is just a cross-platform abstraction and any Rust library can use the system primitives directly instead of wrapping libuv. One issue with libuv is that it uses callbacks for nearly everything, and having to bend your Rust code to that model is rather complicated due to the restrictions it forces on lifetimes and borrows.

NeoLegends commented 8 years ago

@retep998 What if the Rust implementation was a wrapper of libuv? Couldn't one abstract the callbacks away?

chpio commented 8 years ago

but how do you want to deal with "continuation passing" if not with callbacks?

NeoLegends commented 8 years ago

The Rust wrapper library would obviously have to do that. It doesn't need to expose the callbacks through the public interface, though.

retep998 commented 8 years ago

What would the Rust interface look like then? The libuv model is to get a callback when the operation is done; the libgreen model is to just block the thread until the operation is done. What sort of system do you propose to notify the user when their operation is complete? The big questions blocking async IO are not implementation details; using libuv won't save us anything over using system primitives directly. Rather, the question is what the Rust interface looks like: how can it be designed so that it imposes few restrictions on Rust code, but at the same time maps efficiently onto the underlying primitives?

burdges commented 8 years ago

Any C asynchronous IO solutions like libuv require rather thick wrappers to make them safe in Rust, @NeoLegends, so using a native Rust solution like mio makes more sense.

There are many asynchronous IO interface designs, @retep998. In Rust, one can use mio directly, or use it indirectly through coroutines (mioco), promises (gj), futures (eventual_io), or state machines (rotor).

All have advantages and disadvantages that can get quite subtle. I think one should focus on making the current mio ecosystem play nicely together with zero-ish additional cost, and hope it evolves into something that can be standardized. Just a few questions:

What do these different approaches cost? Are state machines like rotor the only "zero cost" solution? Are coroutines or green threads like mioco higher cost than fancy callbacks like promises/futures?

Can eventual be run on top of a rotor state machine and/or mioco? Can this be streamlined and done efficiently?
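
For concreteness, "using mio directly" means a readiness loop along these lines; the sketch below is written against mio's later Poll/Token API (assuming mio = "0.8" with the "os-poll" and "net" features), not the EventLoop/Handler API the crate had at the time:

use std::time::Duration;
use mio::net::TcpListener;
use mio::{Events, Interest, Poll, Token};

const SERVER: Token = Token(0);

fn main() -> std::io::Result<()> {
    // Readiness model: register interest, wait to be told a source is ready,
    // and only then perform the non-blocking IO call.
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(128);

    let mut listener = TcpListener::bind("127.0.0.1:0".parse().unwrap())?;
    poll.registry().register(&mut listener, SERVER, Interest::READABLE)?;

    loop {
        poll.poll(&mut events, Some(Duration::from_secs(1)))?;
        for event in events.iter() {
            if event.token() == SERVER {
                // Readiness was reported, so accept() should not block
                // (a real loop would still handle a spurious WouldBlock).
                let (conn, peer) = listener.accept()?;
                println!("accepted connection from {}", peer);
                drop(conn);
            }
        }
    }
}

The coroutine, promise, future, and state-machine crates listed above are all different answers to how to layer something more ergonomic on top of exactly this loop.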

tailhook commented 8 years ago

I'm not sure the questions are for me, but let me share some thoughts:

What do these different approaches cost? Are state machines like rotor the only "zero cost" solution? Are coroutines or green threads like mioco higher cost than fancy callbacks like promises/futures?

mioco shouldn't be much more expensive in theory, but the problem is stack management. Most Rust programs expect a large stack: a file copy allocates a 64kb buffer on the stack, each format/log macro occupies quite a lot, and so on. And the only way to free stack memory after use is to deallocate the whole coroutine/microthread. (Deallocating the stack on each client request is too slow, if you ask me; you need a pool of threads/stacks.)

I have some experience with vagga, which runs on musl libc and so has a small stack limit of 80kb. It turns out that the stack runs out pretty quickly. (And 80kb per coroutine/microthread is too much for many network applications.)

Can eventual be run on top of a rotor state machine and/or mioco? Can this be streamlined and done efficiently?

Yes, I strongly believe so. This is how I think it should be done: raw protocol implementation with state machines, and a simple protocol wrapper with closures/futures on top of that. It hasn't been done yet, though.

NeoLegends commented 8 years ago

@retep998 Ah, I see. Thanks for the clarification.