xonsh / xonsh

🐚 Python-powered shell. Full-featured and cross-platform.
http://xon.sh

[RFC] Async Extensions #3435

Open AstraLuma opened 4 years ago

AstraLuma commented 4 years ago

So, async.

The current state: while I believe the parser handles async syntax (as a pass-through to Python), using async in xonsh is basically raw asyncio with little support, which isn't great.

So, I had an idea about how to extend the core language to support async.

First of all, as with Python, there are two contexts: sync and async. Sync is the default; the async context is code inside a coroutine function (async def). A regular function (def) nested inside an async function (async def) is still synchronous.

I highly encourage discussion and thoughts.

Reasoning

So, async is becoming more important in Python, and xonsh is a pretty cool exploration tool. So I think xonsh should have an async story.

However, xonsh's main REPL is synchronous. That suits its primary use, writing and executing ad-hoc code quickly, but it doesn't mix with async, because Python's async is a red/blue (function coloring) system: sync code can't simply await async code.

So I think xonsh should adopt and extend Python's dual-context model: just as we added quality-of-life improvements like p-strings and backtick searching to the sync language, xonsh should also learn to live in the async context.

Additional context

xonsh should spin up a background event loop on the user's behalf on first async use (not automatically on startup, to allow time to configure event loop policies). The language features below should operate against this loop.

This is primarily to support the creation of background Tasks, even those created dynamically. An event loop that started/stopped intermittently would interfere with this.
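A minimal sketch of such a lazily started, never-stopped loop, assuming a single daemon thread owns it. `get_background_loop` is a hypothetical helper, not an existing xonsh API. Because `asyncio.new_event_loop()` goes through the current event loop policy, a policy the user sets before first async use would be honored.

```python
import asyncio
import threading

_loop = None                    # the shared background loop, once started
_loop_lock = threading.Lock()   # guards first-use initialisation


def get_background_loop() -> asyncio.AbstractEventLoop:
    """Hypothetical helper: start the shared event loop on first async use."""
    global _loop
    with _loop_lock:
        if _loop is None:
            # Created lazily, so an event loop policy configured before
            # this point is respected.
            loop = asyncio.new_event_loop()
            ready = threading.Event()

            def run() -> None:
                asyncio.set_event_loop(loop)
                loop.call_soon(ready.set)  # signal once the loop is actually running
                loop.run_forever()         # never stopped, so background Tasks survive

            threading.Thread(target=run, name="xonsh-asyncio", daemon=True).start()
            ready.wait()
            _loop = loop
    return _loop


loop = get_background_loop()
future = asyncio.run_coroutine_threadsafe(asyncio.sleep(0, result="hi"), loop)
print(future.result(timeout=5))  # hi
```

Keeping one loop alive for the whole session is what lets a Task created by one command keep running while later commands execute.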

Sync Context

The existing xonsh language is all sync context. The only extension I propose is supporting the await operator.

Using the await operator in a sync context runs the given awaitable on the default (background) loop and returns its result. This involves some marshaling of data, synchronization between threads, and cancellation handling, but it's mostly just a matter of using thread-safe futures.
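A sketch of how a sync-context `await` could be lowered, assuming the background loop runs in its own thread. `start_loop` and `sync_await` are hypothetical names for illustration. The thread-safe future does the marshaling: the result or exception raised on the loop thread is re-raised in the shell thread, and Ctrl-C cancels the Task on the loop.

```python
import asyncio
import threading


def start_loop() -> asyncio.AbstractEventLoop:
    """Stand-in for xonsh's background loop: run a new loop in a daemon thread."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


def sync_await(awaitable, loop: asyncio.AbstractEventLoop):
    """Hypothetical lowering of `await x` in the sync context."""

    async def _await_it():
        # run_coroutine_threadsafe needs a coroutine, so wrap arbitrary awaitables.
        return await awaitable

    # Schedule on the loop's thread; returns a concurrent.futures.Future.
    future = asyncio.run_coroutine_threadsafe(_await_it(), loop)
    try:
        # Block the shell thread; the result or exception is marshaled back here.
        return future.result()
    except KeyboardInterrupt:
        # Ctrl-C at the prompt: cancel the Task on the loop, then re-raise.
        future.cancel()
        raise


loop = start_loop()
print(sync_await(asyncio.sleep(0.01, result="done"), loop))  # done
```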

Async Context

Async is where xonsh falls apart. Async code will compile and run, but spawned jobs run synchronously, blocking the entire event loop. This is Not Good.
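To make the problem concrete, here is a small standalone demonstration in plain asyncio (not xonsh internals): a heartbeat Task counts ticks while a coroutine waits on a child process. A blocking `subprocess.run` call stands in for a job spawned synchronously from async code; during it the heartbeat never runs. With `asyncio.create_subprocess_exec` the heartbeat keeps ticking.

```python
import asyncio
import subprocess
import sys

# A child process that just sleeps for 0.3 s (portable: uses this Python).
SLEEPER = [sys.executable, "-c", "import time; time.sleep(0.3)"]


async def heartbeat(ticks: list) -> None:
    # Records a tick roughly every 10 ms, but only while the loop is free.
    while True:
        ticks.append(1)
        await asyncio.sleep(0.01)


async def measure(job) -> int:
    """Count heartbeat ticks that happen while `job()` is being awaited."""
    ticks: list = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat take its first tick
    before = len(ticks)
    await job()
    during = len(ticks) - before
    hb.cancel()
    return during


async def blocking() -> None:
    subprocess.run(SLEEPER)  # synchronous call: the whole loop stalls here


async def nonblocking() -> None:
    proc = await asyncio.create_subprocess_exec(*SLEEPER)
    await proc.wait()  # only this Task waits; the heartbeat keeps running


print(asyncio.run(measure(blocking)))  # 0
print(asyncio.run(measure(nonblocking)))
```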

I propose that the xonsh language operators that perform IO ($(), !(), $[], ![], @$(), bare subprocess, and globbing) work a little differently under an async context.

I think this should be implemented by the parser, compiling to a new set of dunder methods.
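One way to picture "compiling to a new set of dunder methods": once a subprocess operator has been lowered to a call such as `__xonsh__.subproc_captured_stdout([...])`, an AST pass could rewrite such calls inside `async def` bodies into awaited calls to async counterparts. The sync names are modeled on what xonsh's parser emits and may differ between versions; every `*_async` name is hypothetical. The pass also encodes the rule that a plain `def` nested in an `async def` stays synchronous. `ast.unparse` needs Python 3.9+.

```python
import ast

# Hypothetical async counterparts for the subprocess dunders.
ASYNC_DUNDERS = {
    "subproc_captured_stdout": "subproc_captured_stdout_async",              # $()
    "subproc_captured_object": "subproc_captured_object_async",              # !()
    "subproc_uncaptured": "subproc_uncaptured_async",                        # $[]
    "subproc_captured_hiddenobject": "subproc_captured_hiddenobject_async",  # ![]
    "subproc_captured_inject": "subproc_captured_inject_async",              # @$()
}


class AsyncSubprocLowering(ast.NodeTransformer):
    """Rewrite subprocess dunder calls into awaited async variants in async context."""

    def __init__(self) -> None:
        self.in_async = False

    def _visit_scope(self, node, is_async: bool):
        outer, self.in_async = self.in_async, is_async
        self.generic_visit(node)
        self.in_async = outer
        return node

    def visit_AsyncFunctionDef(self, node):
        return self._visit_scope(node, True)

    def visit_FunctionDef(self, node):
        # A plain def inside an async def is still synchronous.
        return self._visit_scope(node, False)

    def visit_Lambda(self, node):
        return self._visit_scope(node, False)

    def visit_Call(self, node):
        self.generic_visit(node)
        func = node.func
        if (
            self.in_async
            and isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "__xonsh__"
            and func.attr in ASYNC_DUNDERS
        ):
            func.attr = ASYNC_DUNDERS[func.attr]
            return ast.copy_location(ast.Await(value=node), node)
        return node


SOURCE = '''
async def main():
    out = __xonsh__.subproc_captured_stdout(["ls"])

    def helper():
        return __xonsh__.subproc_captured_stdout(["pwd"])
'''

tree = ast.fix_missing_locations(AsyncSubprocLowering().visit(ast.parse(SOURCE)))
print(ast.unparse(tree))
```

Only the call inside `main` gains an `await`; the one in `helper` is left as the sync dunder.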

Subprocesses

All forms of process spawning should be adapted to return an async structure.

In an async context, !() and ![] (which return a CommandPipeline in a sync context) should return a proxy object with two primary operations:

$() and $[] should return simple awaitables. That is, they only support the await operator: awaiting $() produces stdout, and awaiting $[] produces None.

I think the process should be started immediately in the background (relative to the calling Task), and the Task may choose when and how to block on it. These awaitables should support asyncio cancellation.
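A minimal sketch of those semantics for `$()`, using a hypothetical `CapturedStdout` class. For brevity it uses asyncio's native subprocess support rather than the thread-pool delegation suggested next. Creating the object starts the process at once; the caller awaits whenever it likes; cancelling kills the child. Awaiting a Task directly means that cancelling the awaiting Task also cancels the wrapped one.

```python
import asyncio
import sys


class CapturedStdout:
    """Hypothetical awaitable returned by `$(...)` in an async context."""

    def __init__(self, *argv: str) -> None:
        # Requires a running loop (we are inside a coroutine); starts immediately.
        self._task = asyncio.get_running_loop().create_task(self._run(argv))

    @staticmethod
    async def _run(argv) -> str:
        proc = await asyncio.create_subprocess_exec(*argv, stdout=asyncio.subprocess.PIPE)
        try:
            out, _ = await proc.communicate()
        except asyncio.CancelledError:
            # On cancellation, kill the child instead of leaving it running.
            proc.kill()
            await proc.wait()
            raise
        return out.decode()

    def __await__(self):
        # Delegate to the Task, so cancellation of the awaiter propagates into it.
        return self._task.__await__()

    def cancel(self) -> bool:
        return self._task.cancel()


async def main() -> str:
    job = CapturedStdout(sys.executable, "-c", "print('hi')")
    # ... the Task is free to do other work here while the process runs ...
    return await job


print(asyncio.run(main()).strip())  # hi
```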

For simplicity, I expect these changes will be implemented by delegating to an execution pool (another thread): xonsh's core is synchronous, and I don't think anyone wants to put in the effort to rewrite it.
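The delegation approach can be sketched with `loop.run_in_executor`. Here `run_blocking` is a hypothetical stand-in for xonsh's existing synchronous subprocess machinery, and `captured_stdout_via_pool` is an illustrative name. The loop stays free while a worker thread blocks.

```python
import asyncio
import subprocess
import sys


def run_blocking(argv) -> str:
    """Stand-in for xonsh's existing synchronous subprocess machinery."""
    return subprocess.run(argv, capture_output=True, text=True).stdout


async def captured_stdout_via_pool(argv) -> str:
    """Hypothetical async `$()` that delegates the blocking work to a thread pool."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, run_blocking, argv)


print(asyncio.run(captured_stdout_via_pool([sys.executable, "-c", "print('pool')"])).strip())  # pool
```

One design caveat: cancelling the awaiting Task only abandons the result. The worker thread, and the child it started, keep running unless the sync side is separately told to stop, so asyncio cancellation would need extra plumbing in this approach.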

Bare subprocesses

Bare subprocesses are basically the same as $[] in a sync context, and I think that should remain true.

A bare subprocess in an async context should execute a command (not capturing output) and block until it returns. Basically, await $[...].
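A sketch of that behavior, assuming asyncio's subprocess API; `bare_subprocess` is a hypothetical name. With no pipes the child writes straight to the terminal, and only the calling Task waits for it.

```python
import asyncio
import sys


async def bare_subprocess(*argv: str) -> None:
    """Hypothetical async lowering of a bare command, i.e. `await $[...]`."""
    # No pipes: the child inherits the shell's stdin, stdout and stderr.
    proc = await asyncio.create_subprocess_exec(*argv)
    # Suspends only the calling Task until the command exits.
    await proc.wait()
    return None  # like $[], nothing is captured or returned


asyncio.run(bare_subprocess(sys.executable, "-c", "print('printed straight to the terminal')"))
```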

@$()

@$() performs an IO operation (running a subprocess) followed by a CPU operation (splitting its output), and is used exclusively in subprocess mode (where there are no await keywords).

The usage of it remains the same (no new operator, no addition of await), but under the hood, the compiler adds the appropriate awaits and the implementation should work in an asynchronous manner.
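A sketch of what the compiler-inserted await could target, with a hypothetical `captured_inject_async`. Splitting on whitespace is an assumption for illustration; xonsh's actual splitting rules may differ. The `demo` function shows roughly what generated code for `echo @$(cmd)` might look like: the user writes no await, but the emitted code contains one.

```python
import asyncio
import sys


async def captured_inject_async(*argv: str) -> list:
    """Hypothetical async implementation behind `@$(...)`."""
    # IO step: run the command and capture stdout without blocking the loop.
    proc = await asyncio.create_subprocess_exec(*argv, stdout=asyncio.subprocess.PIPE)
    out, _ = await proc.communicate()
    # CPU step: split into arguments (assumption: plain whitespace splitting).
    return out.decode().split()


async def demo() -> list:
    # Roughly what might be emitted for `echo @$(cmd)` in an async context.
    return ["echo", *(await captured_inject_async(sys.executable, "-c", "print('a b c')"))]


print(asyncio.run(demo()))  # ['echo', 'a', 'b', 'c']
```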

Globbing

Globbing does IO, and Linux filesystem IO latency can vary wildly: local IO is usually negligible, but it can become problematic if any network filesystems are mounted. So backticks should become asynchronous structures too.
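A minimal sketch of an awaitable glob, using `glob.glob` as a stand-in for xonsh's backtick search and `asyncio.to_thread` (Python 3.9+) to keep the filesystem walk off the event loop. `glob_async` is a hypothetical name.

```python
import asyncio
import glob
import os
import tempfile


async def glob_async(pattern: str, recursive: bool = False) -> list:
    """Hypothetical async backtick glob: run the filesystem walk in a worker thread."""
    # A slow (e.g. network-mounted) filesystem then stalls only this Task.
    return await asyncio.to_thread(glob.glob, pattern, recursive=recursive)


async def demo() -> list:
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.txt", "b.txt", "c.log"):
            open(os.path.join(d, name), "w").close()
        matches = await glob_async(os.path.join(d, "*.txt"))
        return sorted(os.path.basename(m) for m in matches)


print(asyncio.run(demo()))  # ['a.txt', 'b.txt']
```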

Similar to above, they should have two operations:

Questions

melund commented 4 years ago

@astronouth7303 It sounds really cool.

I haven't had the opportunity to use async in standard Python for anything serious. So maybe I have a hard time seeing where it would be useful in xonsh. But I am all for this if you want to give it a try. I could probably learn a lot from just following it.

AstraLuma commented 4 years ago

If you only consider xonsh and the xonsh ecosystem in isolation, it's of marginal use (about the only benefit is that asyncio Tasks are cancellable, unlike thread-based solutions).

The big win is interacting with async-based networking ecosystems, e.g. an async client library.

bobhy commented 4 years ago

I'm not seeing the benefit of all the work (to implement now and to maintain going forward). You've offered two scenarios where it might be nice to have async: invoking an async client library, and exploring the Python language from the xonsh prompt. Neither seems very compelling to me in my (admittedly primitive) use of xonsh.

Last one first: discoverability. I'm not going to trust any inferences from xonsh behavior for any Python code I might write. Any code you implement to support async/await is behavior that's different from Python. Your implementation might even be better or cleaner than Python's, but it's still different and therefore "misleading". I'll just drop into the IDE and write a simple test program.

Likewise, if I want to use an async client library from the command line (and that does sound appealing), I'm probably going to need some wrappers to tame it before I can use it in a one-liner, and I'll write them in a .py file so they're also available in my own code (the wrappers being the dialect of the library that I speak).

But this is only my personal opinion; I don't think such a feature would actually break anything I'm ever likely to do in xonsh.

scopatz commented 4 years ago

I think this is a great idea, definitely the direction we should head in. I would even be in favor of dropping v3.5 support early (EOL is Sept. 2020) in order to enable this kind of thing. I think async support would enable a lot of fun things that people don't think shells can do, like dispatching large jobs and generally doing map-reduce activities natively.

gforsyth commented 4 years ago

"I would even be in favor of dropping v3.5 support early"

I'm definitely on board with this. It'll also let us use f-strings everywhere.

gforsyth commented 4 years ago

We can probably also benefit from examining @Carreau's work on enabling async in IPython.

Carreau commented 4 years ago

"I think this is a great idea, definitely the direction we should head in. I would even be in favor of dropping v3.5 support early (EOL is Sept. 2020) in order to enable this kind of thing."

Why not just adopt NEP 29 instead? It gives a clear schedule for when to drop support for which Python versions (IPython follows it).

And for some of the async stuff you are trying to do (we do it in IPython), it may make sense to support it only on 3.7, or even only on 3.8+...

"We can probably also benefit from examining @Carreau's work on enabling async in IPython."

I've recently started to merge new stuff on master (new dependencies and such), so if there is any shareable work, I'm happy to pull it out into a package. Python 3.8 now has PyCF_ALLOW_TOP_LEVEL_AWAIT (https://docs.python.org/3/whatsnew/3.8.html#builtins), which makes a lot of things way easier.
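For reference, a minimal standalone sketch of how that flag is typically used in a REPL (not IPython's or xonsh's actual code): compile the input with `ast.PyCF_ALLOW_TOP_LEVEL_AWAIT`, check the code object's `CO_COROUTINE` flag, and await `eval(...)` only when it is set. `asyncio.run` drives the coroutine here for simplicity; a shell would more likely submit it to its long-lived loop.

```python
import ast
import asyncio
import inspect

# A REPL-style snippet that uses `await` at the top level.
SOURCE = "await asyncio.sleep(0)\nresult = 6 * 7"

# Without the flag, this compile() raises SyntaxError ('await' outside function).
code = compile(SOURCE, "<repl>", "exec", flags=ast.PyCF_ALLOW_TOP_LEVEL_AWAIT)
namespace = {"asyncio": asyncio}

if code.co_flags & inspect.CO_COROUTINE:
    # eval() returns a coroutine, which must be awaited on an event loop.
    async def _run():
        await eval(code, namespace)

    asyncio.run(_run())
else:
    eval(code, namespace)  # no top-level await: runs synchronously

print(namespace["result"])  # 42
```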

Delaying the start of an event loop can be super hard, in particular if you run with prompt_toolkit. I think you really want to be careful with the semantics you expose. From experience in IPython, people tend not to mess with many event loops, and the consensus seems to be "it's OK if I have to restart to choose another one", so as a first pass I would go with a flag to start xonsh with XXX integration. ... I'm biased, but I would choose trio, which can run asyncio code (https://github.com/python-trio/trio-asyncio). There was some work in trio that would trigger errors on a forgotten await, but unfortunately it's not ready yet.

Threads are nasty, and handling things in a background thread may not work with all libraries; for example, OpenCV used to not work outside the main thread. But if you think you know how to do it, I'll trust you.

"I'm definitely on board with this. It'll also let us use f-strings everywhere."

I'm thinking of automatically running pyupgrade --py36-plus on IPython via a bot (so as not to annoy contributors). Happy to share any work on this as well. I recently added a .git-blame-ignore-revs file.

"I'm not seeing the benefit of all the work ... [snip]"

Well, it's a vicious circle: no one uses async libs because they're annoying to use, so no one writes features to make them easier to use...

In IPython/Jupyter, autoawait is now used extensively to ... do the restoration of Rembrandt's The Night Watch (https://twitter.com/erdmann/status/1204060033178308609). Users don't need to think about whether they want an event loop; they just know x needs to be awaited.

AstraLuma commented 4 years ago

The main reason I suggested running the primary event loop on a separate thread from the main shell is:

Given those requirements, I think marshalling calls/returns/errors between the sync and async worlds is the simplest solution.

zasdfgbnm commented 3 years ago

I would recommend that we modify the behavior of things like $(command &) and !(command &) to return an awaitable in a sync context. Currently they return None:

$ type($(ls &))
[1]+ running: ls & & (86396)
NoneType
$ type(!(ls &))
[1]+ running: ls & & (86375)
NoneType
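A sketch of what such an awaitable could look like, assuming a hypothetical `BackgroundJob` wrapper around `subprocess.Popen` rather than xonsh's real job-control machinery. The process starts as soon as the object is created, as `&` does today; awaiting it later from async code yields the exit code, with the blocking `Popen.wait()` pushed to a worker thread.

```python
import asyncio
import subprocess
import sys


class BackgroundJob:
    """Hypothetical result of `$(cmd &)` / `!(cmd &)` in a sync context."""

    def __init__(self, argv) -> None:
        self.proc = subprocess.Popen(argv)  # starts immediately, like `&`

    def __await__(self):
        loop = asyncio.get_running_loop()
        # Popen.wait() blocks, so run it in the default thread pool.
        return loop.run_in_executor(None, self.proc.wait).__await__()


job = BackgroundJob([sys.executable, "-c", "pass"])  # created from sync code


async def main() -> int:
    return await job


print(asyncio.run(main()))  # 0
```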