python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io
Other
6.2k stars 341 forks source link

Ideas for linting trio programs #671

Closed njsmith closed 1 year ago

njsmith commented 6 years ago

I haven't given up on catching missing awaits at runtime (see #495), but another strategy that could be useful is to have a linter/static analyzer that catches them for you. In fact, you really want both, for all the reasons runtime + static checks are useful for other kinds of errors: runtime checks can be 100% reliable when the error actually happens, and are especially useful for beginners who can't be expected to set up fancy linting tools; static checks can catch mistakes early, even in untested code-paths.

Examples of this being a real issue:

I know absolutely nothing about static analysis, so I have no idea how to actually implement a lint for this, but let's have an issue to discuss and maybe we'll figure it out together.

In theory, this should be pretty straightforward, because in Trio, the property to enforce is: any function call which returns an awaitable, should have an await to immediately consume that awaitable. This makes things way simpler than asyncio, where it's common to create coroutine objects and then pass them somewhere else, and sometimes this is what you want (passing them into asyncio.run or asyncio.create_task or asyncio.gather) and other times its an accident because you forgot an await. So a lint for asyncio would require some pretty sophisticated data flow inference and knowledge of different APIs; for trio, once we've figured out which calls are to async functions/methods, the check itself requires only superficial knowledge of the surface AST. OTOH, this may make it more difficult to convince tools like mypy or pylint to include our check, because it won't work for asyncio users; and, if they do include it there will need to be some kind of configuration to turn it on or off. Maybe it would need to be a plugin, at least to start out.

I got a bit nerd-sniped by this last night. Findings so far:

pylint

Pylint has docs on how to add a new check, and an official plugin API, so we could distribute a checker as a third-party package. And, (through astroid), it has some reasonably sophisticated APIs to go from a call back to the definition of the function/method being called. So playing around, I came up with this: https://gist.github.com/njsmith/0ce8c661684ba4514b9ea4aac6182f3c

But unfortunately, it turns out that figuring out which functions return awaitables is a bit more complicated than it seems. In particular, the code in that gist can (I think) catch many cases where there's a call to a function, there's no await at the call site, and the function is defined with async def. However, this is still wrong in at least two cases:

So this is probably doable, but it was looking complicated enough that I paused to look at mypy instead.

mypy

In theory, mypy should solve this problem nicely, because it already has a stupendously sophisticated type inference engine that knows which calls return awaitables and which ones don't. So, we should be able to skip all the complications that we ran into with pylint, and just go straight to implementing our check. And in theory, this can even be more accurate, because it can take into account explicit type hints in cases where static analysis alone isn't enough.

Mypy also has a plugin API, but it's completely undocumented. That said, it looks like it is possible to ship a third-party plugin and then import it through the mypy config file by using the undocumented plugins = ... option in the config file, which contains a comma-separated list of setuptools-style entry points pointing to plugin objects.

AFAICT, though, currently the plugin API only allows you to define custom type inference rules (e.g., inferring the return value from open by checking whether the mode string has a b in it). I don't see any way to define a custom type checking rule.

It looks like the overall flow is mypy.main:mainmypy.build:build, whose BuildManager object together with dispatch runs 3 semantic analyzer passes (semanal_pass1.py, semanal.py, semanal_pass3.py) and then 2 passes of type checking which are performed using a mypy.checker:TypeChecker object. So I guess we'd want to hook in somewhere in those two passes?

Others?

Someone in the python/typing gitter claimed that pyre will have a check for this soon. Given that pyre is written by facebook and facebook uses asyncio, I suspect anything they come up with will be much cleverer than we need; and given that it's written in ocaml, I doubt we can easily write a plugin in python :-). Also they currently only support running checks on macOS and Linux, and I've never heard of it before now. So probably not what most of our users will be using?

Anything else?

smurfix commented 6 years ago

I don't see any way to define a custom type checking rule.

We may not need a custom type checker; as soon as the Awaitable is processed further it'll be reported as an error by the regular mypy rules. (Assuming that the rest of your program is typed appropriately.)

I do hope that https://github.com/python/mypy/issues/2499 will get some traction, though; it's old enough. :-P

njsmith commented 6 years ago

as soon as the Awaitable is processed further it'll be reported as an error by the regular mypy rules.

As far as I know, none of the three bullet point examples above would be caught by mypy regularly. Two involve functions with no return value called for side effects, and in the other the return value is formatted intoa string, which isa legal operation on coroutine objects.

belm0 commented 6 years ago

I'm thinking about how this policy will affect API's.

It's going to be common to have utility functions to compose awaitables. So e.g. an wait_all() would have to be:

await wait_all(some_event.wait, some_async_foo)

rather than

await wait_all(some_event.wait(), some_async_foo())
njsmith commented 6 years ago

@belm0 yep. Just like how trio already does it, and also all non-async python code :-)

belm0 commented 6 years ago

So if the wait functions have parameters, we need to resort to partial, correct?

E.g. I have a ValueEvent which takes lambda etc. on wait.

await wait_all(partial(event.wait_value, lambda x: x > 10), event2.wait, ...)
njsmith commented 6 years ago

Correct.

belm0 commented 6 years ago

I'm seeing useful standard library functions which take raw coroutines:

https://docs.python.org/3/library/contextlib.html#contextlib.AsyncExitStack

njsmith commented 6 years ago

Those docs are being lazy and saying "coroutine", without clarifying whether they mean a "coroutine function" or a "coroutine object". This is one reason I always say "async function" or "coroutine object". (Also, "async function" is just a clearer name that avoids obscure jargon.)

If you look at the source, it turns out that AsyncExitStack's methods actually take "coroutine functions", i.e. you pass in an async function, not a coroutine object, so they follow Trio-style, not asyncio-style.

Someone should probably submit an issue or PR to clarify AsyncExitStack docs. [Edit: done, so @belm0's comment above will stop making sense soon :-)]

sorcio commented 6 years ago

This still bites me on occasion. In my experience, the existing runtime check (the "coroutine was never awaited" warning) has always been good enough to spot the issue once I encounter it. Additional runtime checks might be good but I don't think there is much to be gained. On the other hand I would love a static warning!

I was planning to play around a bit with pylint to see what I could achieve and wow, @njsmith is always many steps ahead :) In my case, a best-effort pylint solution would already be a huge win. Pylint is already part of my workflow and I haven't looked into mypy yet.

Issue 1 (cannot detect decorated functions or other callable-then-awaitable patterns) is a limitation of course, but one I can live with for now.

Issue 2 (difference between async gens and async funcs): wouldn't it be enough for now to expect an async-something to be either awaited or async-iterated?

njsmith commented 6 years ago

In my case, a best-effort pylint solution would already be a huge win. Pylint is already part of my workflow and I haven't looked into mypy yet.

There's no reason we have to pick one, either. If people care enough about pylint to make it work with pylint, then cool, pylint users get the benefit; it doesn't stop us from adding a check to mypy too, for mypy users.

Issue 1 (cannot detect decorated functions or other callable-then-awaitable patterns) is a limitation of course, but one I can live with for now.

Agreed.

Issue 2 (difference between async gens and async funcs): wouldn't it be enough for now to expect an async-something to be either awaited or async-iterated?

It's not always true that async generators get immediately iterated, e.g.:

async with aclosing(agen_fn()) as agen:
    async for ... in agen:
        ...

But also I suspect it's about as easy to implement a check for whether a function is an async generator, as it is to implement a check for whether it's being used in an async for. It's not hard, just annoying :-)

I guess something like:

class IsThisAnAsyncGenerator(ast.NodeVisitor):
    its_an_async_generator = False

    def visit_Yield(self, *args, **kwargs):
        self.its_an_async_generator = True

    # Can't happen on 3.7, but future-proofing in case async generators ever start supporting 'yield from'
    def visit_YieldFrom(self, *args, **kwargs):
        self.its_an_async_generator = True

    def visit_FunctionDef(self, *args, **kwargs):
        # Don't recurse into nested functions
        pass

    def visit_AsyncFunctionDef(self, *args, **kwargs):
        # Don't recurse into nested async functions either
        pass

def is_this_asyncfunctiondef_node_an_async_generator(node):
    visitor = IsThisAnAsyncGenerator()
    visitor.visit(node)
    return visitor.its_an_async_generator

(See: ast.NodeVisitor, list of node types.)

@sorcio any interest in collecting this all up into an installable pytest plugin and writing some tests?

sorcio commented 6 years ago

Oh, yes, this is interesting. I don't have experience with pytest plugins and I'm on vacation for a couple weeks. I will see into it later on :) Thank you very much for the comments!

There's no reason we have to pick one, either. If people care enough about pylint to make it work with pylint, then cool, pylint users get the benefit; it doesn't stop us from adding a check to mypy too, for mypy users.

+1, that's what I meant as well.

belm0 commented 6 years ago

I tried the pylint work in progress but it's giving me a false positive at every use of await in my codebase. What am I missing?

njsmith commented 6 years ago

It's not even work-in-progress, more like thinking-with-an-editor-buffer-open. I've never even tried running it, so... I guess there's a bug ¯\(ツ)

belm0 commented 6 years ago

now it's a WIP!

belm0 commented 6 years ago

I've used patterns like this before of wrapping context managers-- it foils the proposed static checks.

    @asynccontextmanager
    async def manager(some_args):
        ...
        yield
        ...

    def manager_wrap():
        return manager(10)
njsmith commented 6 years ago

@belm0 your manager is an async generator and has a decorator. Both of those are detectable statically. What's the problem?

belm0 commented 6 years ago

in manager_wrap() the lint check will complain that manager() is called without await.

njsmith commented 6 years ago

Not if our lint check handles async generators correctly (perhaps using something like the code in this comment), or not if our lint check ignores functions with unrecognized decorators. They're available in the AsyncFunctionDef.decorator_list attribute. See the marked line below:

In [2]: ast.parse("@foo\nasync def bar(): pass")                                
Out[2]: 
ast.Module(
    body=[
        ast.AsyncFunctionDef(
            name='bar',
            args=ast.arguments(
                args=[],
                vararg=None,
                kwonlyargs=[],
                kw_defaults=[],
                kwarg=None,
                defaults=[]
            ),
            body=[ast.Pass()],
            decorator_list=[ast.Name(id='foo', ctx=ast.Load())],    ###### <----- look here
            returns=None
        )
    ]
)
njsmith commented 6 years ago

Actually it looks like in pylint there's a simpler way to check if an AsyncFunctionDef node is an async generator:

def is_this_asyncfunctiondef_node_an_async_generator(node):
    yields = node.nodes_of_class(astroid.Yield, skip_klass=(astroid.FunctionDef, astroid.AsyncFunctionDef))
    yield_froms = node.nodes_of_class(astroid.YieldFrom, skip_klass=(astroid.FunctionDef, astroid.AsyncFunctionDef))
    if yields or yield_froms:
        return True
    else:
        return False
belm0 commented 6 years ago

My take on the status for pylint:

700 is something I've had installed on my CI for a week now. The codebase happens to not have async generator functions. Handling those correctly (already spelled out by @njsmith) should definitely be addressed before merging this code and suggesting others use it. Also need tests.

There are likely false negatives but that shouldn't be a blocker-- coverage can be improved incrementally.

Regarding a separate repo: simply merging the outstanding PR lets people instantly use the checker (just add it to plugin option on pylint config). Vs. setting up and maintaining a separate repo (and yet another for mypy and any other 3rd party tooling we want to integrate with), including ensuring that the other repo's testing follows trio updates-- I don't think the trio project has that kind of person-power to extend when there are so many things of arguably higher priority. My vote is for keeping things lean until there are resources and motivation to do more.

njsmith commented 6 years ago

Oh man we should totally also lint for nurseries/cancel scopes with yields inside them: #638, #264

belm0 commented 6 years ago

It would also be nice to have a lint check for async functions not containing await or async control flow. Sometimes it can't be avoided (e.g. when implementing async methods on an interface), but in the majority of cases it's overeager use of async.

njsmith commented 5 years ago

We should also lint for calls to well-known blocking functions inside async def. We're unlikely to catch all of them, but I bet over time we could accumulate a list of common offenders like time.sleep, requests.get, subprocess.run, etc. (We might want an optional mode to catch open and other file operations.)

belm0 commented 5 years ago

update on pending pylint missing-await checker (#700):

jtrakk commented 5 years ago

From chat:


# What you have:
async with trio.open_nursery() as nursery:
    async with await trio.open_process(hunspell_cmd, **proc_kwargs) as proc:
        nursery.start_soon(send_hunspell, proc, ['McLaughlin', 'SecureStorage'])
        nursery.start_soon(recv_hunspell, proc)
    # Process is closed here, after starting the two tasks but before they finish
# Then we wait for tasks to finish

# What you want:
async with await trio.open_process(hunspell_cmd, **proc_kwargs) as proc:
    async with trio.open_nursery() as nursery:
        nursery.start_soon(send_hunspell, proc, ['McLaughlin', 'SecureStorage'])
        nursery.start_soon(recv_hunspell, proc)
    # First we wait for tasks to finish
# Then the process is closed

Could a linter catch instances where a process or other task is closed before it finishes?

njsmith commented 5 years ago

Good point! I guess the general case of detecting when something is closed while in use would require some super sophisticated data-flow analysis, but this specific pattern of async with x around a call to start_soon(..., x) seems like it could be simple enough for a linter to detect + a common enough error to be a useful lint.

njsmith commented 5 years ago

Though we might want to special-case async with open_nursery() as nursery: nursery.start_soon(..., nursery), since that's not an error. More generally, I guess the point is that we need to also have some heuristic to rule out cases where the async with surrounds the nursery.

njsmith commented 5 years ago

We might also want to warn if we see a nursery body that contains exactly one statement, and that statement is a call to nursery.start_soon ("consider using a regular function call instead")

jtrakk commented 4 years ago

From chat:

anything that awaits inside a finally: clause needs shielding. I habitually use with trio.fail_after(2,shield=True). Otherwise they could swallow errors.

shamrin commented 3 years ago

I've just created feature request for Pyright about forgotten await: https://github.com/microsoft/pyright/issues/1317

Personally I use Pyright all the time while writing Python code in VSCode. It's works great, especially in interactive use.

shamrin commented 3 years ago

@erictraut suggested Pyright could maybe implement a check for "Awaitable object that is dropped on the floor (not assigned to a variable, passed as an argument, returned from the function, etc.)".

It will solve two of the three cases mentioned in the description of this issue (trio.sleep and Yury's confession):

trio.sleep(1) # could warn about missing `await` because the result is not used in any way

But of course it won't warn about this example:

result = some_awaitable() # can't warn in general case

What do you think? Any potential downsides for the general case? Is there anything that could help Pyright catch even more cases? (And avoid the need for a config option or a plugin.)

njsmith commented 3 years ago

@shamrin I replied on the pyright issue.

shamrin commented 3 years ago

Pyright author has just implemented reportUnusedCoroutine rule that makes it an error to write alone-on-a-line foo() when it should be await foo(). The rule will be enabled by default in all modes.

https://github.com/microsoft/pyright/commit/fb4a4e92bddf8adaef4b8e1924dd427a9f5cbd99

shamrin commented 3 years ago

Now that pyright reports alone-on-a-line missing await, the only remaining¹ real-life problem (that the type checker can't catch) is implicit string conversion:

async def getone(): return 1
name = getone()  # should have been `name = await getone()`
print(f'one is {name}')
$ python3 ha.py
one is <coroutine object getone at 0x1012b0e40>
sys:1: RuntimeWarning: coroutine 'getone' was never awaited

(I assume this is what broken cpython bot did: https://github.com/python/cpython/pull/9168#issuecomment-420361009.)

How do we catch it?

Of course no checker should forbid a very common print(f'debug: {any_object}')². However, a warning could help. It could warn about implicit conversion to string for certain cases. The warning could be silenced by calling repr() or by special inline comment. Would this warning be too drastic?

¹ At least it's the only one I know. Have you seen other real-life examples of forgotten await? ² Starting from Python 3.9, print(f'debug: {any_object=}' would be better. No need to warn about this obviously.

Zac-HD commented 1 year ago

Closing this because typecheckers and flake8-trio are going pretty well these days - I'll even be speaking about it at PyCon 🙂