jakkdl opened 1 year ago
Leaving flake8 as a backend does not really seem worth it at the moment, it's added some headaches and complications at times - but we've managed to work around pretty much all of them so I don't currently see any particular upside in ditching it - unless new versions come out that break things (which isn't that unlikely tbh). Only thing would be like usability / speed concerns for people that don't want to run it through flake8, or ability to specify version properly in pre-commit.
OK, I like the idea of an incremental migration pathway so let's stick with flake8
for now - we can always revisit after all the substantive logic is converted over.
For autofixes, it'd be nice to hook that from shed
in the same way that it already hooks Hypothesis' codemods. I'm OK with saying "lint xor codemod" for any particular check, which is all I'd use, though ideally we could "lint all and codemod some" - not sure how that would be implemented though.
I'll have to have a look at hooking in codemods, feels like lint all and codemod some
shouldn't be too tricky - but that's a future problem.
I'm realizing now that I don't know how flake8 will react to having the file changed under it - probably badly. A temporary solution could be to write the modified file to {old_filename}_modified.py
or placed in a separate directory or something, but that's probably a good enough reason to go ahead with ditching flake8 quite early. One could possibly still have a flake8-hook so it can be run without codemods through flake8 for those that want it. Will have to research the alternatives.
I went through the list of checks and categorized them as follows.
- A `with trio.fail_after(...):` or `with trio.move_on_after(...):` context does not contain any `await` statements. This makes it pointless, as the timeout can only be triggered by a checkpoint.
- `yield` inside a nursery or cancel scope is only safe when implementing a context manager - otherwise, it breaks exception handling. (seems to duplicate TRIO900 🤔)
- `await` inside `finally:` or `except BaseException/trio.Cancelled` unless you use a shielded cancel scope with a timeout.
- `except BaseException`, `except trio.Cancelled` or a bare `except:` with a code path that doesn't re-raise. If you don't want to re-raise `BaseException`, a separate handler for `trio.Cancelled` can be added before it.
- Calling a trio async function without immediately `await`ing it.
- `import trio` is required for the linter to work.
- Calls without a `trio.` prefix before them aren't handled yet - but I think support for this will be added soon.
- A variable passed to `start[_soon]` might be invalidly accessed while in use, due to a context manager closing before the nursery. This is usually a bug, and nurseries should generally be the inner-most context manager.
- A nursery body with only a call to `nursery.start[_soon]`, not passing itself as a parameter, can be replaced with a regular function call.
- `nursery.start_soon` in `__aenter__` doesn't wait for the task to begin. Consider replacing with `nursery.start`.
- Replace `trio.sleep(0)` with the more suggestive `trio.lowlevel.checkpoint()`.
- `trio.sleep()` with a >24 hour interval should usually be `trio.sleep_forever()`.
- Don't use `trio.[NonBase]MultiError`, prefer `[exceptiongroup.]BaseExceptionGroup`. Even if Trio still raises `MultiError` for legacy code, it can be caught with `BaseExceptionGroup` so it's fully redundant.
- Don't assign `anyio.get_cancelled_exc_class()` to a variable, since that breaks linter checks and multi-backend programs.
- Tracking `anyio.get_...` is trickier, but doable (making sure one keeps track of variable scopes).
- `Cancelled` and `BaseException` must be re-raised - flagged when a user tries to `return` or `raise` a different exception.
- An async function with a `timeout` parameter - use `trio.[fail/move_on]_[after/at]` instead.
- `while <condition>: await trio.sleep()` should be replaced by a `trio.Event`.
- A startable function (one with a `task_status` keyword parameter) not in the `--startable-in-context-manager` parameter list - please add it so TRIO113 can catch errors when using it.
- `@asynccontextmanager` not allowed.
- User-configured blocking calls - see `trio200-blocking-calls` for how to configure it.
- Use `httpx.AsyncClient`.
- Use `httpx.AsyncClient`. Looks for `urllib3` method calls on pool objects, but only matching on the method signature and not the object.
- Use `await nursery.start(trio.run_process, ...)`.
- Use `await trio.run_process(...)`.
- `os.*` call in an async function - wrap in `await trio.to_thread.run_sync()`.
- Use `trio.open_file(...)`.
- Use `trio.wrap_file(...)`.
- Wrap the file object in `trio.wrap_file()` to get an async file object.
- `os.path` in async functions - prefer using `trio.Path` objects.
- `return` from an async function with no guaranteed checkpoint or exception since function definition. I think I can chuck in a bunch of `trio.lowlevel.checkpoint()` just before the line that would raise these - but there's going to be a bunch of edge cases that are tricky.
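As a toy illustration of that checkpoint-insertion idea (purely line-based and hypothetical - a real fixer would build a libcst tree and track control flow so it only touches returns with no guaranteed checkpoint since function entry):

```python
def insert_checkpoints(source: str) -> str:
    """Insert `await trio.lowlevel.checkpoint()` before every bare `return`.

    Toy sketch only: it matches lines textually rather than analyzing the
    tree, so it doesn't know whether a checkpoint already happened, and it
    naively reuses the return statement's indentation.
    """
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        indent = line[: len(line) - len(stripped)]
        if stripped == "return":
            out.append(indent + "await trio.lowlevel.checkpoint()")
        out.append(line)
    return "\n".join(out)
```

The hard part the sketch skips is exactly the edge cases mentioned above: conditional checkpoints, early exits via exceptions, and loops.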
I was thinking that we'd have the `Codemod` class return both the modified cst, and a list of lint errors which includes whether they're fixed in the new cst. Then we can have flake8 report all lint errors, a pure-autoformat tool (eg shed integration) just fix the autofixable things, and a future standalone CLI write autofixes and report the non-fixed lint errors.
Re: your notes on specific rules, mostly looks great. Few edits:
- TRIO101: nope, no way to make this safe. (on further thought seems to duplicate TRIO900 🤔)
- TRIO111: automatically swapping the order of context managers is too dangerous for my taste; better to alert - and perhaps educate - the user, who can then decide what to do about it.
- TRIO116 or TRIO118: no autofix for either.
- TRIO900: nothing we can do here, needs a pretty deep redesign.
> TRIO112: nursery body with only a call to `nursery.start[_soon]` and not passing itself as a parameter can be replaced with a regular function call.

Ah, but not if the nursery is passed `strict_exception_groups=`, because that would change exception types.
Working on libcst now and will probably push something up today/tomorrow.
> I was thinking that we'd have the `Codemod` class return both the modified cst, and a list of lint errors which includes whether they're fixed in the new cst. Then we can have flake8 report all lint errors, a pure-autoformat tool (eg shed integration) just fix the autofixable things, and a future standalone CLI write autofixes and report the non-fixed lint errors.
I thought this sounded good, but you're going to end up with two trees, the unmodified and the transformed one, which at the very least are going to desync in line numbers - but possibly in even bigger ways. With autofixing turned on, you want the line numbers from the transformed file in the output, with it off you'll want line numbers from the original tree. So will at least have to pass a flag to the class/program telling it whether/which codes should be autofixed.
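To make the shape of that concrete, here's a minimal sketch of such a result object - all names (`LintError`, `CodemodResult`, `report`) are hypothetical and don't correspond to anything in the actual codebase:

```python
from dataclasses import dataclass, field


@dataclass
class LintError:
    """One lint finding, flagged with whether the codemod already fixed it."""
    line: int
    col: int
    code: str        # e.g. "TRIO115"
    message: str
    autofixed: bool  # True if new_source already contains the fix


@dataclass
class CodemodResult:
    """What the codemod pass hands back: new source plus all findings."""
    new_source: str
    errors: list[LintError] = field(default_factory=list)

    def report(self, autofix: bool) -> list[LintError]:
        # flake8 mode reports everything against the original tree;
        # autofix mode only reports what's left unfixed.
        if not autofix:
            return self.errors
        return [e for e in self.errors if not e.autofixed]
```

This doesn't resolve the line-number desync problem, though - the `line`/`col` of an unfixed error still refers to whichever tree the caller chose to report against.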
To get started on autofixing, I think I might as well get it running as an independent plugin at the same time so it's usable. Looked through how various linters are run, and the one in shed looked simple enough to get started with. So any comments on that before I get started copying much of the implementation of it? https://github.com/Zac-HD/shed/blob/master/src/shed/_cli.py
I'll be adding a couple of flags only valid when run independently (which will error out and tell the user to run it without flake8 if passed in that mode), but otherwise try to keep a very similar interface regardless of whether it's run with or without flake8.
That sounds good to me! Obviously the already-converted checks are the best place to start autofixing, but then missing/redundant checkpoints would be highest priority as they're a pita to do manually.
> That sounds good to me! Obviously the already-converted checks are the best place to start autofixing, but then missing/redundant checkpoints would be highest priority as they're a pita to do manually.
Got started on converting 91X today~
Some solid progress, eval_file 910 only has a single incorrect error at the moment, and I'm steadily chipping away at the ones in 911. Although it is kinda hilarious to see this table
Lines with different # of errors:
| line | code | actual | expected |
| ---- | ---- | ------ | -------- |
| 113 | TRIO911 | 1 | 2 |
| 151 | TRIO911 | 8 | 3 |
| 153 | TRIO911 | 5 | 3 |
| 164 | TRIO911 | 1 | 2 |
| 165 | TRIO911 | 1 | 2 |
| 292 | TRIO911 | 3 | 2 |
| 409 | TRIO911 | 1 | 0 |
| 596 | TRIO911 | 1 | 0 |
| 634 | TRIO911 | 33 | 1 |
| 642 | TRIO911 | 32 | 1 |
| 706 | TRIO911 | 1 | 0 |
| 718 | TRIO911 | 1 | 0 |
| 722 | TRIO911 | 1 | 0 |
| 726 | TRIO911 | 1 | 0 |
| 731 | TRIO911 | 1 | 0 |
| 735 | TRIO911 | 1 | 0 |
| 743 | TRIO911 | 0 | 1 |
Merely 32 errors too many on line 634, nbd :grin:
I feel like we're slowly closing in on warranting a name change for this project - it'll soon not be dependent on flake8 (although it will continue being possible to run as a plugin), and while trio is the main focus, we've added support for anyio as well. So both parts of flake8-trio are kind of misleading at this point :sweat_smile: Certainly nothing that must be addressed soon (if ever), but thought I'd bring it up - it feels especially weird when the binary is called "flake8_trio".
"It's called `flake8_trio`, but we don't actually use `flake8` and it's designed to work for `anyio`" 😹
Yeah, seems like a new name might be in order! Maybe something mentioning structured concurrency? sclean
for "SC (c)lean" is fairly compact but probably too cryptic - it's hard to find something self-descriptive yet also short enough to be a good command name!
scuba
- for (uh) Structured Concurrency Unravels Broken Async? It's short and fairly memorable... but no, https://pypi.org/project/scuba/ is taken.
```python
if len(all_filenames) > 4:
    # If we're formatting more than a few files, the improved throughput
    # of a process pool probably covers the startup cost.
    try:
        with multiprocessing.Pool() as pool:
            for error_msg in pool.imap_unordered(rewrite, all_filenames):
                if isinstance(error_msg, str):
                    print(error_msg)  # noqa
        return
    except BlockingIOError as err:  # pragma: no cover
        # This can occur when `os.fork()` fails due to limited available
        # memory or number-of-processes. In this case, we fall back to
        # the slow path; and reprocess whatever we've already done for
        # simplicity. See https://stackoverflow.com/q/44534288/
        print(f"Error: {err!r}\n  Falling back to serial mode.")  # noqa
```
Haha, scuba is fun. "scry" is also taken... But there's a long list of words starting with sc
that one can turn into forced acronyms :D https://www.thefreedictionary.com/words-that-start-with-sc
Both `trio` and `anyio` (and `asyncio`) also have `io` in their name - so pushing an `io` into the name is also a possibility. `scio-checker` would be a pretty straightforward alternative. https://pypi.org/project/Scio/ is taken, although pretty abandoned. `scion` is free though!
There's also `sciolist`:

> One who exhibits only superficial knowledge; a self-proclaimed expert with little real understanding.

and `sciolism`:

> The practice of expressing opinions on something which one knows only superficially or has little real understanding of; also, shallow or superficial knowledge.

which have some pretty fun self-deprecating properties :grin: Or something-something "the checker might point out some common errors, but as a true sciolist it only has superficial knowledge with no deep understanding and should not be fully relied on".
updated OP with full task list (as of now)
added 5 more tasks:
We're building a dedicated auto-fixing linter on top of LibCST called Fixit, with a plan for a stable release in the next month or so. Would you be interested in considering Fixit as a base for this work, rather than reimplementing your own linting and autofixes from scratch?
Oh interesting, using that as a middle layer to libcst could help solve some of the pain points with modularity, shared state, configuration, and other stuff. This plugin is in a coasting phase atm though, and we probably wouldn't see much gain from overhauling the infrastructure - but if performance or something other becomes a pain point it might warrant further consideration.
I'm not seeing any way of hooking in plugins in the documentation though, or are you suggesting something like moving rules over to the fixit repo? The latter I don't think is terribly relevant with how niche/opinionated (/overcomplicated :grin: ) many of the rules are in this. Though you're very welcome to crib rules if you want :)
The libcst autofixers in https://github.com/Zac-HD/shed might be even more relevant
> I'm not seeing any way of hooking in plugins in the documentation though, or are you suggesting something like moving rules over to the fixit repo? The latter I don't think is terribly relevant with how niche/opinionated (/overcomplicated 😁 ) many of the rules are in this. Though you're very welcome to crib rules if you want :)
The general idea is that Fixit makes writing and distributing rules as simple as possible. Rules don't need to define "plugins" to be used — the user simply adds the module/package name containing the rules to their project config, and Fixit will automatically find and import them, assuming they're available in the environment or locally in the project being linted.
For this project, assuming the current name is maintained, all of the rules were defined in the top-level `flake8_trio` package, and users have already done `pip install flake8-trio`, then their config would look something like:

```toml
[tool.fixit]
enable = ["flake8_trio"]
```
Fixit would then import the `flake8_trio` module, and search its contents for all `LintRule` classes.
The Fixit configuration guide has more details and examples.
> The general idea is that Fixit makes writing and distributing rules as simple as possible. Rules don't need to define "plugins" to be used — the user simply adds the module/package name containing the rules to their project config, and Fixit will automatically find and import them, assuming they're available in the environment or locally in the project being linted.
Cool! As I said it's not really worth the effort to reconsider switching the backend at this point, but I'll keep Fixit & friends in mind for the future~ :sparkles:
With ruff starting to implement these rules they chose the `TRIOxxx` rule numbers, which likewise maybe isn't great advertisement that the suggestions also apply to AnyIO code. Though idk how open ruff is to renaming rules @charliermarsh
Maybe @cooperlees and I should formally recommend stealing `ASYNC` for these lint warnings, which are incidentally a superset of `flake8-async`?
We should be able to re-code these rules if you choose to change the prefix. (We also have the `ASYNC` rules in Ruff.)
Maybe you just take flake8-async? Happy to transfer away ... I don't have time for it etc. etc.
Sure, transfer it to my personal account? I'll make a last release to deprecate it in favour of the project currently known as flake8-trio, and then we can use the ASYNC key here and for the corresponding ruff rules.
We definitely need to rename this then imo, could maybe even take over the flake8-async moniker - though I should make sure a decent subset of rules works with asyncio & gives good error messages. Or maybe flake8-aio? Works well with an error code of `AIOxxx`.
Ok, initiated ...
And, transfer accepted! Thanks, Cooper :-)
`aio` is already tightly associated with the `aiohttp`-et-al, asyncio-only ecosystem, so I'd be concerned about confusion there. I think `ASYNCxxx` as the error code makes sense - if we sunset the current flake8-async then the name will be available, and every check in it is already available in flake8-trio too.

It makes sense to ensure that the current rules don't give nonsense results on asyncio code, though I probably wouldn't bother writing any asyncio-specific rules... if you want your async code to be correct, use anyio or trio!
> It makes sense to ensure that the current rules don't give nonsense results on asyncio code, though I probably wouldn't bother writing any asyncio-specific rules... if you want your async code to be correct, use anyio or trio!
Oh yeah, that's the extent I was thinking of.
> every check in it is already available in flake8-trio too.
I double-checked now as well that we cover all the methods in flake8-async too, and we do. The error codes are different and they're split slightly differently - but otherwise current flake8-trio is a strict superset.
Then we're settling on flake8-async as the name for the project as well? I was worried about some confusion of thinking async => asyncio, but given that `async` is a pure-python keyword I guess it applies just as well to structured-concurrency async.
OK, with the transfer complete there are a couple of action items - I've assigned most of them to @jakkdl and those which need specific permissions to me, but happy to take other contributions too of course!

- `flake8-async` repo: add a note to the README, and archive it.
- Rename `flake8-trio` -> `flake8-async`, and update docs to include a note on the history of the package(s).
- Publish a `flake8-trio` package on PyPI which just depends on `flake8-async` and does nothing else.

and then I think we're ready to ping Charlie about ruff codes 🙂
After (hopefully correctly) releasing new versions of flake8-trio and flake8-async, it's now relevant for @charliermarsh that you might want to re-code the ruff rules.
Am I correct that we should "remove" flake8-trio and redirect the rules to codes under flake8-async?
Yes, that's what we've done here and I'd encourage ruff to do so as well.
Moving from #70
what does the UX look like if we're not leaning on flake8 for that? I presume similar, so how much work is that to add?
Quickly looked at whether the flake8 interface could still work - and it's possible to still use flake8 without much problem. You can indicate that you want the lines of a file, and then those can be dumped into `libcst.parse_module()`. Flake8 does require plugins to accept one of `tree`, `logical_line` or `physical_line` as a parameter in `__init__` for it to classify the plugin and run it at the correct time, so we'd have to still accept the `tree` parameter - but we can just throw it out and construct our own tree from the `lines` sent at the same time. Somewhat hackish, but it means that the consideration of ditching flake8 is pretty much fully independent.
This even means that the transition can be gradual - where in the intermediate phase you have both an AST and a libCST tree, and run separate runners for them.
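A minimal sketch of that shim, assuming the flake8 plugin interface described above - the class and rule message here are hypothetical, and `ast` stands in for `libcst.parse_module("".join(lines))` so the sketch runs without libcst installed:

```python
import ast


class Plugin:
    """Hypothetical flake8 shim: accept `tree` so flake8 classifies us as
    an AST plugin, but discard it and rebuild our own tree from `lines`.
    """

    name = "flake8-trio-sketch"
    version = "0.0"

    def __init__(self, tree, lines):
        del tree  # handed to us by flake8; we only need it for classification
        self._source = "".join(lines)
        # In the real plugin this would be libcst.parse_module(self._source).
        self._tree = ast.parse(self._source)

    def run(self):
        # flake8 expects an iterable of (line, col, message, type) tuples.
        for node in ast.walk(self._tree):
            if isinstance(node, ast.AsyncFunctionDef):
                yield (node.lineno, node.col_offset, "TRIOXXX demo message", type(self))
```

Since flake8 supplies both `tree` and `lines` in the same call, the two trees always come from the same file contents, which is what makes the gradual AST-plus-libCST phase workable.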
TODO
Standalone functionality
Converted to libCST:
Autofixing
- `finally:` or `except BaseException/trio.Cancelled` unless you use a shielded cancel scope with a timeout.
- `await`ing it.
- `nursery.start[_soon]` and not passing itself as a parameter can be replaced with a regular async function call.
- `nursery.start_soon` in `__aenter__` doesn't wait for the task to begin. Consider replacing with `nursery.start`.
- `trio.sleep(0)` with the more suggestive `trio.lowlevel.checkpoint()`.
- `trio.[NonBase]MultiError`, prefer `[exceptiongroup.][Base]ExceptionGroup`. Even if Trio still raises `MultiError` for legacy code, it can be caught with `BaseExceptionGroup` so it's fully redundant.
- user-configured: see `trio200-blocking-calls` for how to configure it. I think many of these can be fixed, but needs a long list of what to replace with.
- `httpx.AsyncClient`.
- `httpx.AsyncClient`. Looks for `urllib3` method calls on pool objects, but only matching on the method signature and not the object.
- `await nursery.start(trio.run_process, ...)`.
- `await trio.run_process(...)`.
- `os.*` call in async function, wrap in `await trio.to_thread.run_sync()`.
- `trio.open_file(...)`.
- `trio.wrap_file(...)`.
- `trio.wrap_file()` to get an async file object.
- `os.path` in async functions, prefer using `trio.Path` objects. Requires a bunch of logic.
- Missing checkpoints: I think I can chuck in a bunch of `trio.lowlevel.checkpoint()` just before the line that would raise these - but there's going to be a bunch of edge cases that are tricky. Higher priority.