python-trio / flake8-async

Highly opinionated linter for Trio code
https://flake8-async.readthedocs.io
MIT License
17 stars 2 forks source link

Reformat to use external runner #81

Closed jakkdl closed 1 year ago

jakkdl commented 1 year ago

The PR is split into two commits, the first does the refactoring, the second splits up VisitorMiscChecks (turns out some comments were wrong in what visitors in it were necessary for which checks...) so most of those checks should be easier to read. So I'd recommend reading the diffs for those separately.

Was somewhat tricky to get this working with some of the very particular checks - and so it doesn't fully only iterate through the tree once. If a visit_ function calls self.visit() (either directly or through self.generic_visit()/self.visit_nodes()) it will set a flag self.novisit which tells the external runner to exclude that class when visiting subnodes so they're not visited twice.

To not have to do that too often I added Flake8TrioVisitor.save_state(node) which serves the same purpose as get_state/set_state did previously

outer = self.get_state(...)
# do stuff before visiting child nodes
...

# visit child nodes
self.generic_visit(node)
# restore state
self.set_state(outer)

so that can now be written

self.save_state(node, ...)
# do stuff before visiting child nodes
...
# generic_visit and set_state are both handled by external runner

get/set state are still used in a couple places when manually visiting, but should be possible to get rid of them completely and only have save_state().

I think I could get rid of nodes visiting their own subnodes entirely by adding

  1. exit_Xxx which is called for a node after it's subnodes have been visited (would clean up all cases of self.generic_visit(node) being used)
  2. visit functions for node parts that in the AST don't have dedicated nodes, e.g. visit_Finally or visit_WhileElse. This would split up several long functions and solve the last cases of revisits - only trouble child would be 107_108.visit_loop, which explicitly wants to visit the body twice atm, but I could probably rewrite that.

This would maybe give some performance gains, but should make the code simpler as well and can entirely remove Flake8TrioVisitor.visit() and .visit_nodes() - at the cost of adding some more "magic" method names or other construction (realizing as I'm writing I can have visitor classes implement a visit_Xxx as a pseudo-contextmanager, with enter and exit as magic names, and other names called as they match up with the _fields of the Xxx node. And the external runner handles a visit different depending on if it's isinstance function or class)

I'm not super happy about the current novisit solution with lots of set updating, and I have some alternative solutions, but getting rid of it entirely by implementing 1&2 above would ofc be even nicer.

Looking at runtime I'm not seeing much if any improvement in performance, not even just looking at --runfuzz -k site_code, so maybe reiterating through the tree multiple times wasn't that much of an issues, or my external runner adds enough overhead it's voided, but there could maybe be gains on code with large files and complicated functions - or at least when 1+2 are implemented.

I'm also starting to want to split up flake8_trio.py into different files, 1500 lines is starting to be quite inconvenient to navigate, and get any overview on, but not gonna do that in this PR as it would completely mess up any remaining diff logic - I also know you are hesitant to split given installation complications so would have to take care when doing that.

There's two TODO's in the code, probably not gonna touch them in this PR