simonw / symbex

Find the Python code for specified symbols
Apache License 2.0
231 stars 6 forks source link

Filters: `--async` and `--function` and `--class` and various typing ones #21

Closed simonw closed 1 year ago

simonw commented 1 year ago

The --signatures option turns out to be a pretty great way to start navigating a new codebase.

It might be useful to be able to filter by types of content. Potentially the following:

These would be additive, so the following:

symbex --signatures --method --function

Would return all methods and all functions.

But... what would this do?

symbex -s --async

Would it return all async functions AND async methods? If so, would combining it with --function or --method not make a difference?

Or should there be a --async-method filter that's different from --async (which only gets async functions)?

simonw commented 1 year ago

I'll prototype it and play with it and see how it feels.

simonw commented 1 year ago

This is a pretty fun prototype:

% symbex --async -d ../datasette -s | head -n 20 
# File: /Users/simon/Dropbox/Development/datasette/test-plugins/register_output_renderer.py Line: 6
async def can_render(datasette, columns, rows, sql, query_name, database, table, request, view_name)

# File: /Users/simon/Dropbox/Development/datasette/test-plugins/register_output_renderer.py Line: 26
async def render_test_all_parameters(datasette, columns, rows, sql, query_name, database, table, request, view_name, data)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/register_output_renderer.py Line: 6
async def can_render(datasette, columns, rows, sql, query_name, database, table, request, view_name)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/register_output_renderer.py Line: 26
async def render_test_all_parameters(datasette, columns, rows, sql, query_name, database, table, request, view_name, data)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/register_output_renderer.py Line: 60
async def render_response(request)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/memory_name.py Line: 6
async def startup(datasette)

# File: /Users/simon/Dropbox/Development/datasette/pugins-older/show_open_files.py Line: 6
async def show_open_files()

Needs a bit more thought, then tests and docs.

simonw commented 1 year ago

I didn't implement --method yet, not completely convinced it is necessary.

simonw commented 1 year ago

A --typed filter that just returns things that have type signatures - and a --untyped one that returns things without type signatures - might be neat too.

simonw commented 1 year ago

Maybe this:

If you are working on a project and trying to add types to every single function, you can iterate on it using this to find the functions that still need some work:

symbex --untyped --partially-typed --signatures
simonw commented 1 year ago

Maybe --method isn't necessary because you can use a '*.*' selector instead?

Or maybe --method is a shortcut for that adds the '*.*' selector?

It would be slightly surprising that while most of these filter options add together, --method --async would filter methods to just the async ones.

For that reason I think using *.* to filter methods may be more consistent.

simonw commented 1 year ago

Confirmed, this works already to get all async methods:

symbex -d ../datasette -s --async '*.*'
simonw commented 1 year ago

I built a messy prototype of the typing ones and I like them a lot:

symbex --typed -s
# File: tests/example_symbols.py Line: 50
def func_type_annotations(a: int, b: str) -> bool

# File: tests/example_symbols.py Line: 94
def function_with_non_pep_0484_annotation(x: ?, xx: ?, yy: ?, y: ?, zz: float) -> ?

# File: tests/example_symbols.py Line: 104
def complex_annotations(code: str, symbols: Iterable[str]) -> List[Tuple[(AST, Optional[str])]]

# File: symbex/lib.py Line: 11
def find_symbol_nodes(code: str, filename: str, symbols: Iterable[str]) -> List[Tuple[(AST, Optional[str])]]

# File: symbex/lib.py Line: 35
def code_for_node(code: str, node: AST, class_name: str, signatures: bool) -> Tuple[(str, int)]

# File: symbex/lib.py Line: 66
def match(name: str, symbols: Iterable[str]) -> bool

# File: symbex/lib.py Line: 91
def function_definition(function_node: AST)

# File: symbex/lib.py Line: 193
def annotation_definition(annotation: AST) -> str

# File: symbex/lib.py Line: 241
def annotation_summary(node: AST) -> AnnotationSummary
symbex --untyped -s
# File: tests/example_symbols.py Line: 10
def func_positional_args(a, b, c)

# File: tests/example_symbols.py Line: 15
async def async_func(a, b, c)

# File: tests/example_symbols.py Line: 20
def func_default_args(a, b=2, c=3)

...
simonw commented 1 year ago

Am I right that these should always be OR and not AND?

I just tried this and it didn't give me what I expected:

symbex --async --typed -s

Turns out I expected just typed async functions, but I got back ALL async functions and then ALL typed functions.

simonw commented 1 year ago

I'm going to try these as AND rather than OR and see how that feels.

simonw commented 1 year ago

Yes, AND for filters feels better. Switching to that.

simonw commented 1 year ago

This method is failing the test for --fully-typed:

class ClassForTypedTests:
    def method_fully_typed(self, a: int, b: str) -> bool:
        pass

Because self has no type. So I need a special case check for class methods.

simonw commented 1 year ago

I'm going to refactor AnnotationSummary - it looks like this:

AnnotationSummary = namedtuple(
    "Summary", ["num_arguments", "num_typed", "return_is_typed"]
)

I'm going to change it to this:

class TypeSummary:
    fully: bool
    partially: bool
)
simonw commented 1 year ago

Because self has no type. So I need a special case check for class methods.

That caused another problem: now --typed is returning any method with a self argument because I wrote code that says "treat that as typed" - even though there are no type annotations on anything else.

simonw commented 1 year ago

Here's a fun trick:

 llm --system 'Suggest several ways to refactor this' '    if (
        not symbols
        and not signatures
        and not async_
        and not function
        and not class_
        and not typed
        and not untyped
        and not partially_typed
        and not fully_typed
    ):
        ctx = click.get_current_context()
        click.echo(ctx.get_help())
        ctx.exit()'

Output:

There are several ways to refactor this code:

  1. Consolidate all the boolean checks into a list and use all() to check if all items are False:
checks = [
    not symbols, not signatures, not async_,
    not function, not class_, not typed,
    not untyped, not partially_typed,
    not fully_typed
]

if all(checks):
    ctx = click.get_current_context()
    click.echo(ctx.get_help())
    ctx.exit()
  1. Use De Morgan's Law to simplify the boolean checks:
if (
    symbols or signatures or async_ or function or
    class_ or typed or untyped or partially_typed or
    fully_typed
):
    pass
else:
    ctx = click.get_current_context()
    click.echo(ctx.get_help())
    ctx.exit()
  1. Use any() with the negation of the boolean checks:
if not any([
    symbols, signatures, async_, function,
    class_, typed, untyped, partially_typed,
    fully_typed
]):
    ctx = click.get_current_context()
    click.echo(ctx.get_help())
    ctx.exit()

All three options achieve the same result, so it's up to personal preference on which one to choose.

simonw commented 1 year ago

One last thought: an __init__() method should not need a return type annotation to qualify as not breaing the fully typed rule.