simonw / symbex

Find the Python code for specified symbols
Apache License 2.0

Option to just see signatures #4

Closed simonw closed 1 year ago

simonw commented 1 year ago

A way to search and get just the first-line definition of each function or class.

Not sure what to do about multiline function definitions.

simonw commented 1 year ago

Here's what I need for this:

(Pdb) node
<ast.FunctionDef object at 0x1038f2110>
(Pdb) node.args
<ast.arguments object at 0x1038f2b60>
(Pdb) node.args.args
[<ast.arg object at 0x1038f2bc0>, <ast.arg object at 0x1038f2bf0>, <ast.arg object at 0x1038f2c20>, <ast.arg object at 0x1038f2f80>, <ast.arg object at 0x1038f2f50>]
(Pdb) node.args.args[0].__dict__
{'arg': 'symbols', 'annotation': None, 'type_comment': None, 'lineno': 33, 'col_offset': 4, 'end_lineno': 33, 'end_col_offset': 11}
(Pdb) node.args.args[-1].__dict__
{'arg': 'silent', 'annotation': None, 'type_comment': None, 'lineno': 37, 'col_offset': 4, 'end_lineno': 37, 'end_col_offset': 10}

simonw commented 1 year ago

For classes I think I need to do the same thing with node.bases:

(Pdb) node.bases
[<ast.Name object at 0x105717820>]
(Pdb) node.bases[0].__dict__
{'id': 'object', 'ctx': <ast.Load object at 0x1055a7f40>, 'lineno': 86, 'col_offset': 4, 'end_lineno': 86, 'end_col_offset': 10}
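
As a rough illustration of the node.bases idea, here is a minimal sketch that rebuilds a class header from a ClassDef node. It assumes Python 3.9+ (for ast.unparse), and class_signature is a hypothetical helper name, not necessarily what symbex ended up with. Keyword arguments such as metaclass=... are stored in node.keywords rather than node.bases, so both are read:

```python
import ast


def class_signature(node: ast.ClassDef) -> str:
    """Rebuild 'class Name(bases, keyword=value)' from a ClassDef node."""
    # Positional bases are expression nodes; unparse turns them back into source.
    parts = [ast.unparse(base) for base in node.bases]
    for kw in node.keywords:
        if kw.arg is None:
            # A bare **mapping in the class header has no keyword name.
            parts.append(f"**{ast.unparse(kw.value)}")
        else:
            parts.append(f"{kw.arg}={ast.unparse(kw.value)}")
    if parts:
        return f"class {node.name}({', '.join(parts)})"
    return f"class {node.name}"


source = "class ClassWithMeta(int, metaclass=type):\n    pass\n"
node = ast.parse(source).body[0]
print(class_signature(node))  # class ClassWithMeta(int, metaclass=type)
```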
simonw commented 1 year ago

Almost got this working.

I'm not sure how to differentiate between:

def cli(
    symbols,
    files,
    directories,
    signatures,
    silent
):

And:

def cli(
    symbols,
    files,
    directories,
    signatures,
    silent):

I can't see a mechanism for detecting if the closing ) in the code is on a new line or not, which means I don't know how much code to output.

Might be better to reconstruct the function definition from the node.args?
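
Reconstructing the header from the AST avoids the closing-parenthesis question, because both layouts above parse to the same tree. A minimal sketch, assuming Python 3.9+ where ast.unparse() accepts an ast.arguments node; function_signature is a hypothetical name used only for illustration. The trade-off is that the original line breaks and any comments inside the signature are lost:

```python
import ast


def function_signature(node) -> str:
    """Rebuild a one-line 'def name(args) -> returns' string from the AST."""
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    # ast.unparse renders the whole arguments node, including defaults,
    # *args, keyword-only arguments and **kwargs.
    args = ast.unparse(node.args)
    returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
    return f"{prefix} {node.name}({args}){returns}"


paren_on_own_line = (
    "def cli(\n    symbols,\n    files,\n    directories,\n"
    "    signatures,\n    silent\n):\n    pass\n"
)
paren_inline = (
    "def cli(\n    symbols,\n    files,\n    directories,\n"
    "    signatures,\n    silent):\n    pass\n"
)

for source in (paren_on_own_line, paren_inline):
    print(function_signature(ast.parse(source).body[0]))
# Both print: def cli(symbols, files, directories, signatures, silent)
```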

simonw commented 1 year ago

If the function has a -> None return signature I can consider that too:

(Pdb) node.returns.__dict__
{'value': None, 'kind': None, 'lineno': 38, 'col_offset': 5, 'end_lineno': 38, 'end_col_offset': 9}

simonw commented 1 year ago

The hacky way to do this would be to look for the line that ends with ): (maybe taking whitespace into account too).

Don't want to get confused by this though:

def foo(): return 1

simonw commented 1 year ago

Plus, def foo(): return 1 in --signatures mode should ideally just return def foo(): without the inline function body.
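
An alternative to searching for a line that ends with ): might be to tokenize the source and stop at the first colon that sits outside all brackets. That colon ends the header whether or not a body follows on the same line. This is only a sketch: header_source is a hypothetical helper, and it assumes the text passed in starts at the def or class keyword (for example, sliced from node.lineno and node.col_offset):

```python
import io
import tokenize


def header_source(source: str) -> str:
    """Return the def/class header text, up to but excluding its final colon."""
    lines = source.splitlines(keepends=True)
    depth = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.OP:
            continue
        if tok.string in ("(", "[", "{"):
            depth += 1
        elif tok.string in (")", "]", "}"):
            depth -= 1
        elif tok.string == ":" and depth == 0:
            # Colons inside brackets (annotations, dict defaults) were skipped,
            # so this one terminates the header.
            row, col = tok.start  # row is 1-based, col is 0-based
            return "".join(lines[: row - 1]) + lines[row - 1][:col]
    raise ValueError("no header colon found")


print(header_source("def foo(): return 1\n"))  # def foo()
```

Unlike rebuilding from the AST, this keeps the author's original line breaks, so a multiline signature comes back exactly as written.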

simonw commented 1 year ago

I had this, about half of which I wrote myself (based on an example GPT-4 produced earlier):

from ast import AST


def function_definition(function_node: AST):
    function_name = function_node.name

    arguments = []
    for arg in function_node.args.args:
        if getattr(arg.annotation, "id", None):
            arguments.append(f"{arg.arg}: {arg.annotation.id}")
        else:
            arguments.append(arg.arg)
    arguments_str = ", ".join(arguments)

    return_annotation = ""
    if function_node.returns:
        if hasattr(function_node.returns, "id"):
            return_annotation = f" -> {function_node.returns.id}"
        elif function_node.returns.value is None:
            # None shows as returns.value is None
            return_annotation = " -> None"

    return f"def {function_name}({arguments_str}){return_annotation}"

And I ran this:

symbex function_definition | llm --system 'rewrite this to handle default arguments, keyword-only arguments, varargs and other edge-cases' -m 4

I got back:

def function_definition(function_node: AST):
    function_name = function_node.name

    arguments, default_arg_index = [], 0
    for arg in function_node.args.args:
        arg_str = f"{arg.arg}: {arg.annotation.id}" if getattr(arg.annotation, "id", None) else arg.arg
        if default_arg_index < len(function_node.args.defaults):
            arg_str += f"={repr(function_node.args.defaults[default_arg_index])}"
            default_arg_index += 1
        arguments.append(arg_str)

    varargs = [f"*{arg.arg}: {arg.annotation.id}" for arg in function_node.args.vararg] if function_node.args.vararg else []
    kwargs = [
        f"{arg.arg}={repr(arg.default)}: {arg.annotation.id}"
        if getattr(arg.annotation, "id", None)
        else f"{arg.arg}={repr(arg.default)}"
        for arg in function_node.args.kwonlyargs
    ]

    combined_arguments = arguments + varargs + kwargs
    arguments_str = ", ".join(combined_arguments)

    return_annotation = ""
    if function_node.returns:
        if hasattr(function_node.returns, "id"):
            return_annotation = f" -> {function_node.returns.id}"
        elif function_node.returns.value is None:
            return_annotation = " -> None"

    return f"def {function_name}({arguments_str}){return_annotation}"

This looked impressive at first glance, but it's not actually good enough code - nested list comprehensions are hard to understand and it didn't survive robust testing.

simonw commented 1 year ago

Generating some test code with ChatGPT:

Generate example Python code that illustrates all different shapes of functions and classes, showing off every kind of feature that I might want to test using these functions

# Function with no arguments
def func_no_args():
    pass

# Function with positional arguments
def func_positional_args(a, b, c):
    pass

# Function with default arguments
def func_default_args(a, b=2, c=3):
    pass

# Function with arbitrary number of positional arguments
def func_arbitrary_positional_args(*args):
    pass

# Function with arbitrary number of keyword arguments
def func_arbitrary_keyword_args(**kwargs):
    pass

# Function with both arbitrary positional and keyword arguments
def func_arbitrary_args(*args, **kwargs):
    pass

# Function with positional-only arguments (Python 3.8 and above)
def func_positional_only_args(a, /, b, c):
    pass

# Function with keyword-only arguments
def func_keyword_only_args(*, a, b, c):
    pass

# Function with type annotations (Python 3.5 and above)
def func_type_annotations(a: int, b: str) -> bool:
    pass

# Class with no base classes
class ClassNoBase:
    pass

# Class with a single base class
class ClassSingleBase(int):
    pass

# Class with multiple base classes
class ClassMultipleBase(int, str):
    pass

# Class with a metaclass
class ClassWithMeta(metaclass=type):
    pass

simonw commented 1 year ago

This is absurdly useful.

% symbex -s -f tests/example_code.py
# File: tests/example_code.py Line: 2
def func_no_args()

# File: tests/example_code.py Line: 6
def func_positional_args(a, b, c)

# File: tests/example_code.py Line: 10
def func_default_args(a=<ast.Constant object at 0x105a40310>, b=<ast.Constant object at 0x105a40340>, c)

Traceback (most recent call last):
  File "/Users/simon/.local/share/virtualenvs/symbex--e1aIHUb/bin/symbex", line 33, in <module>
    sys.exit(load_entry_point('symbex', 'console_scripts', 'symbex')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/.local/share/virtualenvs/symbex--e1aIHUb/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/.local/share/virtualenvs/symbex--e1aIHUb/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/simon/.local/share/virtualenvs/symbex--e1aIHUb/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/.local/share/virtualenvs/symbex--e1aIHUb/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/Dropbox/Development/symbex/symbex/cli.py", line 75, in cli
    snippet, line_no = code_for_node(code, node, signatures)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/Dropbox/Development/symbex/symbex/lib.py", line 24, in code_for_node
    return function_definition(node), node.lineno
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/simon/Dropbox/Development/symbex/symbex/lib.py", line 67, in function_definition
    varargs = [f"*{arg.arg}: {arg.annotation.id}" for arg in function_node.args.vararg] if function_node.args.vararg else []
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'arg' object is not iterable
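
The TypeError arises because args.vararg (like args.kwarg) is a single ast.arg node or None, not a list, so it cannot be iterated. A minimal sketch of handling both without a comprehension, assuming Python 3.9+ for ast.unparse; render_star_args is a hypothetical helper name:

```python
import ast


def render_star_args(arguments: ast.arguments) -> list[str]:
    """Render *args and **kwargs, each of which is one ast.arg or None."""
    parts = []
    for prefix, arg in (("*", arguments.vararg), ("**", arguments.kwarg)):
        if arg is None:
            continue  # the function does not declare this kind of argument
        text = prefix + arg.arg
        if arg.annotation is not None:
            # unparse copes with annotations that are not plain names
            text += f": {ast.unparse(arg.annotation)}"
        parts.append(text)
    return parts


node = ast.parse("def func(a, *args: int, **kwargs): pass").body[0]
print(render_star_args(node.args))  # ['*args: int', '**kwargs']
```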
simonw commented 1 year ago

I generated fixtures like this:

cat tests/example_symbols.py | llm --system 'Use this code to produce output like this:
pipe quote> 
pipe quote> ("func_no_args", "def func_no_args()"),
pipe quote> 
pipe quote> One line like that for every class and function in this file'

Output:

("func_no_args", "def func_no_args()"),
("func_positional_args", "def func_positional_args(a, b, c)"),
("func_default_args", "def func_default_args(a, b=2, c=3)"),
("func_arbitrary_positional_args", "def func_arbitrary_positional_args(*args)"),
("func_arbitrary_keyword_args", "def func_arbitrary_keyword_args(**kwargs)"),
("func_arbitrary_args", "def func_arbitrary_args(*args, **kwargs)"),
("func_positional_only_args", "def func_positional_only_args(a, /, b, c)"),
("func_keyword_only_args", "def func_keyword_only_args(*, a, b, c)"),
("func_type_annotations", "def func_type_annotations(a: int, b: str) -> bool"),
("ClassNoBase", "class ClassNoBase:"),
("ClassSingleBase", "class ClassSingleBase(int):"),
("ClassMultipleBase", "class ClassMultipleBase(int, str):"),
("ClassWithMeta", "class ClassWithMeta(metaclass=type):")

simonw commented 1 year ago

I'm a bit stuck on this one:

# Function with default arguments
def func_default_args(a, b=2, c=3):
    pass

In the debugger:

(Pdb) function_node.args.defaults[0].__dict__
{'value': 2, 'kind': None, 'lineno': 12, 'col_offset': 27, 'end_lineno': 12, 'end_col_offset': 28}
(Pdb) function_node.args.defaults[1].__dict__
{'value': 3, 'kind': None, 'lineno': 12, 'col_offset': 32, 'end_lineno': 12, 'end_col_offset': 33}
(Pdb) function_node.args
<ast.arguments object at 0x101274250>
(Pdb) function_node.args.__dict__
{'posonlyargs': [], 'args': [<ast.arg object at 0x101274280>, <ast.arg object at 0x1012742b0>, <ast.arg object at 0x1012742e0>], 'vararg': None, 'kwonlyargs': [], 'kw_defaults': [], 'kwarg': None, 'defaults': [<ast.Constant object at 0x101274310>, <ast.Constant object at 0x101274340>]}
(Pdb) function_node.name
'func_default_args'

I'm not sure how to match up the .defaults to the arguments though - there are three arguments and two defaults, how do I know which arguments the defaults are for?

I guess it's based purely on indexing from the end - if there are two defaults and three args then the last two args must be the ones with defaults.
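
That guess matches how the ast module lays things out: defaults holds values for the last len(defaults) positional arguments, so the two lists can be aligned from the end. A minimal sketch of that alignment, assuming Python 3.9+ for ast.unparse; positional_args_with_defaults is a hypothetical name, and the sketch deliberately ignores the / marker and keyword-only arguments (whose kw_defaults list is instead aligned one-to-one with kwonlyargs, using None where there is no default):

```python
import ast


def positional_args_with_defaults(node: ast.FunctionDef) -> list[str]:
    """Pair each positional argument with its default, if it has one."""
    args = node.args.posonlyargs + node.args.args
    defaults = node.args.defaults
    # Arguments before this index have no default.
    offset = len(args) - len(defaults)
    rendered = []
    for index, arg in enumerate(args):
        text = arg.arg
        if index >= offset:
            text += f"={ast.unparse(defaults[index - offset])}"
        rendered.append(text)
    return rendered


node = ast.parse("def func_default_args(a, b=2, c=3):\n    pass\n").body[0]
print(positional_args_with_defaults(node))  # ['a', 'b=2', 'c=3']
```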

simonw commented 1 year ago

I tried using this to see if there were any obvious gaps in the function, but I couldn't figure out a prompt that didn't just show me invalid Python function examples instead:

symbex function_definition | llm --system 'Examples of function definitions that would not work if fed to this function that are not Python syntax errors' -m 4

1. Missing colon for function definition:

def example_function(a, b)
    pass

2. Mismatched parentheses in arguments:

def example_function(a, b, c:
    pass

3. Duplicate arguments:

def example_function(a, a):
    pass