simonw / symbex

Find the Python code for specified symbols
Apache License 2.0
231 stars 6 forks

`--imports` option for outputting `# from xxx.yyy import function_name` #26

Closed simonw closed 1 year ago

simonw commented 1 year ago

This will make it even easier to pipe -s output to llm and get back working code examples.

symbex -d symbex --imports

Should output:

# File: symbex/lib.py Line: 12
# from symbex.lib import find_symbol_nodes
def find_symbol_nodes(code: str, filename: str, symbols: Iterable[str]) -> List[Tuple[(AST, Optional[str])]]

# File: symbex/lib.py Line: 36
# from symbex.lib import code_for_node
def code_for_node(code: str, node: AST, class_name: str, signatures: bool, docstrings: bool) -> Tuple[(str, int)]

Also a --no-file option to suppress output of the # File: lines.

Those can have the shortcuts -i for --imports and -n for --no-file, so if you want to apply both (and just get the imports) you can then do this:

symbex -d symbex -in

simonw commented 1 year ago

The interesting challenge here is how to figure out that import string, since it depends on the location of the function relative to the Python path.

I think the following heuristics should work:

simonw commented 1 year ago

Tried those heuristics and found a problem - when using the -d option you often want to do things like -d ../datasette/tests - but that gives incorrect import statements.

Might be that it needs a --imports-relative ... option too.

simonw commented 1 year ago

That --imports-relative option could be called --sys-path and could add new folders to the simulated sys.path when calculating relative imports.
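As a rough illustration of the simulated sys.path idea, here is a minimal sketch, not symbex's actual implementation. The function name import_line and its roots parameter are hypothetical; it assumes each root plays the role of a sys.path entry and that the first root containing the file wins, loosely mirroring how sys.path is searched in order.

```python
from pathlib import Path
from typing import Iterable, Optional


def import_line(file_path: str, symbol: str, roots: Iterable[str]) -> Optional[str]:
    """Return a '# from module import symbol' comment, or None if no root contains the file.

    Hypothetical helper: 'roots' stands in for a simulated sys.path.
    """
    target = Path(file_path).resolve()
    for root in roots:
        try:
            relative = target.relative_to(Path(root).resolve())
        except ValueError:
            continue  # the file does not live under this root
        if not relative.parts:
            continue  # the path is the root itself, not a module
        # Drop the .py suffix and turn path segments into a dotted module path
        parts = list(relative.with_suffix("").parts)
        if parts[-1] == "__init__":
            parts.pop()  # a package's __init__.py is imported by the package name
        if parts:
            return "# from {} import {}".format(".".join(parts), symbol)
    return None


print(import_line("symbex/lib.py", "find_symbol_nodes", ["."]))
# → # from symbex.lib import find_symbol_nodes
```

With a root of ../datasette, a file such as ../datasette/tests/utils.py (a made-up path) would come out as tests.utils rather than an import path that includes the directory it was searched from. It is purely path-based, so it cannot tell whether the resulting module is really importable.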

Part of the reason I'm having trouble here is that -d ../datasette is also pulling in files from datasette/.eggs. Two things that would help here:

simonw commented 1 year ago

This initial implementation doesn't yet do anything fancy with sys.path but it's enough to start experimenting.

simonw commented 1 year ago

This does not do the right thing with class methods matched by *.*:

symbex '*.*' -in
# from tests.example_symbols import __init__
    def __init__(self, a)

# from tests.example_symbols import method_types
    def method_types(self, b: int) -> bool

For class methods, attempting to display a from x import y line doesn't make sense at all.

What would make sense? I guess showing the name of the class, which the # File line does already:

symbex '*.*' --imports
# File: tests/example_symbols.py Class: ClassWithMethods Line: 79
# from tests.example_symbols import __init__
    def __init__(self, a)

# File: tests/example_symbols.py Class: ClassWithMethods Line: 82
# from tests.example_symbols import method_types
    def method_types(self, b: int) -> bool

simonw commented 1 year ago

So what should -in do here? Maybe it should remove the File: line but add in a new Class: XXX line, perhaps like this:

# Class tests.example_symbols.ClassWithMethods
    def __init__(self, a)

So the class is fully qualified.

Or maybe it should look like this:

# from tests.example_symbols import ClassWithMethods
    def __init__(self, a)

I think that second option is more consistent with # from tests.example_symbols import method_types.
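To make that second option concrete, here is a small sketch using the standard library ast module. It is an assumption about how this could work, not the code symbex uses: import_comments is a hypothetical function, and the module name is passed in rather than calculated. Top-level functions get an import line for the function itself, while methods get an import line for their enclosing class.

```python
import ast
from typing import List, Tuple

EXAMPLE = '''
def func_no_args():
    pass

class ClassWithMethods:
    def __init__(self, a):
        pass

    def method_types(self, b: int) -> bool:
        return True
'''

FUNCTION_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def import_comments(code: str, module: str) -> List[Tuple[str, str]]:
    """Return (symbol, import comment) pairs for functions and methods in code.

    Hypothetical helper: methods are mapped to an import of their class,
    since a method cannot be imported on its own.
    """
    results = []
    for node in ast.parse(code).body:
        if isinstance(node, FUNCTION_TYPES):
            results.append((node.name, f"# from {module} import {node.name}"))
        elif isinstance(node, ast.ClassDef):
            for child in node.body:
                if isinstance(child, FUNCTION_TYPES):
                    qualified = f"{node.name}.{child.name}"
                    results.append((qualified, f"# from {module} import {node.name}"))
    return results


for symbol, comment in import_comments(EXAMPLE, "tests.example_symbols"):
    print(symbol, "->", comment)
```

This only looks at top-level classes and functions, so nested classes and functions defined inside other functions are ignored.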

simonw commented 1 year ago

A thing that worries me about this feature is that it's genuinely making promises it can't keep.

If an LLM is fed an incorrect import path it will generate incorrect example code.

If a user is browsing function documentation using this and reads an incorrect import path they'll get confused too.