simonw / symbex

Find the Python code for specified symbols
Apache License 2.0

`--module pathlib` option for searching within an importable package #30

Closed simonw closed 1 year ago

simonw commented 1 year ago

Inspired by:

simonw commented 1 year ago

Got a prototype working and it's really cool:

symbex FastChildWatcher -s --docs -m asyncio -n
class FastChildWatcher(BaseChildWatcher)
    """'Fast' child watcher implementation.

    This implementation reaps every terminated processes by calling
    os.waitpid(-1) directly, possibly breaking other code spawning processes
    and waiting for their termination.

    There is no noticeable overhead when handling a big number of children
    (O(1) each time a child terminates)."""

symbex -m httpx -s -in
# from httpx._decoders import ContentDecoder
class ContentDecoder

# from httpx._decoders import IdentityDecoder
class IdentityDecoder(ContentDecoder)

# from httpx._decoders import DeflateDecoder
class DeflateDecoder(ContentDecoder)

# from httpx._decoders import GZipDecoder
class GZipDecoder(ContentDecoder)

...

simonw commented 1 year ago

I made some design decisions:

    if modules:
        # Requires: import importlib, inspect, pathlib, site (plus click)
        module_dirs = []
        module_files = []
        for module in modules:
            try:
                mod = importlib.import_module(module)
                mod_path = pathlib.Path(inspect.getfile(mod))
                if mod_path.stem == "__init__":
                    # A package: search its whole directory
                    module_dirs.append(mod_path.parent)
                else:
                    # A single-file module: search just that file
                    module_files.append(mod_path)
            except ModuleNotFoundError:
                raise click.ClickException("Module not found: {}".format(module))
        directories = [*directories, *module_dirs]
        files = [*files, *module_files]
        if module_dirs or module_files:
            # Default to matching every symbol when none were given
            if not symbols:
                symbols = ["*"]
            # Base directories that import paths are calculated against
            site_packages_dirs = site.getsitepackages()
            stdlib_dir = pathlib.Path(pathlib.__file__).parent
            sys_paths = [*site_packages_dirs, str(stdlib_dir), *sys_paths]

A module might be a single file like cgi.py or a package like httpx/__init__.py. These are treated differently: files are turned into --file and packages are turned into --directory.
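The file-versus-package distinction can be sketched as a standalone helper. This is a minimal illustration rather than symbex's actual code: `resolve_module` and its return labels are hypothetical names, and `textwrap` and `json` are just convenient standard library examples of a single-file module and a package.

```python
import importlib
import inspect
import pathlib


def resolve_module(name):
    """Return ("directory", path) for a package or ("file", path) for a
    single-file module. Hypothetical helper, for illustration only."""
    # Raises ModuleNotFoundError if the module cannot be imported
    mod = importlib.import_module(name)
    # Raises TypeError for built-in modules such as sys, which have no file
    mod_path = pathlib.Path(inspect.getfile(mod))
    if mod_path.stem == "__init__":
        # Packages are located by their __init__.py, so use the parent folder
        return "directory", mod_path.parent
    return "file", mod_path


if __name__ == "__main__":
    for name in ("textwrap", "json"):
        kind, path = resolve_module(name)
        print(name, kind, path)
```

Note that this sketch leaves the `TypeError` case unhandled, so a built-in module would need its own error message.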

In either case, if no symbols are specified we default to * as the search symbol, so running e.g. symbex -m asyncio returns every class and function in that module.
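To illustrate why `*` returns everything, here is a rough sketch of wildcard symbol matching built on `ast` and `fnmatch`. It assumes shell-style patterns are matched against top-level class and function names only. `find_symbols` and `EXAMPLE_SOURCE` are hypothetical, and this is not symbex's real matching code.

```python
import ast
from fnmatch import fnmatch


def find_symbols(source, patterns):
    """Return names of top-level classes and functions matching any pattern."""
    tree = ast.parse(source)
    matches = []
    for node in tree.body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            if any(fnmatch(node.name, pattern) for pattern in patterns):
                matches.append(node.name)
    return matches


# Made-up module source used only for this example
EXAMPLE_SOURCE = '''
class ContentDecoder:
    pass

def decode(data):
    return data

async def fetch(url):
    return url
'''

if __name__ == "__main__":
    symbols = []  # nothing specified on the command line
    if not symbols:
        symbols = ["*"]  # same default as in the snippet above
    print(find_symbols(EXAMPLE_SOURCE, symbols))
```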

To help the --imports feature show the correct import paths, we add both the site-packages directories (for httpx etc.) and the standard library directory (found as the parent of pathlib.__file__) to the sys_paths mechanism. Those are the folders that import paths are calculated relative to.
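How a list of base directories might translate a file path into an import line can be sketched like this. It is an assumption-laden illustration: `import_line` is a hypothetical helper, the example paths are made up, and the real symbex logic may differ.

```python
import pathlib


def import_line(file_path, symbol, sys_paths):
    """Build a "from module import symbol" line, or return None if the file
    is not inside any base directory. The first matching base wins."""
    file_path = pathlib.Path(file_path)
    for base in sys_paths:
        try:
            relative = file_path.relative_to(base)
        except ValueError:
            continue  # file_path is not inside this base directory
        if not relative.parts:
            continue  # file_path is the base directory itself
        parts = list(relative.with_suffix("").parts)
        if parts[-1] == "__init__":
            parts.pop()  # httpx/__init__.py is imported as "httpx"
        if not parts:
            continue
        return "from {} import {}".format(".".join(parts), symbol)
    return None


if __name__ == "__main__":
    # Made-up paths; real values would come from site.getsitepackages() etc.
    bases = [
        "/usr/lib/python3.11/site-packages",
        "/usr/lib/python3.11",
    ]
    print(import_line(
        "/usr/lib/python3.11/site-packages/httpx/_decoders.py",
        "ContentDecoder",
        bases,
    ))
```

In this sketch the order of the bases matters. When site-packages sits inside the standard library directory, listing it first keeps `site-packages` out of the dotted module name. One caveat for the approach in the snippet: on Python versions where pathlib is a package rather than a single file, the parent of pathlib.__file__ would be the pathlib directory itself. `sysconfig.get_paths()["stdlib"]` is an alternative way to locate the standard library that the original code does not use.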