thebjorn / pydeps

Python Module Dependency graphs
https://pydeps.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License

Option to ignore modules in the graph #227

Open robinderat opened 3 months ago

robinderat commented 3 months ago

I have a pretty complex project with many modules and I would like to use pydeps to understand the structure of the code better. However, currently, pydeps is including all modules in the graph, even if they are not imported directly, making it very chaotic and hard to understand.

I would like to have an option to filter out modules that are not directly referenced.

I created a small example project to describe my case:

example-project
└── example_project
    ├── __init__.py
    ├── main.py
    └── module
        ├── __init__.py
        └── submodule.py

main.py:

from example_project.module.submodule import bar

bar()

pydeps example_project --reverse --rankdir BT generates the following graph:

[image: example_project]

What I would like to see is only 2 nodes and 1 edge, from main to submodule. The edge between main and module only creates confusion, because I would only expect to see it if I had a line like from example_project import module

thebjorn commented 3 months ago

Hi @robinderat, and thank you for your interest in pydeps.

The behavior you're seeing is (at least partially) an artifact of how the Python module system works. The statement from a.b import c imports and executes the packages a and a.b (that is, their __init__.py files) before c is looked up. To see what is going on, I've added some print statements to the different files in your example:

srv/tmp/../example-project❱ yamldirs example_project
example_project:
  __init__.py: print('example_project.__init__.py')
  main.py: |
    print('main.py')
    from example_project.module.submodule import bar
    bar()
  module:
    __init__.py: print('module.__init__.py')
    submodule.py: |
      print('submodule.py')

      def bar():
          print('bar()')
          return None

running main gives the following:

srv/tmp/../example-project❱ python -m example_project.main
example_project.__init__.py
main.py
module.__init__.py
submodule.py
bar()
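The same effect can be checked without print statements by looking at sys.modules after the import. This is only a minimal sketch: the package name demo_pkg and the temporary directory are made up for illustration and are not part of the example project above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package tree: demo_pkg/module/submodule.py
# (hypothetical names, mirroring the layout of the example project).
root = Path(tempfile.mkdtemp())
module_dir = root / "demo_pkg" / "module"
module_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(module_dir / "__init__.py").write_text("")
(module_dir / "submodule.py").write_text("def bar():\n    return 'bar()'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

before = set(sys.modules)
from demo_pkg.module.submodule import bar  # only the leaf module is named

# Every package on the dotted path was imported as a side effect.
newly_imported = sorted(
    name for name in set(sys.modules) - before
    if name == "demo_pkg" or name.startswith("demo_pkg.")
)
print(newly_imported)
```

All three entries (the top-level package, the intermediate package, and the leaf module) end up in sys.modules, even though the import statement names only the leaf.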

it's even more apparent if you run pydeps directly on the main.py file:

srv/tmp/../example-project❱ pydeps example_project\main.py -T png  

which gives:

[image]

Pydeps must traverse every module along the dotted import path to check for further import statements, so the graph is technically correct, but I can see how the resulting graph is not maximally useful.
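One way to think about the pruning being asked for, as a rough sketch and not pydeps' actual data model: treat the graph as a set of (importer, imported) edges, and drop an edge when its target is only an ancestor package of a name that the importer wrote literally. The functions ancestors and prune_implicit_edges and both input structures are hypothetical.

```python
def ancestors(dotted_name):
    """Return every package prefix of a dotted name, excluding the name itself."""
    parts = dotted_name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def prune_implicit_edges(edges, named_imports):
    """Drop edges that exist only because a parent package was imported implicitly.

    edges: set of (importer, imported) pairs found by a full traversal.
    named_imports: dict mapping importer -> set of module names that appear
                   literally in its import statements.
    """
    kept = set()
    for src, dst in edges:
        named = named_imports.get(src, set())
        # Ancestors of named imports that were not themselves named.
        implicit = {a for name in named for a in ancestors(name)} - named
        if dst not in implicit:
            kept.add((src, dst))
    return kept


edges = {
    ("example_project.main", "example_project"),
    ("example_project.main", "example_project.module"),
    ("example_project.main", "example_project.module.submodule"),
}
named = {"example_project.main": {"example_project.module.submodule"}}
pruned = prune_implicit_edges(edges, named)
print(pruned)
```

For the example project this leaves a single edge from main to submodule. An explicit from example_project import module would put example_project.module into the named set, so that edge would be kept.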

I'm not sure I know how to change pydeps to prune the graph correctly, but there should be enough information in the bytecode:

>>> import dis
>>> def fn():
...     from example_project.module.submodule import bar
...     bar()
...
>>> dis.dis(fn)
  1           0 RESUME                   0

  2           2 LOAD_CONST               1 (0)
              4 LOAD_CONST               2 (('bar',))
              6 IMPORT_NAME              0 (example_project.module.submodule)
              8 IMPORT_FROM              1 (bar)
             10 STORE_FAST               0 (bar)
             12 POP_TOP

I'm always happy to merge a PR that adds this functionality...
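As a possible starting point, here is a minimal sketch of collecting the literally named import targets from bytecode using the standard dis module. It is not how pydeps works internally; the helper named_imports is hypothetical, and it ignores the relative-import level, so relative imports would need extra handling.

```python
import dis
import types


def named_imports(code):
    """Return the module names that appear as IMPORT_NAME arguments in a code object.

    Nested code objects (function and class bodies) are scanned as well.
    """
    names = [
        ins.argval
        for ins in dis.get_instructions(code)
        if ins.opname == "IMPORT_NAME"
    ]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names.extend(named_imports(const))
    return names


source = (
    "from example_project.module.submodule import bar\n"
    "def f():\n"
    "    import os.path\n"
)
# compile() only produces bytecode; nothing is imported or executed here.
found = named_imports(compile(source, "<example>", "exec"))
print(found)
```

The result contains only the full dotted names as written in the source, without the parent packages, which is the information needed to tell a direct import apart from an implicit one.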