Group all submodules into its module

gustavonmartins commented 3 years ago

Background: I have too many submodules, and this makes it difficult to get a quick overview of the diagram.

I would like to be able to fold/group submodules so that I don't get overloaded with information. I need a module and all of its children to be treated as the module itself. This would give me a coarser view instead of an overwhelmingly fine-grained one.

For instance, in the following diagram:

I would like app.routers, app.routers.tama_job, app.routers.msn, app.routers.program and app.routers.upload_data to all become a single app.routers node, with the arrows changed correspondingly: if at least one of the submodules had an arrow to another module MODULE or to one of its submodules, app.routers would also have an arrow to MODULE.

I expected this to happen because I put the following in the .pydeps file:
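The arrow rule described above can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not pydeps code: `collapse_edges`, the edge list, and the `groups` argument are all hypothetical names made up for the example.

```python
def collapse_edges(edges, groups):
    """Merge each listed group and all of its submodules into one node.

    `edges` is an iterable of (importer, imported) module-name pairs.
    `groups` lists the modules whose children should be folded into them.
    Returns a sorted list of the unique edges between the merged nodes.
    """

    def merged(name):
        # A module belongs to a group if it is the group itself or a child of it.
        for group in groups:
            if name == group or name.startswith(group + "."):
                return group
        return name

    result = set()
    for src, dst in edges:
        new_src, new_dst = merged(src), merged(dst)
        if new_src != new_dst:  # drop arrows that now stay inside one group
            result.add((new_src, new_dst))
    return sorted(result)


# Hypothetical edges, loosely modelled on the module names above.
edges = [
    ("app.routers.msn", "app.services.mail"),
    ("app.routers.program", "app.services"),
    ("app.routers.upload_data", "app.storage"),
    ("app.routers", "app.routers.msn"),
    ("app.main", "app.routers.tama_job"),
]
collapsed = collapse_edges(edges, ["app.routers", "app.services"])
```

Using a set means that several submodule arrows to the same target end up as a single arrow, which matches the "at least one of the submodules" wording.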

[pydeps]
max_bacon = 2
no_show = True
show_dot = True
verbose = 0
pylib = False
reverse = True
exclude =
    app.tests
only = 
    app
    app.main
    app.models
    app.repos
    app.routers
    app.schemas
    app.services
    app.storage

I thought that it would not show the submodules of, say, app.routers, because otherwise I would have listed them one by one. It would be nice if the graph showed the same level of granularity as described in the only field, or if it could be controlled somehow.
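To make the wish concrete, here is a small sketch of "granularity follows the only field". It assumes the .pydeps file is INI-like enough for the standard library's `configparser` to read, and it describes the behaviour wished for here, not what the `only` option actually does in pydeps. `read_groups` and `coarsen` are hypothetical helpers, and the config text is a shortened version of the file above.

```python
import configparser

# Shortened, illustrative copy of the config shown above.
PYDEPS_CONFIG = """\
[pydeps]
exclude =
    app.tests
only =
    app
    app.main
    app.routers
    app.services
"""


def read_groups(config_text):
    """Return the names listed under `only` as a list of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Multi-line values come back as one string; split() handles the newlines.
    return parser["pydeps"]["only"].split()


def coarsen(module_name, groups):
    """Map a module to the longest listed name that is it or one of its parents."""
    candidates = [
        g for g in groups
        if module_name == g or module_name.startswith(g + ".")
    ]
    # Modules outside every listed name are left unchanged.
    return max(candidates, key=len) if candidates else module_name


groups = read_groups(PYDEPS_CONFIG)
```

Picking the longest match matters because both `app` and `app.routers` are listed: `app.routers.msn` should fold into `app.routers`, not into `app`.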

Goal:

In general, it should be possible to group all submodules under their parent module, like this:

Before:

After:

Question/Wish

I didn't find a way of doing that. Is it currently possible, and if so, how?
Or would this be a new feature?

Thanks!

thebjorn commented 3 years ago

This would be a new feature. PRs are very welcome ;-)

machow commented 3 years ago

@thebjorn can you say a bit about how this could be implemented?

I'm looking a bit at the structure of pydeps <mod_name> --show-raw-deps, and it seems like what could happen is...

1. "collapse" that data down, so each entry is just a "module"
2. feed that new data into whatever generates a plot

Does that sound right? If you mention some of the relevant functions / modules involved, I'm willing to tinker with things..!

Example output of --show-raw-deps:

```
{
  "__main__": {
    "bacon": 0,
    "imports": [
      "test_mod",
      "test_mod.a",
      "test_mod.b",
      "test_mod.b.c"
    ],
    "name": "__main__",
    "path": null
  },
  "test_mod": {
    "bacon": 1,
    "imported_by": [
      "__main__",
      "test_mod",
      "test_mod.a",
      "test_mod.b"
    ],
    "imports": [
      "test_mod",
      "test_mod.a"
    ],
    "name": "test_mod",
    "path": "/Users/machow/Dropbox/Repo/pydeps/tmp/test_mod/__init__.py"
  },
  ...
}
```
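A rough sketch of the "collapse" step on data of this shape might look like the following. It assumes every entry has the keys shown above (`bacon`, `imports`, `name`, `path`, and optionally `imported_by`); `truncate` and `collapse_raw_deps` are hypothetical helpers rather than pydeps functions, and the `test_mod.a` entry is invented for the example.

```python
def truncate(name, depth):
    """Keep only the first `depth` dotted components of a module name."""
    return ".".join(name.split(".")[:depth])


def collapse_raw_deps(raw, depth):
    """Fold every entry into the entry of its ancestor at the given depth."""
    collapsed = {}
    for name, entry in raw.items():
        new_name = truncate(name, depth)
        target = collapsed.setdefault(new_name, {
            "name": new_name,
            "bacon": entry["bacon"],
            "imports": set(),
            "imported_by": set(),
            "path": None,
        })
        # The merged node is as close to the root as its closest member.
        target["bacon"] = min(target["bacon"], entry["bacon"])
        if name == new_name:
            target["path"] = entry.get("path")
        for key in ("imports", "imported_by"):
            for other in entry.get(key, []):
                other_name = truncate(other, depth)
                if other_name != new_name:  # skip links inside the merged node
                    target[key].add(other_name)
    # Sorted lists keep the result deterministic and JSON-serialisable.
    for target in collapsed.values():
        target["imports"] = sorted(target["imports"])
        target["imported_by"] = sorted(target["imported_by"])
    return collapsed


raw = {
    "__main__": {"bacon": 0, "name": "__main__", "path": None,
                 "imports": ["test_mod", "test_mod.a", "test_mod.b", "test_mod.b.c"]},
    "test_mod": {"bacon": 1, "name": "test_mod", "path": "test_mod/__init__.py",
                 "imported_by": ["__main__", "test_mod", "test_mod.a", "test_mod.b"],
                 "imports": ["test_mod", "test_mod.a"]},
    # Invented entry, only here to show two entries being merged.
    "test_mod.a": {"bacon": 1, "name": "test_mod.a", "path": "test_mod/a.py",
                   "imported_by": ["__main__", "test_mod"], "imports": ["test_mod"]},
}
collapsed = collapse_raw_deps(raw, depth=1)
```

The merged entry keeps the path of the package itself and takes the smallest `bacon` of its members; both choices are guesses about what a plotting step would want.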

Edit: it looks like it's passed in to depgraph_to_dotsrc(). Going to take a peek!

machow commented 3 years ago

Alright--so I got something very rough working, but am sure there is a better way.

This script requires an input file named types.json, which is the result of running something like...

python -m pydeps.py2depgraph some_script_with_imports.py > types.json

There are 5 parts to the script:

1. defining a function to rename modules (e.g. a.b.c -> a.b)
2. converting the graph representation in types.json to a new graph
3. fixing an error where the cli.verbose func doesn't exist
4. creating a DepGraph object from the new graph representation
5. plotting

# python -m pydeps.py2depgraph script.py > types.json

import json
from collections import defaultdict
from itertools import chain

# 1. Function to do renaming of modules ----

def rename(node_name):
    # shortens a name to only include a single .
    # e.g. a.b.c -> a.b
    return ".".join(node_name.split(".")[:2])

# 2. Convert old output to new one ----

# read the file inside a context manager so the handle is closed afterwards
with open("types.json") as f:
    old_depgraph = json.load(f)

old_graph = old_depgraph["depgraph"]
new_graph = defaultdict(dict)

new_depgraph = {
    "types": old_depgraph["types"],
    "depgraph": new_graph
    }

all_old_nodes = chain(old_graph.keys(), *old_graph.values())
old_to_new_names = {k: rename(k) for k in all_old_nodes}
uniq_new_names = set(old_to_new_names.values())

for k, entries in old_graph.items():
    new_entries = new_graph[old_to_new_names[k]]

    for old_node, old_path in entries.items():
        new_entries[old_to_new_names[old_node]] = old_path

# 3. Fix an error where making a DepGraph tries to use cli.verbose, ------
# but it doesn't exist (unless you call via the CLI)

from pydeps import cli

cli.verbose = cli._mkverbose(1)

# 4. Create a DepGraph for the new graph -------

from pydeps.depgraph import DepGraph

# TODO: args that need to be passed
# not sure how to get these, since they seem tied to the CLI
kwargs = {
        "show_cycles": False,
        "max_bacon": 2,
        "show_raw_deps": False,
        "show_deps": False,
        "exclude": [],
        "exclude_exact": [],
        "dummyname": None,
        "noise_level": 200,
        "display": None,
        }

dg = DepGraph(new_graph, new_depgraph["types"], **kwargs)
#DepGraph(old_graph, old_depgraph["types"], **kwargs)

# 5. Plot ----

from pydeps.pydeps import depgraph_to_dotsrc
from pydeps import dot

dotsrc = depgraph_to_dotsrc("deps.dot", dg, **kwargs)
svg = dot.call_graphviz_dot(dotsrc, "svg")

with open("out.svg", "wb") as f:
    f.write(svg)

dot.display_svg(kwargs, "out.svg")

Here's the output from running it on a library called siuba, which has a bunch of submodules. E.g. the imports of siuba.dply.verbs are consolidated into siuba.dply.

thebjorn commented 3 years ago

Interesting. I'll take a deeper look at it later in the week when I get some free time :-)

dzieciou commented 2 years ago

Any progress on that?

bunny-therapist commented 2 years ago

I am also interested in this.

sminozhenko commented 2 years ago

Also interested.

thebjorn commented 2 years ago

Sorry that this took a little while. Could you test the (undocumented) --max-module-depth flag in v1.10.19 available on PyPI?

pydeps --max-module-depth=2 packagename

It should work with the --cluster flag, but will possibly/probably mess with the --max-bacon flag and the --min/max-cluster-size flags.
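To pick a depth before rendering, it can help to preview which modules would end up in the same node. The sketch below assumes the flag truncates dotted names to the first N components, in the same way as the rename function earlier in this thread; that is an assumption, since the flag is undocumented, and `preview_grouping` is a hypothetical helper that does not call pydeps.

```python
from collections import defaultdict


def preview_grouping(module_names, max_depth):
    """Group module names by their first `max_depth` dotted components."""
    groups = defaultdict(list)
    for name in module_names:
        merged_name = ".".join(name.split(".")[:max_depth])
        groups[merged_name].append(name)
    return dict(groups)


# Hypothetical module names for illustration.
modules = [
    "app.main",
    "app.routers",
    "app.routers.msn",
    "app.routers.program",
    "app.services.mail",
]
preview = preview_grouping(modules, 2)
```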

gustavonmartins commented 2 years ago

Hi, thanks for your effort! I never imagined someone would implement it :).

I will test it when I can.

I saw your source code and had an inspiration which might help with this in the future:

I believe this capability could be implemented by post-processing the generated .dot file, and it would not be limited to Python. So, maybe the original dot file could be parsed (with pydot or pygraphviz) and the name merging could be done there to generate a second file, thus sparing you from having to couple this code with your original code.
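As a very rough sketch of that post-processing idea, using only the standard library: it assumes the dot text has one edge per line and uses the dotted module names as node ids, which may well not match what pydeps emits. A real version would use a proper parser such as pydot or pygraphviz; `merge_dot_edges` is a hypothetical helper, and it throws away all node and edge attributes.

```python
import re

# Matches simple edge lines such as:  "a.b" -> "c.d" [color="blue"];
EDGE = re.compile(r'^\s*"?([\w.]+)"?\s*->\s*"?([\w.]+)"?')


def merge_dot_edges(dot_text, depth):
    """Rewrite the edges of a dot graph with names truncated to `depth` parts."""

    def truncate(name):
        return ".".join(name.split(".")[:depth])

    edges = set()
    for line in dot_text.splitlines():
        match = EDGE.match(line)
        if match:
            src, dst = truncate(match.group(1)), truncate(match.group(2))
            if src != dst:  # drop edges that collapse into a self-loop
                edges.add((src, dst))

    lines = ["digraph G {"]
    lines += [f'    "{src}" -> "{dst}";' for src, dst in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical input in the simplified format described above.
original = '''digraph G {
    "app.main" -> "app.routers.msn";
    "app.routers.msn" -> "app.services.mail" [color="blue"];
    "app.routers.program" -> "app.services.db";
    "app.routers.msn" -> "app.routers.program";
}'''
merged = merge_dot_edges(original, depth=2)
```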

In this case, your code would read the command-line options, such as the one you named max module depth, and the parser would call the post-processor.

I hope this can make your life easier :)

thebjorn commented 2 years ago

@namoscagnm it's a reasonable idea, but considerable effort is made to get the analysis into a DepGraph instance and for it to be easy to work with, so it's better to keep all graph manipulations there. I do realize this is not entirely consistent, since the cluster code is implemented in the RenderBuffer class :-D (it needs support for subgraph dot elements...)

sminozhenko commented 2 years ago

Hi @thebjorn thank you for the change. At least for me, it works like a charm :fire:

bunny-therapist commented 2 years ago

For me as well. Love it.

thebjorn / pydeps

Group all submodules into its module #81