thebjorn / pydeps

Python Module Dependency graphs
https://pydeps.readthedocs.io/en/latest/

Support for editable packages / direct_url.json #222

Closed: micdet-delen closed this issue 6 months ago

micdet-delen commented 6 months ago

Hello,

Does pydeps currently support editable packages (using the hint in direct_url.json in the .dist-info directory)?

See also this related mypy issue that explains it much better: https://github.com/python/mypy/issues/12313

I'm currently chasing an issue with packages missing from the dependency graph and I think this is the issue. Would be nice if anyone could confirm it :)

UPDATE: Might there be a way to work around this issue by modifying sys.path for pydeps so I can point to the files/packages directly (thus skipping direct_url.json)? I've tried providing PYTHONPATH to pydeps, but that doesn't seem to work :/
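For reference, the direct_url.json hint mentioned above can be inspected from Python. This is only a minimal sketch, assuming the PEP 610 layout (a "url" key plus "dir_info": {"editable": true} for editable installs); the helper names are hypothetical and none of this is part of pydeps.

```python
import json
from importlib import metadata


def is_editable_direct_url(direct_url_text):
    """Return True if a direct_url.json payload marks an editable install."""
    if not direct_url_text:
        return False
    data = json.loads(direct_url_text)
    # Assumption: PEP 610 layout, where editable installs carry dir_info.editable.
    return bool(data.get("dir_info", {}).get("editable", False))


def editable_source_url(dist_name):
    """Return the source URL of an editable install, or None (hypothetical helper)."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("direct_url.json")  # None when the file is absent
    if not is_editable_direct_url(text):
        return None
    return json.loads(text)["url"]
```

Calling editable_source_url("custompackage") in the affected environment would show whether the installed distribution is recorded as editable and where its sources live.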

thebjorn commented 6 months ago

Editable packages (installed with pip install -e ...) should be findable by the Python import machinery and also by the stdlib modulefinder module, no..? I'm afraid the linked mypy issue didn't enlighten me as to what the real-world problem would be (it is also closed as won't-fix, since direct_url.json will not help in at least one common use case).

micdet-delen commented 6 months ago

Yes, they are indeed installed with pip install -e.

Hmm, there is something else going on. I now see in the --debug-mf 2 output that I'm not only getting ImportErrors on our own packages/modules, but also on standard library modules importing other standard library modules...

E.g. (there are many more, though definitely not on all of them):

import_hook: name(doctest) caller(Module(name=heapq, file='/usr/local/lib/python3.11/heapq.py', path=None)) fromlist(None) level(0) 
    import_module 'doctest' 'doctest' None 
        load_module(PY_SOURCE) fqname=doctest, fp=fp, pathname=/usr/local/lib/python3.11/doctest.py 
        import_hook: name(__future__) caller(Module(name=doctest, file='/usr/local/lib/python3.11/doctest.py', path=None)) fromlist(None) level(0) 
            import_module '__future__' '__future__' None 
                load_module(PY_SOURCE) fqname=__future__, fp=fp, pathname=/usr/local/lib/python3.11/__future__.py 
            load_module -> Module(name=__future__, file='/usr/local/lib/python3.11/__future__.py', path=None) 
        import_module -> Module(name=__future__, file='/usr/local/lib/python3.11/__future__.py', path=None) 
        import_hook: name(difflib) caller(Module(name=doctest, file='/usr/local/lib/python3.11/doctest.py', path=None)) fromlist(None) level(0) 
            import_module 'difflib' 'difflib' None 
                load_module(PY_SOURCE) fqname=difflib, fp=fp, pathname=/usr/local/lib/python3.11/difflib.py 
                import_hook: name(heapq) caller(Module(name=difflib, file='/usr/local/lib/python3.11/difflib.py', path=None)) fromlist(None) level(0) 
                    import_module 'heapq' 'heapq' None 
                import_module -> Module(name=heapq, file='/usr/local/lib/python3.11/heapq.py', path=None) 
                import_hook: name(heapq) caller(Module(name=difflib, file='/usr/local/lib/python3.11/difflib.py', path=None)) fromlist(['nlargest']) level(0) 
                    import_module 'heapq' 'heapq' None 
                import_module -> Module(name=heapq, file='/usr/local/lib/python3.11/heapq.py', path=None) 
                import_hook: name(collections) caller(Module(name=difflib, file='/usr/local/lib/python3.11/difflib.py', path=None)) fromlist(None) level(0) 
                    import_module 'collections' 'collections' None 
                import_module -> Module(name=collections, file='/usr/local/lib/python3.11/collections/__init__.py', path=['/usr/local/lib/python3.11/collections']) 
                import_hook: name(collections) caller(Module(name=difflib, file='/usr/local/lib/python3.11/difflib.py', path=None)) fromlist(['namedtuple']) level(0) 
                    import_module 'collections' 'collections' None 
                import_module -> Module(name=collections, file='/usr/local/lib/python3.11/collections/__init__.py', path=['/usr/local/lib/python3.11/collections']) 
                    import_module 'namedtuple' 'collections.namedtuple' Module(name=collections, file='/usr/local/lib/python3.11/collections/__init__.py', path=['/usr/local/lib/python3.11/collections']) 
                import_module -> None 
                ImportError: 'No module named collections.namedtuple' 

I'm running this from within a venv, but that really shouldn't matter, right?

sys.path looks like this (from where I'm running pydeps):

Python 3.11.4 (main, Jun  7 2023, 18:32:58) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/local/lib/python311.zip', '/usr/local/lib/python3.11', '/usr/local/lib/python3.11/lib-dynload', '<redacted>/.venv/lib/python3.11/site-packages']
>>> 

Any ideas? I'll dig further if I have to, but maybe you can give me some pointers/ideas and save me some time :D

thebjorn commented 6 months ago

collections.namedtuple is not a module, it is a factory function in collections.__init__.py...
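A quick way to check this kind of thing from the interpreter, as a small sketch using only the standard library (the helper name is made up for illustration):

```python
import collections
import importlib.util
import inspect


def is_importable_module(dotted_name):
    """Return True if the import system can find a module with this dotted name."""
    return importlib.util.find_spec(dotted_name) is not None


print(inspect.ismodule(collections.namedtuple))        # False: it is a function
print(callable(collections.namedtuple))                # True
print(is_importable_module("collections.namedtuple"))  # False: no such submodule
print(is_importable_module("collections.abc"))         # True: a real submodule
```

So "from collections import namedtuple" imports the collections package and then reads an attribute from it; there is no collections.namedtuple module to find.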

micdet-delen commented 6 months ago

Aah, so it just stops looking any deeper then. So this is a legit ImportError: 'No module named collections.namedtuple'.

But still... there is a dependency on collections nonetheless? Why does it not include collections in the dependency graph then?

I've reduced it to a bare bones example:

testpackage
├── __init__.py
└── justimportcollectionsnamedtuple.py
$ cat testpackage/justimportcollectionsnamedtuple.py 
from collections import namedtuple

Just running pydeps testpackage gives an empty svg:

$ pydeps testpackage/
$ cat testpackage.svg 
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
 "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.43.0 (0)
 -->
<!-- Title: G Pages: 1 -->
<svg width="8pt" height="8pt"
 viewBox="0.00 0.00 8.00 8.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 4)">
<title>G</title><style>.edge>path:hover{stroke-width:8}</style>
<polygon fill="white" stroke="transparent" points="-4,4 -4,-4 4,-4 4,4 -4,4"/>
</g>
</svg>

Running with --include-missing DOES show the actual dependency on collections:

$ pydeps --include-missing testpackage/
$ cat testpackage.svg 
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
 "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.43.0 (0)
 -->
<!-- Title: G Pages: 1 -->
<svg width="260pt" height="166pt"
 viewBox="0.00 0.00 259.73 165.85" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 161.85)">
<title>G</title><style>.edge>path:hover{stroke-width:8}</style>
<polygon fill="white" stroke="transparent" points="-4,4 -4,-161.85 255.73,-161.85 255.73,4 -4,4"/>
<!-- collections_namedtuple -->
<g id="node1" class="node">
<title>collections_namedtuple</title><style>.edge>path:hover{stroke-width:8}</style>
<ellipse fill="#b65353" stroke="black" cx="125.87" cy="-136.64" rx="53.07" ry="21.43"/>
<text text-anchor="middle" x="125.87" y="-139.64" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#ffffff">collections.</text>
<text text-anchor="middle" x="125.87" y="-128.64" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#ffffff">namedtuple</text>
</g>
<!-- testpackage_justimportcollectionsnamedtuple -->
<g id="node2" class="node">
<title>testpackage_justimportcollectionsnamedtuple</title><style>.edge>path:hover{stroke-width:8}</style>
<ellipse fill="#4cb3b3" stroke="black" cx="125.87" cy="-21.21" rx="125.73" ry="21.43"/>
<text text-anchor="middle" x="125.87" y="-24.21" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">testpackage.</text>
<text text-anchor="middle" x="125.87" y="-13.21" font-family="Helvetica,sans-Serif" font-size="10.00" fill="#000000">justimportcollectionsnamedtuple</text>
</g>
<!-- collections_namedtuple&#45;&gt;testpackage_justimportcollectionsnamedtuple -->
<g id="edge1" class="edge">
<title>collections_namedtuple&#45;&gt;testpackage_justimportcollectionsnamedtuple</title><style>.edge>path:hover{stroke-width:8}</style>
<path fill="none" stroke="black" d="M125.87,-115.36C125.87,-97.97 125.87,-72.65 125.87,-52.8"/>
<polygon fill="#b65353" stroke="black" points="129.37,-52.7 125.87,-42.7 122.37,-52.7 129.37,-52.7"/>
</g>
</g>
</svg>

And I realize now that our own custom packages that are missing from the graph follow the same pattern. They only do "from custompackage import SomeClass", and thus pydeps does not look any deeper and does not include custompackage?

Is it that simple? Is this a bug or am I missing something?
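To see how the stdlib modulefinder treats this pattern in isolation, here is a minimal sketch that builds a throwaway package in a temp directory and scans a script doing "from custompackage import SomeClass". It uses the stdlib ModuleFinder directly, not pydeps' own MyModuleFinder subclass, so pydeps may behave differently; the file layout is invented for the example.

```python
import tempfile
from modulefinder import ModuleFinder
from pathlib import Path


def scan_from_import():
    """Scan a script that does 'from custompackage import SomeClass'."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "custompackage"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("class SomeClass:\n    pass\n")
        script = root / "script.py"
        script.write_text("from custompackage import SomeClass\n")

        # Restrict the search path to the temp dir so nothing else is scanned.
        finder = ModuleFinder(path=[str(root)])
        finder.run_script(str(script))
        return sorted(finder.modules), sorted(finder.badmodules), finder.any_missing()


modules, bad, missing = scan_from_import()
print(modules)  # the package itself is recorded as a found module
print(bad)      # 'custompackage.SomeClass' is tried as a submodule and fails
print(missing)  # names defined as globals in the package are not reported as missing
```

In a working setup the failed lookup of custompackage.SomeClass is harmless: the package is still found and recorded. If the package itself does not show up, the problem is in locating custompackage, not in the from-import.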

micdet-delen commented 6 months ago

If I add the following line to justimportcollectionsnamedtuple.py

from git import Repo

git DOES show up as a dependency...

I did some poking around the pydeps code with IPython.embed(), specifically py2depgraph.py > MyModuleFinder > ensure_fromlist (which throws ImportError "No module named..."). But it's a bit over my head.

All I know is that both from collections import namedtuple AND from git import Repo get an ImportError (ImportError: 'No module named git.Repo')

But the git package DOES get included in the dep graph and collections does not... OK, NEVER MIND: collections gets fixed when using the --pylib option..., obviously...

But what about our own custom package then... Why is it still not included...?

I've run

python -m modulefinder testpackage/justimportcollectionsnamedtuple.py

And it lists our custompackage under "Missing modules:"

? custompackage imported from __main__

But it isn't missing: I can run that "from custompackage import SomeClass" line perfectly well from the Python interactive interpreter.
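One way to narrow down a case like this is to compare the full import system with the path-based finder on its own. As far as I understand, the stdlib modulefinder resolves names through importlib.machinery.PathFinder only, while a normal import also consults every other finder on sys.meta_path. The sketch below (helper name is hypothetical, top-level module names only) reports both results side by side:

```python
import importlib.machinery
import importlib.util


def compare_finders(name):
    """Return (found_by_import_system, found_by_path_finder) for a top-level name."""
    # Full import system: walks every finder registered on sys.meta_path.
    found_full = importlib.util.find_spec(name) is not None
    # Path-based lookup only: searches the directories listed in sys.path.
    found_path = importlib.machinery.PathFinder.find_spec(name) is not None
    return found_full, found_path


print(compare_finders("json"))
```

A result of (True, False) for custompackage would mean the interpreter can import it, but not through a plain sys.path lookup, which would explain why modulefinder lists it as missing.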

micdet-delen commented 6 months ago

I've uninstalled the custom editable package and installed it normally (so all files are now physically copied to site-packages).

And now I get errors when running python -m modulefinder testpackage/justimportcollectionsnamedtuple.py:

massive stacktrace before this
  File "/usr/local/lib/python3.11/modulefinder.py", line 308, in import_module
    fp, pathname, stuff = self.find_module(partname,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/modulefinder.py", line 489, in find_module
    return _find_module(name, path)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/modulefinder.py", line 69, in _find_module
    if spec.loader.is_package(name):
       ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'is_package'

python testpackage/justimportcollectionsnamedtuple.py runs perfectly however...

So it seems something is wrong there. I'll google around for any related issues with modulefinder.
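The last frame fails on spec.loader.is_package(name), so the spec that was found apparently has no loader. A small diagnostic sketch for checking that directly, assuming modulefinder obtains its spec from importlib.machinery.PathFinder.find_spec (the helper name and the returned strings are my own):

```python
import importlib.machinery


def loader_status(name, path=None):
    """Describe what PathFinder finds for a module name (path=None means sys.path)."""
    importlib.machinery.PathFinder.invalidate_caches()
    spec = importlib.machinery.PathFinder.find_spec(name, path)
    if spec is None:
        return "not found"
    if spec.loader is None:
        # This is the situation in which spec.loader.is_package() would
        # raise AttributeError on 'NoneType'.
        return "found, but spec.loader is None"
    return "found, loader=" + type(spec.loader).__name__


print(loader_status("custompackage"))
```

If this reports a missing loader for custompackage, it is worth looking at spec.origin and spec.submodule_search_locations to see which directory is being picked up.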

I welcome any further input or ideas anyone can give me, otherwise this issue can be closed I think.

micdet-delen commented 6 months ago

I've tried it with the same custom package on another system and it works...

The only difference seems to be the Python version. It doesn't work with 3.11.4, but it DOES work with 3.11.2.

I'm going to see if I can downgrade (or upgrade) the Python version of the first one and see if it makes a difference.

micdet-delen commented 6 months ago

I've downgraded to Python 3.9.2 in my original environment and python -m modulefinder testpackage/justimportcollectionsnamedtuple.py just works.

And pydeps, consequently, also works. It now shows our custom package.

I have only been able to find this related issue, so it seems there is some kind of regression in modulefinder? https://github.com/python/cpython/issues/84530

But our custompackage is not a namespace package (there is an __init__.py).

I'll see if upgrading to a newer Python version also fixes it. But I'm on Debian, so that's not always easy.

Anyway: it does not seem to be a problem with pydeps!

micdet-delen commented 6 months ago

I've traced the issue back to the preinstalled Python version on the Microsoft-provided Docker images:

https://github.com/devcontainers/images/tree/main/src/python

On the latest version of the Docker image (3.12-bookworm), modulefinder also doesn't find the package if it is installed as editable. In this version, however, modulefinder does work if I install the package normally, whereas on the previous version (3.11-bullseye) it threw that stack trace.

If I switch to the OS-provided (Debian) Python version in the image, it all works.

I'm going to report this issue on the devcontainers project.

Thank you for your time and your awesome project! I've managed to get what I want now by using this latest Docker image and temporarily installing our custompackage as normal instead of editable.

thebjorn commented 6 months ago

Good detective work :-)