thebjorn / pydeps

Python Module Dependency graphs
https://pydeps.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License
1.8k stars 114 forks source link

Duplicate source entries got generated that caused incorrect call graph relationship. #174

Open happyhebaby opened 1 year ago

happyhebaby commented 1 year ago

Greetings again.

I have a test project that aim to unit test the tool integration, test and source files all live in one directory, a simplified version: inheritance_copy: init.py animal.py dog.py test_dog.py

source code

animal.py:
class Animal:
    def __init__(self, name, species):
        self.name = name
        self.species = species

    def make_sound(self):
        return "The animal makes a sound."

dog.py:
import animal

class Dog(animal.Animal):
    def __init__(self, name, breed):
        super().__init__(name, "Canis lupus familiaris")
        self.breed = breed

    def make_sound(self):
        return "Woof!"

test_dog.py:
import dog

def test_dog():
    a_dog = dog.Dog("Max", "Labrador Retriever")
    assert a_dog.make_sound() == "Woof!"

When I run cmd under the root inheritance_copy:

pydeps --show-deps --no-output --deps-output b.json --include-missing .

It generated the following json:

{
    "__main__": {
        "bacon": 0,
        "imports": [
            "inheritance_copy",
            "inheritance_copy.animal",
            "inheritance_copy.dog",
            "inheritance_copy.test_dog"
        ],
        "name": "__main__",
        "path": null
    },
    "animal": {
        "bacon": 2,
        "imported_by": [
            "inheritance_copy.dog"
        ],
        "name": "animal",
        "path": null
    },
    "dog": {
        "bacon": 2,
        "imported_by": [
            "inheritance_copy.test_dog"
        ],
        "name": "dog",
        "path": null
    },
    "inheritance_copy": {
        "bacon": 1,
        "imported_by": [
            "__main__"
        ],
        "name": "inheritance_copy",
        "path": "<mylocalpath>/inheritance_copy/__init__.py"
    },
    "inheritance_copy.animal": {
        "bacon": 1,
        "imported_by": [
            "__main__"
        ],
        "name": "inheritance_copy.animal",
        "path": "<mylocalpath>inheritance_copy/animal.py"
    },
    "inheritance_copy.dog": {
        "bacon": 1,
        "imported_by": [
            "__main__"
        ],
        "imports": [
            "animal"
        ],
        "name": "inheritance_copy.dog",
        "path": "<mylocalpath>/inheritance_copy/dog.py"
    },
    "inheritance_copy.test_dog": {
        "bacon": 1,
        "imported_by": [
            "__main__"
        ],
        "imports": [
            "dog"
        ],
        "name": "inheritance_copy.test_dog",
        "path": "<mylocalpath>/inheritance_copy/test_dog.py"
    }
}

It seems to me the above relationship has two issues: It generated two duplicate entry pairs such as dog and inheritance_copy.dog; More seriously it referred the wrong entry(from a pair) in the relationship, one example: inheritance_copy.test_dog imports dog, which has a null path, it should imports 'inheritance_copy.dog, which has the correct source path. BTW if I move all the test files to subdirtest` then the duplicate pairs will not be generated and the call graph is correct.

happyhebaby commented 1 year ago

I tested further, even in a case when I have the above structure where non-test source lives in root, test source lives in test sub folder, it reproed the same bug.

happyhebaby commented 1 year ago

@thebjorn Just wonder if you have time to look at this issue? It brings wrong result to our processing.

Also another slightly variation of this issue: If I run the following cmd for this dir: https://github.com/pytest-dev/pytest/tree/main/src

pydeps --show-deps --no-output --include-missing -LERROR .

It generate two different entries with different import info but pointing to the same file, this also brought incorrect result to our processing:


First entry:
"src._pytest._io": {
    "bacon": 1,
    "imported_by": [
      "__main__"
    ],
    "imports": [
      "src._pytest._io.terminalwriter"
    ],
    "name": "src._pytest._io",
    "path": "<mylocalpath>/pytest/src/_pytest/_io/__init__.py"
  }

Second entry:
  "_pytest._io": {
    "bacon": 1,
    "imported_by": [
      "__main__",
      "_pytest._code.code",
      "_pytest.assertion.rewrite",
      "_pytest.assertion.util",
      "_pytest.cacheprovider",
      "_pytest.compat",
      "_pytest.config",
      "_pytest.config.argparsing",
      "_pytest.doctest",
      "_pytest.fixtures",
      "_pytest.logging",
      "_pytest.python",
      "_pytest.reports",
      "_pytest.setuponly",
      "_pytest.terminal"
    ],
    "imports": [
      "_pytest._io.terminalwriter"
    ],
    "name": "_pytest._io",
    "path": "<mylocalpath>/pytest/src/_pytest/_io/__init__.py"
  }
thebjorn commented 1 year ago

Hi @happyhebaby ! I'm on Easter vacation right now so I don't have much bandwidth (physically and mentally ;-) )

I see the files have different names, _pytest._io and src._pytest._io, and looking at the directory I can understand why (src is not a "code-root").

You'll probably get what you want with

pydeps --show-deps --no-output --include-missing -LERROR pytest

when standing in the src directory.

I can see if there is a way to merge nodes based on path, but I'm not sure I'll get to it until next week...

happyhebaby commented 1 year ago

@thebjorn I got the above result when running from src directory. Preferably if the tool could generate complete call graph recursively for all current dir and sub dir by default, but it doesn't do so at present. so we have to process all folders that have source files individually and then merge the result.