tilboerner / depx

Examine and visualize dependencies used by Python modules 🔍

Clean up dependency data keys #20

Open · tilboerner opened this issue 5 years ago

tilboerner commented 5 years ago

The internal representation of a dependency contains more keys than we really need.

wagnerpeer commented 4 years ago

I would love to see all the keys available as selectors/filters in an interactive (HTML?) visualization. Based on those selectors/filters, modules could be colored differently or highlighted in other ways.

The category could be split three ways:

  1. Local
  2. Builtins
  3. 3rd Party

The categories would be defined as follows:

  1. Local is everything within the given path
  2. Builtins is every "battery included" module, i.e. the standard library
  3. 3rd Party is everything not covered by 1. or 2. This might lead to discussions about the "locality" of self-written code that is referenced but lies outside the given path. But from the inside view of the project under inspection, such code is indistinguishable from any other 3rd-party module or library.

Finally, a few simple use cases for all the discussed keys:

  1. I would like to see where relative imports are used. Neighboring modules that make many relative imports point to a package that is ready for extraction.
  2. A relative import level higher than X might be unwanted.
  3. I want to distinguish between local imports, builtins, and other 3rd-party modules/libraries. The category can be used for that.
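For use cases 1 and 2, relative imports and their levels can be read straight from the syntax tree. This sketch uses the standard `ast` module, where `ast.ImportFrom.level` is 0 for absolute imports and otherwise counts the leading dots; the function name `relative_imports` and the `max_level` threshold (the "X" above) are hypothetical.

```python
import ast


def relative_imports(source: str, max_level: int = 1) -> list[tuple[int, str, int, bool]]:
    """Return (line, module, level, exceeds_max_level) for each relative import."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # level == 0 means an absolute import; "from .. import x" has level 2.
        if isinstance(node, ast.ImportFrom) and node.level > 0:
            module = "." * node.level + (node.module or "")
            found.append((node.lineno, module, node.level, node.level > max_level))
    # ast.walk is breadth-first, so sort to report imports in source order.
    return sorted(found)


code = "import os\nfrom . import siblings\nfrom ...core import base\n"
for line, module, level, too_deep in relative_imports(code, max_level=2):
    print(line, module, level, too_deep)
```

With `max_level=2`, only the level-3 import of `...core` is flagged; the absolute `import os` is ignored entirely.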

tilboerner commented 4 years ago

Note that the category values are currently "local" and "module", which distinguish whether the import happens inside a function or at module level (compile time). I don't see that as a good use for the category key, and would rather drop it.
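For reference, this is roughly how the current "local"/"module" values can be derived: an import is "local" when it sits inside a function body. This is a sketch built on the standard `ast.NodeVisitor`, not depx's actual implementation; the class name `ImportScopeVisitor` is hypothetical.

```python
import ast


class ImportScopeVisitor(ast.NodeVisitor):
    """Record each imported module together with a "module" or "local" scope."""

    def __init__(self) -> None:
        self.depth = 0  # number of function bodies enclosing the current node
        self.imports: list[tuple[str, str]] = []

    def _visit_function(self, node: ast.AST) -> None:
        self.depth += 1
        self.generic_visit(node)
        self.depth -= 1

    # Plain and async functions both open a new function scope.
    visit_FunctionDef = _visit_function
    visit_AsyncFunctionDef = _visit_function

    def _record(self, name: str) -> None:
        self.imports.append((name, "local" if self.depth else "module"))

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            self._record(alias.name)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        self._record("." * node.level + (node.module or ""))


source = "import os\n\ndef load():\n    import json\n    return json\n"
visitor = ImportScopeVisitor()
visitor.visit(ast.parse(source))
print(visitor.imports)  # [('os', 'module'), ('json', 'local')]
```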

I agree that it would be useful to categorize dependencies as "builtin", "3rd party", and "1st party", as isort does, for example. That distinction can be tricky in practice and would need some exploration. Either way, it would be a good use for the category key, and removing the current misuse (see above) would be a step in the right direction.

As for the other keys (level and is_relative): these relate to dependencies created through imports. Dependencies can also exist more implicitly and might not have meaningful values for these keys. I'd like to leave room for representing those as well. If you'd like to keep this data, what do you think about adding a nested mapping under a key "meta" or "import"?

{
  "from_module": "depx.graph",
  "to_module": "io",
  "from_name": "io",
  "to_name": "io",
  "import": {
    "is_local": false,
    "is_relative": false,
    "level": 0
  }
}

or

  "meta": {
    "type": "import",
    "is_local": false,
    "is_relative": false,
    "level": 0
  }
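Either layout is a simple transformation of the current flat record. A minimal sketch for the "import" variant, assuming the flat record carries `is_local`, `is_relative` and `level` at the top level; the helper `nest_import_keys` and the `IMPORT_KEYS` tuple are hypothetical.

```python
import json

# Keys that only carry meaning for dependencies created by import statements.
IMPORT_KEYS = ("is_local", "is_relative", "level")


def nest_import_keys(record: dict) -> dict:
    """Move the import-specific keys of a flat record under "import"."""
    nested = {key: value for key, value in record.items() if key not in IMPORT_KEYS}
    import_data = {key: record[key] for key in IMPORT_KEYS if key in record}
    # Implicit (non-import) dependencies simply get no "import" mapping.
    if import_data:
        nested["import"] = import_data
    return nested


flat = {
    "from_module": "depx.graph", "to_module": "io",
    "from_name": "io", "to_name": "io",
    "is_local": False, "is_relative": False, "level": 0,
}
print(json.dumps(nest_import_keys(flat), indent=2))
```

Leaving the nested mapping out entirely, rather than filling it with placeholder values, keeps implicit dependencies from carrying meaningless import data.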
wagnerpeer commented 4 years ago

Hey. The idea of a nested mapping is interesting, because it can be generalized and used for different structures and purposes. However, I don't think either "meta" or "import" is a good fit for the name.

Thinking about the purpose of the structure may lead to a better-fitting name. The most relevant information in the mapping is the pair of keys "from_module" and "to_module", since those are enough to build a basic dependency graph. Next to this basic information is static, factual information such as "is_local", "is_relative", or "level", which can be used to highlight, rename, or otherwise process nodes. Not included here is dynamic information, which can be computed from the complete basic and factual information, such as statistics: how often a module gets imported, etc. This information becomes interesting for enhancing the edges of the graph, e.g. line thickness or similar attributes.
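The split between basic and dynamic information maps onto a two-pass approach: first build the graph from `from_module`/`to_module`, then derive statistics from it. A sketch using `collections.Counter`; the function `summarize` and the `pkg.*` module names in the example are made up.

```python
from collections import Counter, defaultdict


def summarize(records: list[dict]) -> tuple[dict[str, set[str]], Counter, Counter]:
    """Build the basic graph, then derive per-edge and per-module counts."""
    graph: dict[str, set[str]] = defaultdict(set)
    edge_counts: Counter = Counter()
    for record in records:
        src, dst = record["from_module"], record["to_module"]
        graph[src].add(dst)           # basic information: who depends on whom
        edge_counts[(src, dst)] += 1  # dynamic: e.g. line thickness per edge
    # Dynamic: how often each module gets imported across the whole project.
    import_counts: Counter = Counter()
    for (_, dst), count in edge_counts.items():
        import_counts[dst] += count
    return dict(graph), edge_counts, import_counts


records = [
    {"from_module": "pkg.graph", "to_module": "io"},
    {"from_module": "pkg.graph", "to_module": "json"},
    {"from_module": "pkg.cli", "to_module": "io"},
    {"from_module": "pkg.cli", "to_module": "io"},
]
graph, edge_counts, import_counts = summarize(records)
print(import_counts.most_common(1))  # [('io', 3)]
```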

Therefore, what about the following names and structure:

{
  "from_module": "depx.graph",
  "to_module": "io",
  "static": {
    "from_name": "io",
    "to_name": "io",
    "is_local": false,
    "is_relative": false,
    "level": 0
  },
  "dynamic": {
    "count": 10
  }
}
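One way to pin down this layout in code is a pair of dataclasses whose `asdict` output matches the JSON above. This is a sketch under the assumption that `dynamic` stays an open-ended dictionary, since its statistics are computed later; the class names `StaticInfo` and `Dependency` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StaticInfo:
    """Factual attributes known as soon as the dependency is found."""
    from_name: str
    to_name: str
    is_local: bool
    is_relative: bool
    level: int


@dataclass
class Dependency:
    """One graph edge in the proposed basic/static/dynamic layout."""
    from_module: str
    to_module: str
    static: StaticInfo
    # Statistics computed over all dependencies, e.g. {"count": 10}.
    dynamic: dict = field(default_factory=dict)

    def to_dict(self) -> dict:
        # asdict recurses into the nested StaticInfo dataclass.
        return asdict(self)


dep = Dependency("depx.graph", "io", StaticInfo("io", "io", False, False, 0), {"count": 10})
print(json.dumps(dep.to_dict(), indent=2))
```

Keeping `from_module` and `to_module` as plain top-level fields means graph-building code never has to reach into `static` or `dynamic`.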

Alternate names, split structure: