Clean up dependency data keys

tilboerner commented 5 years ago

The internal representation of a dependency contains more keys than we really need.

from_name and to_name are often the same (unless the import has an as clause), and we don't really need them.
is_relative and level are also quite irrelevant details of the import.
We can probably turn category into a boolean flag, is_local, which is a better way to name and express the information we're after.
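A minimal sketch of what this cleanup could look like. It assumes a dependency is a plain dict with the keys named above, and that category currently holds "local" or "module" (the meaning explained further down in this thread). The helper name slim_dependency is hypothetical and not part of depx.

```python
def slim_dependency(dep: dict) -> dict:
    """Reduce a dependency record to the keys this issue wants to keep."""
    return {
        "from_module": dep["from_module"],
        "to_module": dep["to_module"],
        # Assumption: category == "local" marks an import inside a function.
        "is_local": dep.get("category") == "local",
    }


# Hypothetical record in the current, more verbose shape.
raw = {
    "from_module": "depx.graph",
    "to_module": "io",
    "from_name": "io",
    "to_name": "io",
    "is_relative": False,
    "level": 0,
    "category": "module",
}
slim = slim_dependency(raw)
print(slim)
```

from_name, to_name, is_relative and level are simply dropped; only the category string is converted into a boolean.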

wagnerpeer commented 4 years ago

I would love to see all the keys available as selectors/filters in an interactive (HTML?) visualization. Based on those selectors/filters, modules could be colored differently or highlighted in other ways.

Category could be split threefold:

1. Local
2. Builtins
3. 3rd Party

Where the definition of the above categories is as follows:

1. Local is everything within the given path.
2. Builtins is everything "batteries included", i.e. the standard library.
3. 3rd Party is everything not covered by 1. or 2. This might lead to discussions about the "locality" of self-written code that is referenced but lives outside the given path. However, from the inside view of the project under inspection, such code is indistinguishable from any other 3rd party module or library.

Finally, a few simple use cases for all the discussed keys:

Use Case 1: I would like to see the usage of relative imports. Neighboring modules that do a lot of relative imports among each other point to a package ready for extraction.
Use Case 2: A relative import level higher than X might be unwanted.
Use Case 3: I want to distinguish between local imports, builtins and other 3rd party modules/libraries. The category key can be used for that.
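The three use cases map to simple filters over the dependency records. In this sketch the records are made up, category already uses the proposed local/builtin/3rd party values, and MAX_LEVEL stands in for the unspecified "X" with an arbitrary value.

```python
from collections import defaultdict

MAX_LEVEL = 1  # the "X" from use case 2; an arbitrary example threshold

# Hypothetical records using the keys discussed in this issue.
deps = [
    {"from_module": "pkg.a", "to_module": "pkg.b", "is_relative": True, "level": 1, "category": "local"},
    {"from_module": "pkg.sub.c", "to_module": "pkg.a", "is_relative": True, "level": 2, "category": "local"},
    {"from_module": "pkg.a", "to_module": "io", "is_relative": False, "level": 0, "category": "builtin"},
    {"from_module": "pkg.b", "to_module": "requests", "is_relative": False, "level": 0, "category": "3rd party"},
]

# Use case 1: relative imports, e.g. to spot a package ready for extraction.
relative = [d for d in deps if d["is_relative"]]

# Use case 2: relative imports that climb more than MAX_LEVEL levels up.
too_deep = [d for d in deps if d["is_relative"] and d["level"] > MAX_LEVEL]

# Use case 3: group import targets by category.
by_category = defaultdict(set)
for d in deps:
    by_category[d["category"]].add(d["to_module"])

print(len(relative), [d["from_module"] for d in too_deep], dict(by_category))
```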

tilboerner commented 4 years ago

Note that the category values are currently "local" and "module", which differentiate whether the import happened inside a function or at the module level (compile time). I don't see that as a good use for the category key and would rather drop it.
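To make the current meaning of category concrete, here is a minimal sketch using Python's standard ast module. It tags each import as "local" (inside a function) or "module" (top level) and also reads the level of from-imports. This is an illustration under assumptions, not how depx actually extracts imports; relative targets are left unresolved and recorded as their leading dots plus module name.

```python
import ast

SOURCE = """
import io
from . import sibling


def load():
    import json
    return json
"""


def collect_imports(source: str) -> list[dict]:
    """Record each import with its relative level and function/module scope."""
    records = []

    def visit(node: ast.AST, in_function: bool) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.Import, ast.ImportFrom)):
                level = getattr(child, "level", 0) or 0  # ast.Import has no level
                if isinstance(child, ast.ImportFrom):
                    targets = ["." * level + (child.module or "")]
                else:
                    targets = [alias.name for alias in child.names]
                for target in targets:
                    records.append({
                        "to_module": target,
                        "is_relative": level > 0,
                        "level": level,
                        # Mirrors the current meaning of category described above.
                        "category": "local" if in_function else "module",
                    })
            nested = in_function or isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
            visit(child, nested)

    visit(ast.parse(source), False)
    return records


records = collect_imports(SOURCE)
for record in records:
    print(record)
```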

I agree that it would be useful to categorize dependencies by "builtin", "3rd party" and "1st party", like isort does, for example. I think that distinction can be tricky in practice and would need some exploration. Anyway, that would be a good use for the category key, and removing the current misuse (see above) would be a step in the right direction.

As for the other keys (level and is_relative), these relate to dependencies through imports. Dependencies can also exist more implicitly and might not have meaningful values for these keys. I'd like to keep the way open for representing those, too. If you'd like to keep this data, what do you think about adding a nested mapping under a key "meta" or "import"?

{
  "from_module": "depx.graph",
  "to_module": "io",
  "from_name": "io",
  "to_name": "io",
  "import": {
    "is_local": false,
    "is_relative": false,
    "level": 0
  }
}

or

  "meta": {
    "type": "import",
    "is_local": false,
    "is_relative": false,
    "level": 0
  }
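A rough sketch of the "meta" variant: import-specific keys move into a nested mapping tagged with a type, so a dependency without import details simply carries fewer keys. The helper nest_metadata and the "implicit" type are hypothetical, used only for illustration.

```python
import json

IMPORT_KEYS = ("is_local", "is_relative", "level")


def nest_metadata(flat: dict, kind: str = "import") -> dict:
    """Move import-only keys under a nested "meta" mapping tagged with a type."""
    nested = {k: v for k, v in flat.items() if k not in IMPORT_KEYS}
    # Only copy keys that exist, so non-import dependencies stay valid.
    nested["meta"] = {"type": kind, **{k: flat[k] for k in IMPORT_KEYS if k in flat}}
    return nested


flat = {
    "from_module": "depx.graph", "to_module": "io",
    "from_name": "io", "to_name": "io",
    "is_local": False, "is_relative": False, "level": 0,
}
print(json.dumps(nest_metadata(flat), indent=2))
# A dependency without import details (the "implicit" type is hypothetical).
print(json.dumps(nest_metadata({"from_module": "a", "to_module": "b"}, kind="implicit")))
```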

wagnerpeer commented 4 years ago

Hey. The idea of a nested mapping is interesting, because it can be generalized and used for different structures and purposes. However, I don't think either "meta" or "import" is a good fit for a name.

This makes me think about the purpose of the structure, to arrive at a better-fitting name. The most relevant information in the mapping is contained in the two keys "from_module" and "to_module", as those can be used to build a basic dependency graph. Alongside this basic information is static, factual information such as "is_local", "is_relative" or "level". It can be used to highlight or rename nodes, or to do other things with them. Not included here is dynamic information, which can be computed from the complete basic and factual information, such as statistics: how often a module gets imported, etc. This information becomes interesting for enhancing the edges of the graph, e.g. through line thickness or similar attributes.
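As an example of such dynamic information, this sketch counts how often each module is imported and turns that into an edge attribute. The records are made up, and the 1 + count width formula is an arbitrary choice for illustration.

```python
from collections import Counter


def import_counts(deps: list[dict]) -> Counter:
    """Count how often each module is the target of a dependency."""
    return Counter(d["to_module"] for d in deps)


deps = [
    {"from_module": "depx.graph", "to_module": "io"},
    {"from_module": "depx.cli", "to_module": "io"},
    {"from_module": "depx.cli", "to_module": "depx.graph"},
]
counts = import_counts(deps)
# Edges into frequently imported modules are drawn thicker (arbitrary scaling).
edge_widths = {(d["from_module"], d["to_module"]): 1 + counts[d["to_module"]] for d in deps}
print(counts.most_common(), edge_widths)
```

The counts are derived entirely from from_module and to_module, which is why they can be recomputed at any time instead of being stored with the static data.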

Therefore, what about the following names and structure:

{
  "from_module": "depx.graph",
  "to_module": "io",
  "static": {
    "from_name": "io",
    "to_name": "io",
    "is_local": false,
    "is_relative": false,
    "level": 0
  },
  "dynamic": {
    "count": 10
  }
}
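One way to produce the static/dynamic shape from flat records, assuming "count" means how many flat records (e.g. import statements) yield the same from_module/to_module edge. That interpretation, and the helper split_static_dynamic, are assumptions for illustration; for simplicity it keeps the static data of the first record per edge.

```python
from collections import Counter

STATIC_KEYS = ("from_name", "to_name", "is_local", "is_relative", "level")


def split_static_dynamic(flat_deps: list[dict]) -> list[dict]:
    """Group flat records into the proposed static/dynamic shape, one per edge."""
    counts = Counter((d["from_module"], d["to_module"]) for d in flat_deps)
    edges = {}
    for d in flat_deps:
        edge = (d["from_module"], d["to_module"])
        if edge in edges:
            continue  # simplification: keep the static data of the first occurrence
        edges[edge] = {
            "from_module": edge[0],
            "to_module": edge[1],
            "static": {k: d[k] for k in STATIC_KEYS if k in d},
            # Assumption: "count" = number of flat records producing this edge.
            "dynamic": {"count": counts[edge]},
        }
    return list(edges.values())


base = {"from_name": "io", "to_name": "io", "is_local": False, "is_relative": False, "level": 0}
flat_deps = [
    {"from_module": "depx.graph", "to_module": "io", **base},
    {"from_module": "depx.graph", "to_module": "io", **base},
    {"from_module": "depx.cli", "to_module": "io", **base},
]
result = split_static_dynamic(flat_deps)
print(result)
```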

Alternative names for the split structure:

static / dynamic
base / statistics
basics / statistics

tilboerner / depx

Clean up dependency data keys #20