lucsorel / py2puml

Generate PlantUML class diagrams to document your Python application.
https://pypi.org/project/py2puml/
MIT License
212 stars 34 forks source link

Refactor codebase to use AST #59

Open grayfox88 opened 1 year ago

grayfox88 commented 1 year ago

py2puml uses internally a mix of code inspection and Abstract Syntax Tree parsing. The whole code based should be refactored to use AST parsing only as it is more robust and straightforward.

lucsorel commented 1 year ago

I agree, code inspection hits several walls:

  1. in some code bases, I saw some type annotations referring to types that are not imported "at the top of the files", but imported in try-except blocks that are within constructors. When inspecting, py2puml does not pass through these "dynamic " imports and fails to solve the fully-qualified type name (the type prefixed by the module path)
  2. during code inspection, py2puml fails to distinguish the fields of the current class for the fields of the parent class if any. When resolving the fully-qualified type name of a field, its corresponding import would then be in the module of the parent class and not in the current module. Heuristics can be implemented to compare the fields from a class with the fields of the parent classes to avoid duplicate declarations and parent-import-resolution but it also make the code more complicated
  3. inspecting a module makes the python interpreter execute the file. And if side-effects are not wrapped by a classic if __name__ == '__main__': clause, they will be executed

The last point is the one I fear the most because users of py2puml may be executing some code when generating the documentation, causing unexpected consequences.

lucsorel commented 1 year ago

Some downsides of relying only on AST parsing:

  1. implementing py2puml features will be harder because AST parsing relies on the visitor pattern (see https://docs.python.org/3/library/ast.html#ast.NodeVisitor, https://www.mattlayman.com/blog/2018/decipher-python-ast/), which makes the "visiting" code harder to write, understand and test
  2. fields and methods that are added dynamically to a class (using decorators, for example) will be missing in the generated documentation
grayfox88 commented 12 months ago

Good summary @lucsorel , concerning the downside related to AST being harder to understand, I kind of agree but believe it is a matter of time before one gets used to it. On this matter I would recommend the excellent hands-on documentation Green Tree Snake.

I have been working during the past months on a version of py2puml using solely AST which I will send a pull request for as soon as it gives decent results. It is still on a Proof Of Concept stage and needs some bug fixing.