Open edublancas opened 2 years ago
Would like to take on this challenge. I'm already working on a solution and am considering first creating a basic implementation with different approaches (parso, plain ast, and LibCST) to gain a better perspective on the possible solutions. Could you assign it to me?
Feel free to open a PR!
Currently, we do not support using global variables inside a function's body. Code where a function such as `sum` uses a variable that's defined outside the function's body will break. If a user tries to refactor a notebook with code like this using `soorgeon refactor`, they'll get an error message, with a link to this document, asking them to change the code so the variable is passed to the function as an argument.
However, we should automate this process and modify the user's source code on their behalf so they don't have to do it manually.
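As a hypothetical illustration of the pattern described above (not the exact snippet from soorgeon's error message), the change we currently ask users to make by hand could look like the sketch below. The function is named `add` here rather than `sum` to avoid shadowing the builtin, and the names `x`, `y`, `add_refactored` and the parameter order are assumptions.

```python
# Before: `add` reads `x`, which is defined outside the function's body.
x = 1


def add(y):
    return x + y


# After: `x` is an explicit parameter, so the function no longer
# depends on module-level state. The caller passes the value in.
def add_refactored(y, x):
    return x + y


result_before = add(2)
result_after = add_refactored(2, x)
print(result_before, result_after)
```

Automating this means rewriting both the function signature and every call site, which is why the edge cases matter.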
considerations
There are a few edge cases to take into account. For example, what if the function already has an argument with the same name? Or what if the function's signature uses `*args` or `**kwargs`? Since there are many edge cases, we should focus on covering the simple ones to ensure it works, detect a few of the edge ones, and throw an error so the user fixes them manually.
modifying users' code
`soorgeon refactor` parses the code into an AST to detect dependencies within the notebook. Python's standard library has an ast module; however, it's very limited, so we use parso instead.
Parso offers round-trip conversion, meaning we can go from source code to AST and back to source code again. You can see an example of that here:
https://github.com/ploomber/soorgeon/blob/03229806b905e7715d1c3ee0413ff5a3bc30b71c/src/soorgeon/io.py#L851
The above is a function that removes import statements from a string with source code.
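Before modifying anything, we need to know which functions read module-level variables and which of them hit the edge cases listed under considerations. A minimal sketch of that detection step follows. It uses the standard library's `ast` for analysis only (it is not soorgeon's parso-based code), and the helper names `module_level_variables`, `globals_used`, and `analyze` are hypothetical. It only looks at simple top-level assignments and top-level `def` statements, and it ignores nested scopes.

```python
import ast


def module_level_variables(tree):
    """Names bound by plain assignments at the top level of the module."""
    names = set()
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign):
            for target in stmt.targets:
                names.update(
                    n.id for n in ast.walk(target) if isinstance(n, ast.Name)
                )
    return names


def globals_used(func, module_vars):
    """Module-level variables read in the body and not shadowed locally."""
    args = func.args
    params = {a.arg for a in args.posonlyargs + args.args + args.kwonlyargs}
    loaded, stored = set(), set()
    for stmt in func.body:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                bucket = loaded if isinstance(node.ctx, ast.Load) else stored
                bucket.add(node.id)
    # A parameter or local with the same name shadows the global,
    # so those names are not reported.
    return sorted((loaded & module_vars) - stored - params)


def analyze(source):
    """Map function name -> globals it uses and whether to fix manually."""
    tree = ast.parse(source)
    module_vars = module_level_variables(tree)
    report = {}
    for stmt in tree.body:
        if isinstance(stmt, ast.FunctionDef):
            found = globals_used(stmt, module_vars)
            if not found:
                continue
            # Assumption: signatures with *args/**kwargs are left to the user.
            has_star = stmt.args.vararg is not None or stmt.args.kwarg is not None
            report[stmt.name] = {"globals": found, "manual_fix": has_star}
    return report


example = """
x = 1
z = 2

def add(y):
    return x + y

def scale(*args, **kwargs):
    return [z * a for a in args]

def shadow(x):
    return x + 1
"""
report = analyze(example)
print(report)
```

In this sketch, `add` is a simple case that could be rewritten automatically, `scale` is flagged for a manual fix, and `shadow` is skipped because its parameter hides the global.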
determining if parso is the best option
So far, parso has worked well for us; however, we are interested in exploring other options. We want to improve soorgeon's capabilities to automatically refactor code, so we should ensure we're using the right tool for the job. Part of fixing this issue is therefore to see if there are better alternatives.
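One concrete criterion when comparing tools is whether the user's source survives a parse and regenerate cycle. The sketch below shows what happens with the standard library alone, assuming Python 3.9+ (where `ast.unparse` is available). The sample source is made up for illustration.

```python
import ast

source = (
    "x = 1  # starting value\n"
    "\n"
    "\n"
    "def add(y):\n"
    "    return x + y\n"
)

tree = ast.parse(source)
regenerated = ast.unparse(tree)

# The regenerated code parses to the same tree...
same_structure = ast.dump(ast.parse(regenerated)) == ast.dump(tree)
# ...but the comment is gone, because the ast module discards comments.
comment_kept = "# starting value" in regenerated

print(regenerated)
print(same_structure, comment_kept)
```

The structure is preserved but the comment is lost, so rewriting notebooks through plain `ast` would silently change the user's code. Round-trip tools such as parso or LibCST are designed to avoid that.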
The bottom of the documentation for Python's ast module links to some alternatives:
See also
Green Tree Snakes, an external documentation resource, has good details on working with Python ASTs.
ASTTokens annotates Python ASTs with the positions of tokens and text in the source code that generated them. This is helpful for tools that make source code transformations.
leoAst.py unifies the token-based and parse-tree-based views of python programs by inserting two-way links between tokens and ast nodes.
LibCST parses code as a Concrete Syntax Tree that looks like an ast tree and keeps all formatting details. It’s useful for building automated refactoring (codemod) applications and linters.
Parso is a Python parser that supports error recovery and round-trip parsing for different Python versions (in multiple Python versions). Parso is also able to list multiple syntax errors in your python file.
And I also found this other project:
https://github.com/PyCQA/redbaron https://github.com/PyCQA/baron