ploomber / soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
https://ploomber.io
Apache License 2.0
78 stars 20 forks source link

dealing with global variables #65

Open edublancas opened 2 years ago

edublancas commented 2 years ago

Currently, we do not support using global variables inside a function's body:

x = 1

def sum(y):
  return x + y

sum(y)

The above will break since sum is using a variable that's defined outside the function's body. If a user tries to refactor a notebook with code like this using soorgeon refactor, they'll get an error message asking them to change the code to:

x = 1

def sum(y, x):
  return x + y

sum(y, x)

We throw an error message a link to this document.

However, we should automate this process and modify the user's source code on their behalf so they don't have to do it manually.

considerations

There are a few edge cases to take into account, for example, what if the function already has an argument with the name? Or what if the function's signature is using *args, or **kwargs. Since there are many edge cases, we should focus on covering the simple ones to ensure it works, detect a few of the edge ones, and throw an error so the user fixes it manually.

modifying users' code

soorgeon refactor parses the code into the AST to detect dependencies among notebook's. Python's standard library has an ast module; however, it's very limited so we use parso instead.

Parso offers roundtrip conversion, meaning we can go from source code to AST and to source code again. You can see an example of that here:

https://github.com/ploomber/soorgeon/blob/03229806b905e7715d1c3ee0413ff5a3bc30b71c/src/soorgeon/io.py#L851

The above is a function that removes import statements from a string with source code.

determining if parso is the best option

so far, parso has worked well for us; however, we are interested in exploring other options. We're interested in improving soorgeon's capabilities to automatically refactor code so we should ensure we're using the right tool for the job. so part of fixing this issue is to see if there are better alternatives.

the bottom of Python's AST module links to some alternatives:


See also

Green Tree Snakes, an external documentation resource, has good details on working with Python ASTs.

ASTTokens annotates Python ASTs with the positions of tokens and text in the source code that generated them. This is helpful for tools that make source code transformations.

leoAst.py unifies the token-based and parse-tree-based views of python programs by inserting two-way links between tokens and ast nodes.

LibCST parses code as a Concrete Syntax Tree that looks like an ast tree and keeps all formatting details. It’s useful for building automated refactoring (codemod) applications and linters.

Parso is a Python parser that supports error recovery and round-trip parsing for different Python versions (in multiple Python versions). Parso is also able to list multiple syntax errors in your python file.


And I also found this other project:

https://github.com/PyCQA/redbaron https://github.com/PyCQA/baron

rrhg commented 2 years ago

Would like to take on this challenge. I'm already working on a solution & considering 1st creating a basic implementation with different approaches (parso, plain ast & LibCST) to gain a better perspective of the posible solutions. Could you assign it to me?

idomic commented 2 years ago

Feel free to open a PR!