ploomber / soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
https://ploomber.io
Apache License 2.0
78 stars 20 forks source link

soorgeon clean #50

Closed edublancas closed 2 years ago

edublancas commented 2 years ago

we should add a:

soorgeon clean {task-name}

command that cleans up notebooks. The most basic version can just apply isort, and flake8.

A more sophisticated version can use the static analysis code that we already have to delete dead code, see https://github.com/ploomber/soorgeon/issues/49

94rain commented 2 years ago

In cli.py, I will add the clean command with a similar structure as refactor with argument task-name and option --deep (leaving this as TODO)

I will start to complete the implementation, but any early feedback is appreciated.

idomic commented 2 years ago

Yeah I think overall it's a good direction, we'll eventually want to call this function via refactor as well. yapf might be too much, but we can do flake8 (running it like that: out = subprocess.run(["flake8"]) and isort has a python api: https://pycqa.github.io/isort/docs/quick_start/3.-api.html

Also I think we can build those basics generic enough so it'll be easy to onboard another package if we want it to be more robust.

edublancas commented 2 years ago

my original comment was inaccurate, flake8 only lints files, it does not fix them. let's use black (https://github.com/psf/black) as it's become the standard and many packages install it, so there's a good chance a user already has it. If black has a Python API, let's use it, if not, let's use subprocess

the main challenge here is cleaning up notebooks (ipynb files). applying isort and black to py files is trivial, but it's not for ipynb files (which we should also support). I think the best way to do it is to load the ipynb file, convert it to py file (we can use jupytext for this), apply isort and flake8, then save back as ipynb

we should add isort and black as dependencies to setup.py

94rain commented 2 years ago

Black does not have a Python API, which will require us to generate intermediate .py files for notebooks.

I think it may be possible that users have run both soorgeon refactor nb.ipynb and soorgeon refactor nb.py. So I will clean both if there are both {task_name}.py/.ipynb.

To avoid overwriting .py files refactored from nb.py, I will use tempfile for intermediate .py files used for cleaning notebooks.

edublancas commented 2 years ago

sounds good