ploomber / soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
https://ploomber.io
Apache License 2.0
78 stars 20 forks source link

apply black before refactoring #84

Closed edublancas closed 1 year ago

edublancas commented 2 years ago

I ran soorgeon refactor on this notebook: https://www.kaggle.com/code/cdeotte/xgboost-starter-0-793

after fixing the headings and the function that uses global variables. I got this error:

  File "/Users/Edu/dev/soorgeon/src/soorgeon/io.py", line 515, in find_inputs_and_outputs_from_leaf
    (_, candidates_in, candidates_out) = find_for_loop_def_and_io(
  File "/Users/Edu/dev/soorgeon/src/soorgeon/io.py", line 104, in find_for_loop_def_and_io
    raise ValueError(f'Expected a node with type "for_stmt", '
ValueError: Expected a node with type "for_stmt", got: <ExprStmt: df = df.merge(importances[k], on='feature', how='left')@4,25> with type expr_stmt
Error: An error occurred when refactoring notebook.

and it failed. I realized it's because of this for loop:

for k in range(1,FOLDS): df = df.merge(importances[k], on='feature', how='left')

I realized that applying black and rerunning it fixed the issue. that's an easy fix for many of this edge cases; so I think we should apply black on the notebook before running soorgeon refactor

here's the notebook. note that I changed the extension from ipynb to txt since github wont let me upload it with ipynb extension

xgboost-starter-0-793.txt

rrhg commented 2 years ago

Hi, is Rod. Made this change & apparently is working, but would like your input. Basically, clean_mudule.basic_clean(nb_path) already implements this logic of applying black, so now is also run before export.refactor(...) (in cli.py). But I could be missing something.

edublancas commented 2 years ago

please open a PR with your changes, that will help me evaluate the solution

rrhg commented 2 years ago

Fixed unit-tests by adding option to disable string normalizations (aka do not mess with single quotes) when applying black. The reason is that many tests were failing because they use regex to match single quotes.

rrhg commented 2 years ago

Submitted a PR to ploomber/ci-notebooks that should fix the integration-tests on Soorgeon. The reason is that those notebooks are downloaded by the Soorgeon integration-tests workflow & converted to dags. Those convertions are failing because of errors in code in those notebooks.

edublancas commented 2 years ago

thanks for the patience! can you push a new commit to re-trigger the CI for your PR? https://github.com/ploomber/soorgeon/pull/87

git commit --allow-empty -m "trigger ci"
rrhg commented 2 years ago

Done. Tests passed!

edublancas commented 2 years ago

apologies for the delay @rrhg; I'll review this in the next 1-2 days. thanks for your patience and for your contributions!

rrhg commented 2 years ago

Thanks. Take your time. Is it posible to assign issue 65 to me ?