94rain closed 2 years ago
Do we still want to keep the following, or just move forward with only the filepath?
soorgeon clean {task-name}
just filepath
click has some utilities to validate them: https://click.palletsprojects.com/en/8.1.x/arguments/#file-path-arguments
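A minimal sketch of what that validation could look like, assuming the command takes a single file path. The `clean` function here is a hypothetical stand-in that only echoes the path, not soorgeon's actual implementation; `click.Path(exists=True, dir_okay=False)` is the click option that rejects missing paths and directories.

```python
import click
from click.testing import CliRunner


@click.command()
# exists=True makes click reject paths that are not on disk;
# dir_okay=False rejects directories, since a single file is expected.
@click.argument("filename", type=click.Path(exists=True, dir_okay=False))
def clean(filename):
    """Hypothetical stand-in for the clean command; it only echoes the path."""
    click.echo(f"cleaning {filename}")


# Exercise the command in a throwaway directory (file names are made up).
runner = CliRunner()
with runner.isolated_filesystem():
    with open("fit.py", "w") as f:
        f.write("x = 1\n")
    ok = runner.invoke(clean, ["fit.py"])          # file exists: accepted
    missing = runner.invoke(clean, ["missing.py"])  # no such file: rejected

print(ok.exit_code, missing.exit_code)
```

With this setup click reports the missing file itself, so the command body never has to check whether the path exists.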
I pushed the changes but the integration tests are failing (Kaggle Unauthorized error). I am looking into the issue.
that's ok, those are failing because you're missing an API key, no need to do anything. I'll review your code later
hi, I found an issue: when cleaning an ipynb file, the notebook's metadata gets lost.
to reproduce:
ploomber examples -n templates/ml-basic -o ml
cd ml
pip install -r requirements.txt
# reformat fit.py -> fit.ipynb
ploomber nb -f ipynb
# check that the pipeline runs
ploomber build -f
# clean
soorgeon clean fit.ipynb
# now it's broken
ploomber build -f
Error: Notebook does not contain kernelspec metadata and kernelspec_name was not specified, either add kernelspec info to your source file or specify a kernelspec by name. To see list of installed kernels run "jupyter kernelspec list" in the terminal (first column indicates the name). Python is usually named "python3", R usually "ir"
ipynb files store metadata that tells jupyter which kernel to use; it looks like that data is not preserved. I think jupytext has some settings for this, so we need to ensure that we keep the metadata when reading and writing the file
please also add a test case: compare the original file's metadata with the metadata in the clean version and ensure they match
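A rough sketch of such a test, using only the standard library (an .ipynb file is plain JSON with a top-level `metadata` key). `hypothetical_clean` is a placeholder for the real clean step and the notebook contents are made up for illustration; the actual test would call the project's own clean function instead.

```python
import json
import tempfile
from pathlib import Path


def read_metadata(path):
    """Return the top-level metadata dict of an .ipynb file."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    return nb.get("metadata", {})


def hypothetical_clean(path):
    """Placeholder for the real clean step: edits code cells, keeps metadata."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            cell["source"] = cell["source"].strip()
    Path(path).write_text(json.dumps(nb), encoding="utf-8")


def test_clean_preserves_metadata():
    # Made-up minimal notebook with kernelspec metadata.
    nb = {
        "cells": [
            {
                "cell_type": "code",
                "source": "x = 1  ",
                "metadata": {},
                "outputs": [],
                "execution_count": None,
            }
        ],
        "metadata": {
            "kernelspec": {
                "display_name": "Python 3",
                "language": "python",
                "name": "python3",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "fit.ipynb"
        path.write_text(json.dumps(nb), encoding="utf-8")
        before = read_metadata(path)
        hypothetical_clean(path)
        after = read_metadata(path)
    # The cleaned notebook must carry exactly the same metadata.
    assert before == after
    return before, after


before, after = test_clean_preserves_metadata()
```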
Based on the documentation for notebook_metadata_filter: "By default, Jupytext only exports the kernelspec and jupytext metadata to the text files."
Not sure why it is not preserved. Looking into it.
I was not able to run the given example. (my ploomber version is 0.19.5.dev0).
Instead, I ran the example in examples/machine-learning. The kernelspec info is indeed still there. For clean.ipynb, when I convert it to a py file, it looks like isort is sorting the file and changing the position of the tags:
We need to find a way to prevent isort from doing this, or just get rid of isort.
I was not able to run the given example. (my ploomber version is 0.19.5.dev0).
interesting, can you share the steps you followed? I'll try to reproduce it
We need to find a way to prevent isort from doing this or just get rid of isort.
good catch. moving that tag isn't really an issue, but other tags are important. can you run a quick test and tell me what happens with the parameters tag? in any example, you'll see something like this:
# %% tags=["parameters"]
upstream = None
product = None
add an import statement below the tag:
import math
# %% tags=["parameters"]
import pandas as pd
upstream = None
product = None
then apply the clean, does the import move with the tag? I'm guessing it'll yield something like:
import math
# %% tags=["parameters"]
import pandas as pd
# ...other imports?
upstream = None
product = None
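One way to check this automatically rather than by eye is sketched below. It assumes the percent format shown above, where each cell starts with a `# %%` marker line; `split_cells` and `imports_in_tagged_cell` are hypothetical helpers written for illustration, not part of soorgeon, jupytext, or isort.

```python
def split_cells(text):
    """Split percent-format source into (marker, body_lines) pairs.

    Lines before the first "# %%" marker are grouped under marker None.
    """
    cells = [(None, [])]
    for line in text.splitlines():
        if line.startswith("# %%"):
            cells.append((line, []))
        else:
            cells[-1][1].append(line)
    return cells


def imports_in_tagged_cell(text, tag="parameters"):
    """Return the import lines that sit inside the cell carrying `tag`."""
    for marker, body in split_cells(text):
        if marker is not None and f'"{tag}"' in marker:
            return [ln for ln in body if ln.startswith(("import ", "from "))]
    return []


# The input from the quick test above.
before = '''import math
# %% tags=["parameters"]
import pandas as pd
upstream = None
product = None
'''

print(imports_in_tagged_cell(before))  # ['import pandas as pd']
```

Running the same helper on the cleaned file and comparing the two results would show whether any import crossed the parameters tag.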
The steps I followed (on Windows):
conda activate soorgeon
ploomber examples -n templates/ml-basic -o ml
cd ml
pip install -r requirements.txt
ploomber nb -f ipynb
ploomber build -f
Then it gives me the error message:
For the quick test, yes your guess is correct.
great work!