ploomber / soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
https://ploomber.io
Apache License 2.0
78 stars 20 forks source link

refactored code, down load from github instead of kaggle #64

Closed Wxl19980214 closed 2 years ago

Wxl19980214 commented 2 years ago

Describe your changes

  1. Instead of fetching from kaggle, we fetch from github by pygithub api now
  2. Add _pygithub.py for downloading a dir from a repo.
  3. Add pygithub package to setup.py (I don't know if this one is necessary)
  4. Removed stored notebooks under _kaggle since we can store them in the storage repo now
  5. Removed unused functions under src/_kaggle.py
  6. Removed kaggle username and key in CI, I don't think we need them. Maybe we can also remove them from repo secrets
  7. Storage repo here and README.md has instructions on how to upload
  8. Contribute.md remains unchanged, we can let user register new notebook in _kaggle/index.yaml. But the rest of downloading and uploading, updating CI and test cases will need to be done manually by one of the team members.

To do

1.We need to add a personal access token to the repo's secrets.

  1. Right now CI will fail because this one fails test_notebooks[look-at-this-note-feature-engineering-is-easy] for this reason: E soorgeon.exceptions.InputError: Only H1 headings are found. At this time, only H2 headings are supported. Check out our guide:

    Issue ticket number and link

    Closes #59 #39

Checklist before requesting a review

edublancas commented 2 years ago

the CI is broken, looks like some issue accessing the notebooks

Wxl19980214 commented 2 years ago

@edublancas Yeah it is. I put it in the To Do list. We need a PAT in repo's secrets. Also one of the test case is failing anyway

edublancas commented 2 years ago

ok so most of them pass now. but there's one that still fails look-at-this-note-feature-engineering-is-easy. did anything change from the file we had here to the one that's being downloaded? looks like it has H1 headings now

idomic commented 2 years ago

Yeah we're on it, we introduced a feature to reject H1s, we'll change it to H2s in the notebook and that should fix it.

Wxl19980214 commented 2 years ago

Just updated notebooks. The last check should pass. But still, I do think we need to put a token in to make sure it does't fail in the future. I think there's a limitation on api request for non-login request. I am asking shuyang to generate a pat to test if it works. I don't even know why it works without authentication, but it mostly because our repo is public.

idomic commented 2 years ago

lmk and I'll remove the PAT. Yeah I think it's because it's a public repo.

Wxl19980214 commented 2 years ago

@idomic I left them there on purpose just in case we encounter some errors later. But I can remove it for now. It's easy fix anyway