Closed charlesfrye closed 3 years ago
@lavanyashukla, would love your input on this (incl. just a 👍) and if you could rope in anyone besides Ayush who's working on the Colabs in this repo.
I just have one thing to say: this is a well-put guideline, @charlesfrye.
Will just add one pointer:
Code readability: The author of the colab must ensure that the code is readable. This includes:
One more pointer:
Should we add a markdown section mentioning the name of the author and the date of creation of the Colab notebook?
This will ensure ease of assigning bugs.
@ayulockin I appreciate the points about code readability! I agree with all of them. Combined with the points about VC, we're starting to get something of a Colab Style Guide going. Might write that in Notion once I've got a few more under my belt.
For authorship -- I agree with you that we need to track that better. But once we've got the Colabs in GH, the authorship will be in the history. I'd rather keep it there than expose it to readers as a section.
Notion doc on Colab Style guide would be awesome.
I will agree on your take for authorship. 💯
[x] Configs in W&B https://colab.research.google.com/drive/1eITPzCy3ExfXfjUHqGo8n2Pp1GHZcgnX?usp=sharing
[x] Simple Keras Integration https://colab.research.google.com/drive/1WZI9C9l8-mzTSNS3mXS2zKzdz7N38jYO?usp=sharing
[x] PyTorch Sweep (+replace w Simple) https://colab.research.google.com/drive/1QTIK23LBuAkdejbrvdP5hwBGyYlyEJpT?usp=sharing
[x] Simple TF Integration https://colab.research.google.com/drive/126c1k5IfbQpE7dVmhnoDTdmFfC7CgJqg?usp=sharing
[x] Weights & Biases with fastai https://colab.research.google.com/drive/1IWrhwcJoncCKHm6VXsNwOr9Yukhz3B49?usp=sharing
[x] Hugging Face + W&B https://colab.research.google.com/drive/1NEiqNPhiouu2pPwDAVeFoN4-vTYMz9F8?usp=sharing
[x] TensorFlow Sweep https://colab.research.google.com/drive/1tPBiBzl55FcePzPS26UNpxdmIdNt9C-5?usp=sharing
[x] XGBoost Sweep https://colab.research.google.com/drive/1aJf2DEobaXCcdv-Ys4sV53bEgkh6_auL?usp=sharing
[x] Simple LightGBM Integration https://colab.research.google.com/drive/1ybowtxi9LkApZEIXryhRrrhbvDrUsFy4?usp=sharing
[x] Simple Scikit Integration https://colab.research.google.com/drive/1dxWV5uulLOQvMoBBaJy2dZ3ZONr4Mqlo?usp=sharing
Great to see so much activity going on in this repo. Can't wait to see the outcome when these PRs are all merged.
What's this Notion doc you two have been talking about? I must be missing all the nice notebook/colab/kernel goodies.
Glad you're as excited as I am, Mani!
Notion is a collaborative document-editing platform. We use it internally for lots of things, including style guides for authors of W&B-related content.
If you need any help with Feature Importances created using W&B and a few of the model types/frameworks, please tag me. I have created a few PRs on the wandb/client repo for this and also linked them to the respective notebooks.
Let me know if you'd like me to share these and if they would add value to your current work.
The core set of Colabs has now been integrated, so I'm going to close this issue and open issues for specific Colabs that need to be added or edited.
Rationale
The Colabs are an important part of the purpose of this repo: allowing users to get a sense for how wandb works in their use case.
For that reason, we should incorporate them and control their versions. Not least so that I can stop bugging @ayulockin with issues and start bugging him with PRs.
I will open a PR shortly that demonstrates how this would work for a single Colab, but as this is a big change, I wanted to track it with an issue as well.
Process
From scratch, the process operates as follows:
`Copy_of_` prefix). The path should begin `colabs/{identifier/}?`, where the `{identifier}` directories help organize the Colabs, as in the examples directory. The `?` indicates that one or more identifiers can be used. For Colabs, it's probably only necessary to have one, since each example only has a single file (the notebook), whereas the `examples` tend to have entire folders.
For changes to existing notebooks, we enter at step 2, by clicking the badge.
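The path convention above can be sketched in Python. This is only an illustration, not part of the proposed process: the helper names `repo_path` and `colab_url`, the example notebook name, and the `keras` identifier are all hypothetical, and stripping the `Copy_of_` prefix is an assumption about what the truncated step intends. The URL pattern is the one Colab uses to open notebooks hosted on GitHub.

```python
from pathlib import PurePosixPath

COPY_PREFIX = "Copy_of_"  # assumption: this prefix should not reach the repo


def repo_path(filename, *identifiers):
    """Build colabs/{identifier}/.../{filename} for a notebook (hypothetical helper)."""
    if not identifiers:
        raise ValueError("expected at least one identifier directory")
    if not filename.endswith(".ipynb"):
        raise ValueError("expected a notebook file ending in .ipynb")
    if filename.startswith(COPY_PREFIX):
        filename = filename[len(COPY_PREFIX):]
    return str(PurePosixPath("colabs", *identifiers, filename))


def colab_url(org, repo, branch, path):
    """URL that opens a GitHub-hosted notebook in Colab."""
    return f"https://colab.research.google.com/github/{org}/{repo}/blob/{branch}/{path}"


if __name__ == "__main__":
    # Organization, repository, and branch names here are placeholders.
    path = repo_path("Copy_of_Simple_Keras_Integration.ipynb", "keras")
    url = colab_url("some-org", "some-repo", "master", path)
    badge = "https://colab.research.google.com/assets/colab-badge.svg"
    print(f"[![Open In Colab]({badge})]({url})")
```

The printed Markdown line is the kind of badge that could sit at the top of a notebook so that readers (and editors) always open the version-controlled copy.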
Best Practices for Version Control in Notebooks
I've VC'd two large repos of notebooks: one well and one poorly. Here's what I've learned in the process:
`%%capture` on installation cells. Present media as screenshots in Markdown, rather than using IPython to render them, as these are less likely to change. Use the "private outputs" setting if possible.
`True`.
`random.set_seed(42); tf.random.set_seed(117)`. GPUs are not always deterministic; see this StackOverflow post.
`try` and `except` rather than showing an uncaught `Error`.
`DeprecationWarning`s appear, behavior slightly changes, etc. If you include a `pip install` step for anything other than `wandb`, always pin the version. Unfortunately, the Colab environment is out of our control, so changes to the environment will happen anyway. Examples: `%%capture\n pip install -qq numpy==1.16`.
`x[::5], y[::5]`) and work with the smallest reasonable dataset.
Here's a snippet that can be used to get deterministic behavior in TF:
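A minimal sketch of such a TensorFlow cell, assuming TF 2.x. The helper name `seed_tf` is hypothetical and the seed values simply reuse the ones mentioned above. The imports are guarded so the cell still runs where NumPy or TensorFlow is missing, and the `TF_DETERMINISTIC_OPS` environment variable is an assumption that only some TF 2.x releases honor, so GPU runs may still differ slightly.

```python
import os
import random


def seed_tf(py_seed=42, tf_seed=117):
    """Seed the Python, NumPy, and TensorFlow RNGs (hypothetical helper)."""
    random.seed(py_seed)
    try:
        import numpy as np
        np.random.seed(py_seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    # Assumption: requests deterministic GPU kernels on TF versions that
    # support this variable; it is harmless where it is ignored.
    os.environ.setdefault("TF_DETERMINISTIC_OPS", "1")
    try:
        import tensorflow as tf
        tf.random.set_seed(tf_seed)
    except ImportError:
        pass  # TensorFlow not installed; only Python/NumPy are seeded


seed_tf()
```

Calling `seed_tf()` once in the first code cell, before any data shuffling or model construction, keeps later cells reproducible when the notebook is run top to bottom.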
and one in PyTorch:
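A comparable sketch for PyTorch, under the same caveats. The helper name `seed_torch` is hypothetical, the imports are guarded so the cell runs even where a library is missing, and the cuDNN flags reduce but do not necessarily eliminate GPU nondeterminism.

```python
import random


def seed_torch(seed=42):
    """Seed the Python, NumPy, and PyTorch RNGs (hypothetical helper).

    Returns True if PyTorch was found and seeded, False otherwise.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
    except ImportError:
        return False
    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    # Prefer deterministic cuDNN kernels over auto-tuned (faster) ones.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return True


seed_torch()
```

Disabling `cudnn.benchmark` can cost some speed, which is usually acceptable for the small datasets recommended above.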
Appendix: Dialog Box on Colab