[Section] Efficient Training

training-transformers-together / training-transformers-together.github.io

Contents of the main NeurIPS 2021 demo page

MIT License

[Section] Efficient Training #3

Open justheuristic opened 2 years ago

justheuristic commented 2 years ago

- [x] quantization explained (@TimDettmers)
- [x] 8-bit-train a large model (@TimDettmers)
- [x] training on large streaming dataset (@lhoestq)
- [x] compile into a single notebook
- [x] calculator
- [ ] integrate with the demo webpage

justheuristic commented 2 years ago

Talked with @TimDettmers

Planned text layout:

8-bit explained SIDE BY SIDE with the calculator
streaming SIDE BY SIDE with @lhoestq's animation
on a small screen, each side-by-side pair should collapse into two stacked rows
then a large badge that links to the notebook

If we have time, add plots based on the calculator

TimDettmers commented 2 years ago

@mryab could you please review the notebook and check whether more explanations are needed and whether the story flows well? Here is the current notebook: https://colab.research.google.com/drive/1jeX4Qcq4O_kWxfta9fkXDeZ6NFYoqoxJ?usp=sharing

TimDettmers commented 2 years ago

@lhoestq Here is a draft of the efficient training tab. I already added a paragraph on dataset streaming. Please feel free to edit and expand the doc directly: https://docs.google.com/document/d/1RGWYcXM3F4rdwkJZNmPjJjWKi9aOYru1FuKtet1xhdw/edit?usp=sharing

justheuristic commented 2 years ago

@TimDettmers I've modified it to be slightly more memory-efficient, take a look (same as in Slack): https://colab.research.google.com/drive/1WhcadcfMPzbiLUljlfIKzUhMUrbxIlyX?usp=sharing

justheuristic commented 2 years ago

Quick review:

It appears we are creating the model twice.
Are we fine-tuning on codeparrot or C4? I'm okay with either as long as we choose deliberately.
There's one problem with GPT-2: if you start the notebook from the main training, it runs fine, but if you run the notebook from scratch, it sometimes runs out of CPU memory due to pre-existing variables. Do you know any magic that will make the notebook runnable from scratch?
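One common way to handle leftover variables is to drop the references explicitly and then force a garbage-collection pass. The sketch below is illustrative only and is not taken from the notebook: `BigObject` and `free_notebook_variables` are hypothetical names, and in a real notebook the namespace would be `globals()`. IPython also keeps references in its output history, so the `%reset -f` magic may be needed as well.

```python
import gc
import weakref


class BigObject:
    """Hypothetical stand-in for a model or dataset held in a notebook variable."""

    def __init__(self, n_bytes):
        self.data = bytearray(n_bytes)


def free_notebook_variables(namespace, names):
    """Remove the named variables from a namespace, then run the garbage collector."""
    for name in names:
        namespace.pop(name, None)  # missing names are ignored
    gc.collect()  # also reclaims objects that are only kept alive by reference cycles


# A plain dict plays the role of the notebook's globals() in this sketch.
ns = {"model": BigObject(10_000_000)}
ref = weakref.ref(ns["model"])  # lets us observe whether the object was freed

free_notebook_variables(ns, ["model"])
freed = ref() is None
print("model freed:", freed)
```

This only releases memory if no other variable still points at the object, which is the usual reason a rerun from scratch behaves differently from a partial rerun.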

Optional

Do you think it would make sense to showcase how it's used in our demo's first experiment with DALLE?

If so, here's DALLE with 1B parameters that fits on a K80 with Adam8Bit, but takes 19 GB+ with regular Adam: https://colab.research.google.com/drive/1b_0KLGOY9Dbbgup-Ln0fGX2TiDc8Y_Ih?usp=sharing
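A back-of-the-envelope estimate shows where a gap of this size can come from. The sketch below assumes fp32 weights and gradients (4 bytes each per parameter) and two Adam state tensors per parameter, stored either in 32 bits or in 8 bits. These byte counts are assumptions, not measurements from the notebook, and the estimate ignores activations and the small per-block quantization constants that 8-bit optimizers keep.

```python
BYTES_FP32 = 4  # assumed storage for weights, gradients, and regular Adam states
BYTES_INT8 = 1  # assumed storage for 8-bit optimizer states


def training_memory_gb(n_params, optimizer_state_bytes):
    """Estimate memory for weights, gradients, and Adam states in GB (1e9 bytes)."""
    weights = n_params * BYTES_FP32
    grads = n_params * BYTES_FP32
    # Adam keeps two state tensors per parameter: first and second moment.
    adam_states = n_params * 2 * optimizer_state_bytes
    return (weights + grads + adam_states) / 1e9


n_params = 1_000_000_000  # the 1B-parameter model mentioned above
regular_adam_gb = training_memory_gb(n_params, BYTES_FP32)
eight_bit_adam_gb = training_memory_gb(n_params, BYTES_INT8)

print(f"regular Adam: {regular_adam_gb:.1f} GB")
print(f"8-bit Adam:   {eight_bit_adam_gb:.1f} GB")
```

Under these assumptions regular Adam needs about 16 GB before any activations are counted, which is consistent with the 19 GB+ figure, while the 8-bit variant needs about 10 GB.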

TimDettmers commented 2 years ago

Thanks for the memory fix and for catching that bug! Here is the most recent notebook: https://colab.research.google.com/drive/1Ii3JRnpI-15qoFhd8lgxXGwUQIiM7u0o?usp=sharing

lhoestq commented 2 years ago

From my message on Slack:

You can switch to using the code dataset with
args.dataset_name = "transformersbook/codeparrot-train"
args.dataset_config_name = None
args.text_column_name = "content"
It's up to you. Personally, I realized that there is already quite a lot of information in the notebook, so if switching to the code dataset could confuse users, I would just stick with C4.
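For illustration, the switch could be wrapped so that the same settings feed a streaming load. This is a minimal sketch, not the notebook's code: `DataArgs`, `use_code_dataset`, and `streaming_load_kwargs` are hypothetical names, and the C4 defaults (`"c4"`, `"en"`, `"text"`) are assumptions about what the notebook uses. The resulting dictionary matches the `path`, `name`, `split`, and `streaming` parameters of `datasets.load_dataset`, which is not called here so the sketch runs without network access.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataArgs:
    """Hypothetical container mirroring the `args` fields used in the thread."""

    dataset_name: str = "c4"                   # assumed default dataset
    dataset_config_name: Optional[str] = "en"  # assumed default config
    text_column_name: str = "text"             # C4 stores documents under "text"


def use_code_dataset(args):
    """Apply the three settings quoted above to switch to the code dataset."""
    args.dataset_name = "transformersbook/codeparrot-train"
    args.dataset_config_name = None
    args.text_column_name = "content"
    return args


def streaming_load_kwargs(args):
    """Build keyword arguments for datasets.load_dataset in streaming mode."""
    return {
        "path": args.dataset_name,
        "name": args.dataset_config_name,
        "split": "train",
        "streaming": True,  # iterate over the data without downloading it all first
    }


c4_kwargs = streaming_load_kwargs(DataArgs())
code_args = use_code_dataset(DataArgs())
code_kwargs = streaming_load_kwargs(code_args)
print(c4_kwargs)
print(code_kwargs)
# With the `datasets` library installed, one would then call:
#   dataset = datasets.load_dataset(**code_kwargs)
```

Keeping the text column name in the same object matters because the tokenization step has to read `"content"` for the code dataset rather than `"text"`.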