felipemello1 opened this issue 2 weeks ago
Hi @felipemello1, is this available for external contributors? If yes, kindly assign it to me and I will look into it. Thanks in advance!
@AnuravModak yes! Any issue labeled "community help wanted" is something we think would be a great fit for external contributors. Feel free to ask me questions and just submit a PR.
More info on contributing is in main/CONTRIBUTING.md, but the TLDR:
1) Fork torchtune: pytorch/torchtune
then clone your fork: git clone https://github.com/<your-username>/torchtune.git
2) Install dependencies

cd torchtune
conda create -n torchtune python=3.11
conda activate torchtune
pip install --pre --upgrade torch torchvision torchao --index-url https://download.pytorch.org/whl/nightly/cu124
pip install -e ".[dev]"
pre-commit install
3) Create the PR
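For step 3, the usual GitHub flow looks roughly like this (the branch name and commit message below are just placeholders):

```bash
# Work on a feature branch in your fork, then push and open the PR
# against pytorch/torchtune:main from the GitHub UI.
git checkout -b kd-act-offloading-opt-in-bwd
git add -A
git commit -m "Add activation offloading and opt_in_bwd to KD recipes"
git push origin kd-act-offloading-opt-in-bwd
```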
The current knowledge distillation recipes don't support activation offloading or optimizer-in-backward (opt_in_bwd).
The implementation should be similar to the one in other recipes, such as full_finetune_distributed; see the plain-PyTorch sketch after the links below for the underlying mechanisms.
After enabling these features in the recipes, they should also be exposed in the KD configs.
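As a rough sketch, the config side could look like the snippet below, assuming the flag names mirror the ones used by the full finetune configs (enable_activation_offloading and optimizer_in_bwd); the final names should follow whatever the recipe changes land on:

```yaml
# Hedged sketch of the new KD config knobs; flag names are assumptions
# borrowed from the full finetune configs, not a final API.
enable_activation_checkpointing: True  # already present in most configs
enable_activation_offloading: False    # new: offload saved activations to CPU
optimizer_in_bwd: False                # new: run the optimizer step inside backward
```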
PRs with reference implementations:
- activation offloading: https://github.com/pytorch/torchtune/pull/1847
- opt_in_bwd: https://github.com/pytorch/torchtune/pull/1833
KD recipes:
- https://github.com/pytorch/torchtune/blob/main/recipes/knowledge_distillation_single_device.py
- https://github.com/pytorch/torchtune/blob/main/recipes/knowledge_distillation_distributed.py
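If it helps to reason about the two features before diving into the reference PRs, here is a minimal plain-PyTorch sketch of the underlying mechanisms (the recipes should of course reuse torchtune's own utilities from the PRs above, not this code):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 1))

# (1) Activation offloading, in spirit: activations saved for backward are kept
# on CPU during the forward pass and copied back when backward needs them,
# trading some speed for GPU memory.
with torch.autograd.graph.save_on_cpu(pin_memory=True):
    loss = model(torch.randn(32, 512)).sum()
loss.backward()

# Clear the gradients from the example above before the next one.
for p in model.parameters():
    p.grad = None

# (2) Optimizer-in-backward (opt_in_bwd), in spirit: one optimizer per parameter,
# stepped from a post-accumulate-grad hook so each gradient is consumed and freed
# as soon as it is ready, instead of being held until a global optimizer.step().
optim_dict = {p: torch.optim.AdamW([p], lr=1e-4) for p in model.parameters()}

def _step_and_clear(param: torch.Tensor) -> None:
    optim_dict[param].step()
    optim_dict[param].zero_grad()

for p in model.parameters():
    p.register_post_accumulate_grad_hook(_step_and_clear)

loss = model(torch.randn(32, 512)).sum()
loss.backward()  # parameters are updated during backward; no explicit step() call
```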
After implementing it, run the recipe with each flag on and off and plot graphs of loss, memory, and words per second. The easiest way is to add the wandb logger to the config.
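For the wandb logger, the config change is roughly the following; the component path and project name here are assumptions, so double-check them against the metric_logging module in your torchtune version:

```yaml
# Hedged sketch: swap the metric logger in the KD config for W&B.
# Component path and project name are assumptions; verify against your
# installed torchtune version.
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  project: torchtune-kd-ablations
log_every_n_steps: 1
log_peak_memory_stats: True
```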
To update configs in bulk, you can use the script here: https://github.com/pytorch/torchtune/pull/1954