Open felipemello1 opened 3 days ago
As of commit b41114ad3631c8409c8c2409755e70acbf66e016 with merge base d5c54f376ad946060a84a84c40d9deba01d363f0, the following jobs failed:
* [GPU tests / gpu_test (3.10, stable)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358443) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976074/job/33230358443)) `tests/torchtune/modules/test_transformer_decoder.py::TestTransformerDecoder::test_kv_cache_batch_size_exceeded`
* [Recipe Tests / recipe_test (3.11)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358793) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976072/job/33230358793)) `tests/recipes/test_ppo_full_finetune_single_device.py::TestPPOFullFinetuneSingleDeviceRecipe::test_training_state_on_resume_with_optimizer_in_bwd`
* [Unit Test / unit_tests (3.9)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230357795) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976075/job/33230357795)) `tests/torchtune/modules/test_transformer_decoder.py::TestTransformerDecoder::test_kv_cache_batch_size_exceeded`
The following jobs were canceled:

* [GPU tests / gpu_test (3.11, stable)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358772) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976074/job/33230358772)) `##[error]The operation was canceled.`
* [GPU tests / gpu_test (3.9, stable)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230357820) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976074/job/33230357820)) `##[error]The operation was canceled.`
* [Recipe Tests / recipe_test (3.10)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358455) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976072/job/33230358455)) `##[error]The operation was canceled.`
* [Recipe Tests / recipe_test (3.9)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230357876) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976072/job/33230357876)) `##[error]The operation was canceled.`
* [Unit Test / unit_tests (3.10)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358459) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976075/job/33230358459)) `##[error]The operation was canceled.`
* [Unit Test / unit_tests (3.11)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358762) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976075/job/33230358762)) `##[error]The operation was canceled.`
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Context
What is the purpose of this PR?
Addressing compile graph breaks.
Test plan
Rough numbers for an 11B model, compiling only the decoder:
1) Without the changes: 28s on the first step, 21s on the second step, 0.9s on any other step.
2) With the dynamic shape changes: ~28s on the first step, ~0.9s on any other step.
3) Further setting `h.requires_grad = True` right after `h = self.tok_embeddings(tokens)` (I am not sure whether this affects the loss): ~18s on the first step, ~0.9s on any other step.
So we save roughly 21s to 31s of compilation time by avoiding the shape-related graph breaks, plus optionally setting `requires_grad`. This is without compiling the encoder, which produces NaNs if we do (Brian will look into that).
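For reference, a minimal sketch (not the PR's actual code) of the dynamic-shape idea: when the compiled function is traced with symbolic shapes, varying sequence lengths reuse one graph instead of triggering a recompile per length. The toy `step` function and the `backend="eager"` choice are assumptions made to keep the example self-contained; torchtune's decoder runs under the default inductor backend.

```python
import torch

# Toy stand-in for the decoder forward; the real model is torchtune's
# TransformerDecoder. backend="eager" keeps this sketch dependency-free.
lin = torch.nn.Linear(16, 16)

def step(h):
    # Reduce over the (dynamic) sequence and feature dims.
    return lin(h).sum(dim=(1, 2))

# dynamic=True asks Dynamo to trace with symbolic shapes, so different
# seq_len values hit the same compiled graph rather than recompiling.
compiled_step = torch.compile(step, backend="eager", dynamic=True)

for seq_len in (8, 12, 32):  # varying lengths, no shape-driven recompiles
    h = torch.randn(2, seq_len, 16)
    out = compiled_step(h)
    assert out.shape == (2,)
```

The same effect can be achieved more surgically with `torch._dynamo.mark_dynamic(tensor, dim)` on just the sequence dimension.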
TODO: Need to check whether there is a perf impact when `packed=True`, in which case there won't be graph breaks.