Open felipemello1 opened 3 days ago
As of commit b41114ad3631c8409c8c2409755e70acbf66e016 with merge base d5c54f376ad946060a84a84c40d9deba01d363f0, the following jobs failed:
* [GPU tests / gpu_test (3.10, stable)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358443) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976074/job/33230358443)) `tests/torchtune/modules/test_transformer_decoder.py::TestTransformerDecoder::test_kv_cache_batch_size_exceeded`
* [Recipe Tests / recipe_test (3.11)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358793) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976072/job/33230358793)) `tests/recipes/test_ppo_full_finetune_single_device.py::TestPPOFullFinetuneSingleDeviceRecipe::test_training_state_on_resume_with_optimizer_in_bwd`
* [Unit Test / unit_tests (3.9)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230357795) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976075/job/33230357795)) `tests/torchtune/modules/test_transformer_decoder.py::TestTransformerDecoder::test_kv_cache_batch_size_exceeded`
The following jobs were canceled:

* [GPU tests / gpu_test (3.11, stable)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358772) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976074/job/33230358772)) `##[error]The operation was canceled.`
* [GPU tests / gpu_test (3.9, stable)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230357820) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976074/job/33230357820)) `##[error]The operation was canceled.`
* [Recipe Tests / recipe_test (3.10)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358455) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976072/job/33230358455)) `##[error]The operation was canceled.`
* [Recipe Tests / recipe_test (3.9)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230357876) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976072/job/33230357876)) `##[error]The operation was canceled.`
* [Unit Test / unit_tests (3.10)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358459) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976075/job/33230358459)) `##[error]The operation was canceled.`
* [Unit Test / unit_tests (3.11)](https://hud.pytorch.org/pr/pytorch/torchtune/2027#33230358762) ([gh](https://github.com/pytorch/torchtune/actions/runs/11922976075/job/33230358762)) `##[error]The operation was canceled.`
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Context
What is the purpose of this PR?
Addressing compile graph breaks.
Test plan
Rough numbers for an 11B model, compiling only the decoder:
1) Without the changes: 28s on the first step, 21s on the second step, 0.9s on any other step.
2) With the dynamic shape changes: ~28s on the first step, ~0.9s on any other step.
3) Further setting `h.requires_grad = True` right after `h = self.tok_embeddings(tokens)` (I am not sure whether this affects the loss): ~18s on the first step, ~0.9s on any other step.
So we save roughly 21s to 31s of compilation time by avoiding the shape-related graph breaks, plus optionally setting `requires_grad`. This is without compiling the encoder, which produces NaNs if we do (Brian will look into that).
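For reference, a minimal sketch (not the PR's actual code) of the dynamic-shape idea: when the compiled function is traced with symbolic shapes, varying sequence lengths reuse one graph instead of triggering a recompile per length. The toy `step` function and the `backend="eager"` choice are assumptions made to keep the example self-contained; torchtune's decoder runs under the default inductor backend.

```python
import torch

# Toy stand-in for the decoder forward; the real model is torchtune's
# TransformerDecoder. backend="eager" keeps this sketch dependency-free.
lin = torch.nn.Linear(16, 16)

def step(h):
    # Reduce over the (dynamic) sequence and feature dims.
    return lin(h).sum(dim=(1, 2))

# dynamic=True asks Dynamo to trace with symbolic shapes, so different
# seq_len values hit the same compiled graph rather than recompiling.
compiled_step = torch.compile(step, backend="eager", dynamic=True)

for seq_len in (8, 12, 32):  # varying lengths, no shape-driven recompiles
    h = torch.randn(2, seq_len, 16)
    out = compiled_step(h)
    assert out.shape == (2,)
```

The same effect can be achieved more surgically with `torch._dynamo.mark_dynamic(tensor, dim)` on just the sequence dimension.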
TODO: Need to check whether there is a perf impact when `packed=True`, in which case there won't be graph breaks.