pytorch / ao

PyTorch native quantization and sparsity for training and inference
BSD 3-Clause "New" or "Revised" License

[wip] unbreak float8 + delayed + compile + AC #1331

Open vkuzo opened 3 days ago

vkuzo commented 3 days ago

Summary:

Broken in torchtitan; debugging the local repro first.

Test Plan:

pytest test/float8/test_compile.py -s -x
// error: https://gist.github.com/vkuzo/b6a5d0f828203203657d2e7188dc6393

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot[bot] commented 3 days ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1331

Note: Links to docs will display an error until the docs builds have been completed.

:x: 5 New Failures

As of commit 061770a40a2b85868ce014c725a3681e99cc009c with merge base 7c3c51fd0de33307e43a1769883a348861d6f7c9:

NEW FAILURES - The following jobs have failed:

* [Code Analysis with Ruff / build (3.9)](https://hud.pytorch.org/pr/pytorch/ao/1331#33403765976) ([gh](https://github.com/pytorch/ao/actions/runs/11980128683/job/33403765976)) `##[error]Process completed with exit code 1.`
* [PR Label Check / Check PR Labels](https://hud.pytorch.org/pr/pytorch/ao/1331#33403767500) ([gh](https://github.com/pytorch/ao/actions/runs/11980129121/job/33403767500)) `##[error]This PR requires at least one label starting with 'topic:'. Available topics can be found at: https://github.com/pytorch/ao/labels?q=topic`
* [Run Float8 Tests / test (SM-89, linux.g6.4xlarge.experimental.nvidia.gpu, --pre torch --index-url https://download.p... / linux-job](https://hud.pytorch.org/pr/pytorch/ao/1331#33403766397) ([gh](https://github.com/pytorch/ao/actions/runs/11980128698/job/33403766397)) `RuntimeError: Command docker exec -t 01a20db1a791793cf8806c56ef427ac8392edba9dad1ff286e82974aa8f8fdc7 /exec failed with exit code 1`
* [Run Regression Tests / test (CUDA 2.5.1, linux.g5.12xlarge.nvidia.gpu, torch==2.5.1 --index-url https://download.pytorch... / linux-job](https://hud.pytorch.org/pr/pytorch/ao/1331#33403768579) ([gh](https://github.com/pytorch/ao/actions/runs/11980128831/job/33403768579)) `RuntimeError: Command docker exec -t ebcb9f39f81bcd9d55f38cd8363a3d35495b1d3b46f5b949b1da366788ef3a35 /exec failed with exit code 1`
* [Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job](https://hud.pytorch.org/pr/pytorch/ao/1331#33403769091) ([gh](https://github.com/pytorch/ao/actions/runs/11980128831/job/33403769091)) `RuntimeError: Command docker exec -t cc9156e8b9a53b5b3f63ae13d8526755e8fee3a7a6c1f8a52510c361331e9db9 /exec failed with exit code 1`

This comment was automatically generated by Dr. CI and updates every 15 minutes.