Interesting, I was able to run your benchmark script today. The benchmark worked with dynamic quant and showed a slight speedup; I still need to look at the traces.
I was able to reproduce your error with static quant.
Could you try with the latest benchmark code? I am still hitting this error with the latest nightly and the latest ao.
yeah will try
On PixArt, bsz 1, I am getting the following. Compile + static FP8 quant:
| ckpt_id | batch_size | fuse | compile | quantization | sparsify | memory | time |
|:--------------------------------------:|-------------:|:------:|:---------:|:--------------:|:----------:|---------:|-------:|
| PixArt-alpha/PixArt-Sigma-XL-2-1024-MS | 1 | False | True | fp8 | False | 9.672 | 1.242 |
Compile:
| ckpt_id | batch_size | fuse | compile | quantization | sparsify | memory | time |
|:--------------------------------------:|-------------:|:------:|:---------:|:--------------:|:----------:|---------:|-------:|
| PixArt-alpha/PixArt-Sigma-XL-2-1024-MS | 1 | False | True | None | False | 10.211 | 1.353 |
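For the two rows above, the static fp8 run comes out roughly 9% faster and about 0.5 GB lighter (assuming the memory column is in GB and the time column in seconds):

```python
# Numbers copied from the two tables above.
baseline_time, fp8_time = 1.353, 1.242   # seconds per run (assumed unit)
baseline_mem, fp8_mem = 10.211, 9.672    # GB (assumed unit)

print(f"speedup: {baseline_time / fp8_time:.3f}x")        # ~1.089x
print(f"memory saved: {baseline_mem - fp8_mem:.3f} GB")   # ~0.539 GB
```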
Diff:
diff --git a/inference/benchmark_pixart.py b/inference/benchmark_pixart.py
index 64e0ff9..353813e 100644
--- a/inference/benchmark_pixart.py
+++ b/inference/benchmark_pixart.py
@@ -82,8 +82,8 @@ def load_pipeline(
quantize_(pipeline.transformer, fp6_llm_weight_only())
quantize_(pipeline.vae, fp6_llm_weight_only())
elif quantization == "fp8":
- pipeline.transformer = quantize_to_float8(pipeline.transformer, QuantConfig(ActivationCasting.DYNAMIC))
- pipeline.vae = quantize_to_float8(pipeline.vae, QuantConfig(ActivationCasting.DYNAMIC))
+ pipeline.transformer = quantize_to_float8(pipeline.transformer, QuantConfig(ActivationCasting.STATIC, torch.tensor([1.0], dtype=torch.float32, device="cuda")))
+ # pipeline.vae = quantize_to_float8(pipeline.vae, QuantConfig(ActivationCasting.DYNAMIC), module_filter_fn=module_fn)
elif quantization == "autoquant":
pipeline.transformer = autoquant(pipeline.transformer)
pipeline.vae = autoquant(pipeline.vae)
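Note that the `+` line above hard-codes the static activation scale to `torch.tensor([1.0])`. In a real static-quant run that scale would normally come from a calibration pass; below is a minimal sketch of how such a scale could be derived. The `calibration_activations` tensor and the e4m3 format are assumptions for illustration, not part of the original script, and the amax-to-scale convention only roughly mirrors what torchao's float8 code does:

```python
import torch

# Hypothetical calibration batch of transformer input activations (bf16 on CUDA).
calibration_activations = torch.randn(64, 4096, dtype=torch.bfloat16, device="cuda")

# Map the observed abs-max onto the representable float8 e4m3 range.
fp8_max = torch.finfo(torch.float8_e4m3fn).max
amax = calibration_activations.abs().max().to(torch.float32)
static_scale = (fp8_max / torch.clamp(amax, min=1e-12)).reshape(1)

# This tensor is what would replace the torch.tensor([1.0]) placeholder above:
# QuantConfig(ActivationCasting.STATIC, static_scale)
```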
The original error is because we don't currently support bmm.
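Since bmm isn't supported, one way to keep the bmm-based attention path out of the float8 swap is the `module_filter_fn` hook that the commented-out VAE line in the diff alludes to. A minimal sketch, assuming the filter receives `(module, fqn)` and returns True only for layers that may be swapped; the `"attention"` name check is an assumption about diffusers' VAE module naming:

```python
import torch.nn as nn

def module_fn(module: nn.Module, fqn: str) -> bool:
    # Only allow plain nn.Linear layers outside the attention blocks to be
    # swapped, so the bmm-based attention path is left in high precision.
    return isinstance(module, nn.Linear) and "attention" not in fqn

# Would be passed as in the commented-out line of the diff:
# quantize_to_float8(pipeline.vae, QuantConfig(ActivationCasting.DYNAMIC), module_filter_fn=module_fn)
```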
Yeah, I was able to run with static FP8 quant before without quantizing the VAE, which is what you seem to be doing as well. That is known. Thanks for looking into it.
@drisspg getting an error when trying to run with dynamic fp8 quantization:
I am on torch latest nightly as well as latest torchao.
I am running this on H100.
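For reference, a minimal sketch of the dynamic-quant path being exercised here, assuming the same `ActivationCasting`/`QuantConfig`/`quantize_to_float8` API used in the diff above (the exact import path depends on the torchao version, and the pipeline loading and compile details are illustrative rather than copied from the benchmark script):

```python
import torch
from diffusers import PixArtSigmaPipeline
from torchao.float8.inference import ActivationCasting, QuantConfig, quantize_to_float8

# Illustrative setup: bf16 pipeline on an H100.
pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.bfloat16
).to("cuda")

# Dynamic activation casting: scales are computed on the fly, no calibration tensor needed.
pipeline.transformer = quantize_to_float8(
    pipeline.transformer, QuantConfig(ActivationCasting.DYNAMIC)
)

pipeline.transformer = torch.compile(pipeline.transformer, mode="max-autotune", fullgraph=True)
image = pipeline("a small cactus in a pot", num_inference_steps=30).images[0]
```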