Surprisingly enough, moving the pipeline to CUDA first and then performing fp8 quantization fixes the above issue. However, I now get the following error:
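Roughly, the reordering looks like this (just a sketch, reusing the names from the reproducer further down; the exact torchao call may differ by version):

```python
import torch
from diffusers import CogVideoXPipeline
from torchao.float8.inference import ActivationCasting, QuantConfig, quantize_to_float8

# Sketch only: move the pipeline to CUDA first, then quantize the transformer.
model_id = "THUDM/CogVideoX-2b"
pipe = CogVideoXPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")
pipe.transformer = quantize_to_float8(pipe.transformer, QuantConfig(ActivationCasting.DYNAMIC))
```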
This makes sense because the machines I'm using have CUDA compute capability 8 or lower. @sayakpaul Would it be possible to run these benchmarks on a different machine, or do we just skip the ones that fail?
Hmm, an easier solution would be to test it on an H100.
It was indeed the case that the A100 didn't support the fp8 quantization here (its CUDA compute capability is lower than required, as mentioned in the error). It works perfectly fine on an H100.
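For reference, a quick way to check whether a given GPU will hit this is to look at its compute capability (a minimal sketch; the assumption here, based on the error above, is that the fp8 path needs compute capability 8.9 or higher, so A100 at 8.0 fails while H100 at 9.0 works):

```python
import torch

# Assumption (from the error above): the fp8 inference path needs
# compute capability >= 8.9. A100 reports (8, 0), H100 reports (9, 0).
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")
print("fp8 supported:", (major, minor) >= (8, 9))
```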
It seems that the device cannot be passed to `pipe.to()`, whether you pass it as a positional argument or a keyword argument. Any workarounds, or suggestions if I'm doing something wrong?
Reproducer
```python
import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXPipeline, CogVideoXTransformer3DModel, CogVideoXDDIMScheduler
from diffusers.utils import export_to_video
from transformers import T5EncoderModel
from torchao.float8.inference import ActivationCasting, QuantConfig, quantize_to_float8

model_id = "THUDM/CogVideoX-2b"
device = "cuda"

# 1. Load models
text_encoder = T5EncoderModel.from_pretrained(model_id, subfolder="text_encoder", torch_dtype=torch.bfloat16)
transformer = CogVideoXTransformer3DModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16
)
vae = AutoencoderKLCogVideoX.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.bfloat16)

# 2. Quantize
transformer = quantize_to_float8(transformer, QuantConfig(ActivationCasting.DYNAMIC))

# 3. Load pipeline
pipe = CogVideoXPipeline.from_pretrained(
    model_id,
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.scheduler = CogVideoXDDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
pipe.set_progress_bar_config(disable=True)
pipe.to(device=device)  # <--- Fails here

prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)

video = pipe(
    prompt=prompt,
    guidance_scale=6,
    num_inference_steps=50,
    generator=torch.Generator().manual_seed(3047),  # https://arxiv.org/abs/2109.08203
)

export_to_video(video.frames[0], "output.mp4", fps=8)
```

Traceback
```python
Traceback (most recent call last):
  File "/home/aryan/work/diffusers/workflows/experiments/cogvideox-torchao/reproducer.py", line 35, in
```

cc @sayakpaul @jerryzh168