huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0

`qint4` failing with PixArt Transformer #228

Open sayakpaul opened 2 days ago

sayakpaul commented 2 days ago

Install diffusers first.

And then do:

from diffusers import DiffusionPipeline
from optimum.quanto import quantize, freeze, qint4
import torch 

ckpt_id = "ptx0/pixart-900m-1024-ft"
torch_dtype = torch.float16
pipe = DiffusionPipeline.from_pretrained(ckpt_id, torch_dtype=torch_dtype).to("cuda")
# quantize the transformer's weights to int4, then freeze them
quantize(pipe.transformer, weights=qint4)
freeze(pipe.transformer)
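
Presumably the error then surfaces when the pipeline is actually run; a minimal sketch of that step (the prompt and call arguments are placeholders, not taken from the original report):

# hypothetical inference call that exercises the quantized transformer
image = pipe("a photo of an astronaut riding a horse", num_inference_steps=20).images[0]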

I am on the HF DGX. My PyTorch version is 2.3.1. I installed quanto from main.

Getting:

__torch_function__
    return qfunc(*args, **kwargs)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/optimum/quanto/tensor/qtensor_func.py", line 142, in linear
    return QTensorLinear.apply(input, other, bias)
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/sayak/.pyenv/versions/diffusers/lib/python3.10/site-packages/optimum/quanto/tensor/qtensor_func.py", line 120, in forward
    output = output + bias
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
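
As the message suggests, the failing launch can be pinned down more precisely by setting CUDA_LAUNCH_BLOCKING before any CUDA work happens; a sketch (put this at the very top of the repro script):

import os

# force synchronous kernel launches so the error is reported at the real call site;
# this must run before the CUDA context is created
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"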

Cc: @dacorvo @SunMarc

dacorvo commented 2 days ago

I think it might be related to the same issue @SunMarc had: the int4 kernels have not been compiled because one of the devices on your host has a CUDA arch lower than sm80. Can you try with the fix I just pushed?
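
For reference, a quick way to check which arch each visible device reports (a sketch, not part of the original comment; the int4 kernels need sm80 or newer):

import torch

# print the compute capability (CUDA arch) of every visible GPU
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"cuda:{i} {torch.cuda.get_device_name(i)} -> sm{major}{minor}")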

dacorvo commented 2 days ago

Might be fixed by #227

sayakpaul commented 1 day ago

No, it doesn't fix it :(

sayakpaul commented 1 day ago

Here's my nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0

nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:01:00.0 Off |                    0 |
| N/A   49C    P0              92W / 275W |   1690MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-80GB          On  | 00000000:47:00.0 Off |                    0 |
| N/A   50C    P0              93W / 275W |      8MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-80GB          On  | 00000000:81:00.0 Off |                    0 |
| N/A   49C    P0              94W / 275W |      8MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA DGX Display             On  | 00000000:C1:00.0 Off |                  N/A |
| 34%   37C    P8              N/A /  50W |      3MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-80GB          On  | 00000000:C2:00.0 Off |                    0 |
| N/A   50C    P0              98W / 275W |      8MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    836338      C   ...iniconda3/envs/parlertts/bin/python     1676MiB |
+---------------------------------------------------------------------------------------+

dacorvo commented 1 day ago

@sayakpaul can you try uninstalling and then reinstalling optimum-quanto, just to make sure there is no obsolete cached extension?

sayakpaul commented 1 day ago

Yeah, did that too, but it's still failing @dacorvo