huggingface / optimum-quanto

A pytorch quantization backend for optimum
Apache License 2.0

Is it a GPU Compatibility Issue? #353

Open kamrul-NSL opened 2 weeks ago

kamrul-NSL commented 2 weeks ago

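```python
from diffusers import PixArtSigmaPipeline
import torch
from optimum.quanto import freeze, qfloat8, quantize

# Load the PixArt-Sigma pipeline in fp16 and move it to the GPU
pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

# Quantize the transformer weights to float8, then freeze to replace
# the original weights with their quantized versions
quantize(pipeline.transformer, weights=qfloat8)
freeze(pipeline.transformer)
```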

Running this, I get the following error:

```text
RuntimeError: Error building extension 'quanto_cuda': [1/7] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output unpack.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/torch/include -isystem /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/torch/include/THC -isystem /home/user/anaconda3/envs/fp8/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=860 -std=c++17 -c /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/unpack.cu -o unpack.cuda.o
```
I am trying to run this on an NVIDIA RTX 3090 GPU.

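
The nvcc command in the log records which GPU architecture the JIT build targets: `-gencode=arch=compute_86,code=sm_86` and `-DQUANTO_CUDA_ARCH=860`. The RTX 3090 has compute capability 8.6, so those values appear to match the card. Below is a minimal, stdlib-only sketch for checking this. `build_targets` and `matches_device` are hypothetical helpers (not part of optimum-quanto), and the `XY0` encoding of `QUANTO_CUDA_ARCH` is an assumption inferred from the `860` in this log.

```python
import re

# Patterns for the architecture flags seen in the JIT build command (assumed format)
GENCODE = re.compile(r"-gencode=arch=(compute_\d+),code=(sm_\d+)")
ARCH_DEFINE = re.compile(r"-DQUANTO_CUDA_ARCH=(\d+)")


def build_targets(nvcc_cmd: str):
    """Return the (virtual, real) -gencode pairs and the QUANTO_CUDA_ARCH value, if any."""
    gencodes = GENCODE.findall(nvcc_cmd)
    match = ARCH_DEFINE.search(nvcc_cmd)
    arch = int(match.group(1)) if match else None
    return gencodes, arch


def matches_device(nvcc_cmd: str, major: int, minor: int) -> bool:
    """Check that the command targets compute capability major.minor.

    Assumes QUANTO_CUDA_ARCH encodes X.Y as XY0 (e.g. 8.6 -> 860), as this log suggests.
    """
    sm = f"sm_{major}{minor}"
    gencodes, arch = build_targets(nvcc_cmd)
    return any(code == sm for _, code in gencodes) and arch == major * 100 + minor * 10


# A subset of the tokens from the nvcc command in the log
log_excerpt = (
    "/usr/bin/nvcc --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 "
    "--compiler-options '-fPIC' --use_fast_math -DQUANTO_CUDA_ARCH=860 -std=c++17"
)

print(build_targets(log_excerpt))  # ([('compute_86', 'sm_86')], 860)
print(matches_device(log_excerpt, 8, 6))  # True: the RTX 3090 is compute capability 8.6
```

On the affected machine, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair to pass in. If the targets match, as they appear to here, an architecture mismatch is a less likely cause, and the compiler's actual error message (not included in the log excerpt above) would be the next thing to look at.
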
CyberVy commented 5 hours ago

It works when the model is on the CPU. However, after moving the model to the GPU, CPU memory usage does not go down.
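
Whether CPU memory goes down depends partly on how it is measured. Peak counters such as `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` only ever increase, so they cannot show memory being released. Here is a minimal sketch, assuming Linux (the paths in the log above suggest it), that reads the current resident set size from `/proc/self/status` instead; `current_rss_mib` is a hypothetical helper name, not part of any library.

```python
import gc


def current_rss_mib(pid: str = "self") -> float:
    """Current resident set size in MiB, read from /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   123456 kB"
                return int(line.split()[1]) / 1024
    raise RuntimeError("VmRSS not found; is this a Linux /proc filesystem?")


if __name__ == "__main__":
    # Call this before and after moving the model to the GPU. Running gc.collect()
    # first frees objects that are kept alive only by reference cycles.
    gc.collect()
    print(f"Current RSS: {current_rss_mib():.1f} MiB")
```

Comparing the value before and after the `.to("cuda")` step gives the current, not peak, CPU footprint of the process.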