Open kamrul-NSL opened 2 weeks ago
```python
from diffusers import PixArtSigmaPipeline
from optimum.quanto import freeze, qfloat8, quantize
import torch

# Load the pipeline in fp16 and move it to the GPU
pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

# Quantize the transformer weights to float8, then freeze them
quantize(pipeline.transformer, weights=qfloat8)
freeze(pipeline.transformer)
```

Here I am getting this error:

**RuntimeError: Error building extension 'quanto_cuda': [1/7] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output unpack.cuda.o.d -DTORCH_EXTENSION_NAME=quanto_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/torch/include -isystem /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/torch/include/TH -isystem /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/torch/include/THC -isystem /home/user/anaconda3/envs/fp8/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' --expt-extended-lambda --use_fast_math -DQUANTO_CUDA_ARCH=860 -std=c++17 -c /home/user/anaconda3/envs/fp8/lib/python3.10/site-packages/optimum/quanto/library/extensions/cuda/unpack.cu -o unpack.cuda.o**

I am running this on an NVIDIA RTX 3090 GPU.
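The log above stops before the actual compiler message, so the root cause is not visible here. One possibility worth ruling out (an assumption, not something the log confirms) is that the system `nvcc` at `/usr/bin/nvcc` is missing or does not match the CUDA version PyTorch was built with. A minimal sketch for checking which `nvcc` is picked up; `nvcc_info` is a hypothetical helper name, not part of quanto or PyTorch:

```python
import re
import shutil
import subprocess


def nvcc_info(nvcc="nvcc"):
    """Return (path, release) for the nvcc found on PATH, or None if absent."""
    path = shutil.which(nvcc)
    if path is None:
        return None
    # `nvcc --version` prints a line such as "... release 11.8, V11.8.89"
    out = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
    match = re.search(r"release (\d+\.\d+)", out)
    return path, (match.group(1) if match else None)


print(nvcc_info())
```

The reported release can then be compared by hand against `torch.version.cuda` in the same environment.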
It works when the model is on the CPU. However, after moving the model to the GPU, CPU memory usage does not decrease.
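To confirm whether host memory is really being held, it may help to measure the process's resident memory before and after the move. The following is a rough sketch under stated assumptions: it reads `/proc/self/status`, so it only works on Linux, and `rss_mib` is a hypothetical helper name. The model move itself is left as a comment.

```python
import gc


def rss_mib():
    """Resident set size of this process in MiB (Linux only), else None."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024  # value is reported in kB
    except OSError:
        pass
    return None


before = rss_mib()
# ... move the model here, e.g. pipeline.to("cuda"), and drop any CPU references ...
gc.collect()  # ask Python to release unreachable objects
after = rss_mib()
print(before, after)
```

Even after `gc.collect()`, the allocator may keep freed pages cached, so a flat reading does not by itself prove that the weights are still referenced on the CPU.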