When I use:

    import quanto

    # Quantize both weights and activations to int8
    quanto.quantize(model, weights=quanto.qint8, activations=quanto.qint8)

    # Calibrate the activation ranges on sample batches
    with quanto.Calibration(momentum=0.9):
        for inputs, labels in reg_dataloader:
            inputs = inputs.to('cuda')
            labels = labels.to('cuda')
            model(inputs)

it works, but when I replace qint8 with qint4 for the weights, like:

    import quanto

    # Same setup, but with int4 weights (activations stay int8)
    quanto.quantize(model, weights=quanto.qint4, activations=quanto.qint8)

    # Calibrate the activation ranges on sample batches
    with quanto.Calibration(momentum=0.9):
        for inputs, labels in reg_dataloader:
            inputs = inputs.to('cuda')
            labels = labels.to('cuda')
            model(inputs)

it outputs a long error message. Interestingly, the model still works well, but the time it takes has doubled.
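
To put a number on the slowdown, a small timing helper along these lines could be wrapped around the forward pass. This is only a minimal sketch and not part of the original script: `time_calls` and its `sync` parameter are hypothetical names introduced here for illustration. On a GPU, passing `torch.cuda.synchronize` as `sync` should make sure queued kernels are included in the measurement.

```python
import time
from typing import Callable, Optional


def time_calls(fn: Callable[[], object], n_iters: int = 10,
               sync: Optional[Callable[[], None]] = None) -> float:
    """Return the mean wall-clock seconds per call of `fn`.

    `sync` is an optional barrier (hypothetical parameter; for example
    torch.cuda.synchronize) that is called once before the clock starts
    and once before it stops, so asynchronous GPU work is counted.
    """
    if n_iters <= 0:
        raise ValueError("n_iters must be positive")
    if sync is not None:
        sync()  # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    if sync is not None:
        sync()  # wait for the timed work to finish
    return (time.perf_counter() - start) / n_iters
```

For example, calling `time_calls(lambda: model(inputs), sync=torch.cuda.synchronize)` once with the qint8 model and once with the qint4 model would give two directly comparable numbers.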
My environment:
CUDA Version: 12.2
Python 3.11.9
Package Version
contourpy                1.2.1
cycler                   0.12.1
filelock                 3.13.1
fonttools                4.53.0
fsspec                   2024.2.0
Jinja2                   3.1.3
kiwisolver               1.4.5
MarkupSafe               2.1.5
matplotlib               3.9.0
mpmath                   1.3.0
networkx                 3.2.1
ninja                    1.11.1.1
numpy                    1.26.3
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.1.105
nvidia-nvtx-cu12         12.1.105
optimum-quanto           0.2.1
packaging                24.0
pillow                   10.2.0
pip                      24.0
pyparsing                3.1.2
python-dateutil          2.9.0.post0
quanto                   0.2.0
safetensors              0.4.3
setuptools               69.5.1
six                      1.16.0
sympy                    1.12
torch                    2.3.0+cu121
torchaudio               2.3.0+cu121
torchvision              0.18.0+cu121
triton                   2.3.0
typing_extensions        4.9.0
wheel                    0.43.0
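
The package list above contains both `quanto` and `optimum-quanto`. A short script like the following could be used to report only the versions relevant to this issue. It is a sketch using the standard-library `importlib.metadata`; the function name `installed_versions` and the choice of packages are assumptions made here, not something from the original report.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # distribution is not installed
    return versions


if __name__ == "__main__":
    # Package names taken from the list above; adjust as needed.
    relevant = ["torch", "triton", "quanto", "optimum-quanto"]
    for name, version in installed_versions(relevant).items():
        print(f"{name}: {version or 'not installed'}")
```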
The whole code:
The whole error output: error_content.txt