BenjaminBossan opened this issue 2 months ago
I tried on an AWS A10 and could not reproduce the issue.
I was afraid that it could be something device- or driver-specific. Did you also try on CPU?
It is probably related to https://github.com/pytorch/pytorch/issues/112024. I am not sure why it does not trigger the error on my setup.
Could be. Well, feel free to close; I can work around it. I just wanted to bring it up in case there is something that can be done in quanto.
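(For context, the kind of workaround I mean is sketched below. It assumes the error is specific to `torch.inference_mode` and that running the forward pass under `torch.no_grad()` side-steps it, which may not hold on every setup.)

```python
import torch
from transformers import AutoModelForCausalLM, QuantoConfig

# Same model and quantization settings as the reproducer in this thread.
quant_config = QuantoConfig("int2")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=quant_config)
inputs = torch.arange(10).view(-1, 1)

# Assumed workaround: torch.no_grad() disables gradient tracking without
# creating inference tensors, so it avoids the inference_mode code path
# that appears to trigger the error.
with torch.no_grad():
    model(inputs)
```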
I don't know: maybe it can be worked around, perhaps with the help of @ezyang or @alband. Let's keep it open for now.
Try a PyTorch nightly if you can.
Thanks, I tried a torch 2.5.0 nightly and the error did indeed go away (both CUDA and CPU). Then I went back to the previous env I had used to check the error and could not reproduce it there anymore either. Switching to torch 2.5.0 triggered a recompilation, so I'm not sure if that's why or if there was another reason.
Anyway, I'll close for now and re-open if I run into the error again. Thanks everyone for the help.
I'm running into this issue again with the released torch 2.5.0. Reproducer:

```python
import torch
from transformers import QuantoConfig, AutoModelForCausalLM

quant_config = QuantoConfig("int2")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=quant_config)
inputs = torch.arange(10).view(-1, 1)
with torch.inference_mode():
    model(inputs)
```
Using a fresh environment with:

```
$ pip freeze | rg "torch|transformers|accelerate|optimum"
accelerate==1.0.1
optimum-quanto==0.2.5
torch==2.5.0
transformers==4.46.0
```
I'm getting an unexpected error when running inference with a quanto-quantized model. I've installed optimum-quanto from main (e7011ab94ea5a002019e6aa9a0b1e2a37e8eed35). Reproducer:

This results in:
Full error

```
/home/name/anaconda3/envs/peft/lib/python3.11/site-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
/home/name/work/forks/optimum-quanto/optimum/quanto/library/ops.py:66: UserWarning: An exception was raised while calling the optimized kernel for quanto::unpack: /home/name/anaconda3/envs/peft/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /home/name/work/forks/optimum-quanto/optimum/quanto/library/extensions/cuda/build/quanto_cuda.so) Falling back to default implementation.
  warnings.warn(message + " Falling back to default implementation.")
Traceback (most recent call last):
  File "/home/name/work/forks/peft/foo.py", line 17, in <module>
```

This is on an NVIDIA RTX 4090, with:
but the same error occurs on CPU.