huggingface / optimum-quanto

A PyTorch quantization backend for Optimum
Apache License 2.0

optimized kernel for quanto::dqmm not found #203

Open kechan opened 1 month ago

kechan commented 1 month ago

UserWarning: An exception was raised while calling the optimized kernel for quanto::dqmm: /home/jupyter/.cache/torch_extensions/py38_cu121/quanto_cpp/quanto_cpp.so: cannot open shared object file: No such file or directory Falling back to default implementation.

I got this warning on a Google Cloud (GCP) VM with:

torch 2.3, quanto 0.2.0, accelerate 0.30.1

I am not sure whether this is why my qbit8 quantized model runs slower than the original model.
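One way to put numbers on "runs slower" is to time both models the same way, with a few warm-up calls first. This is only a minimal sketch using the standard library: `median_latency`, `run_original` and `run_quantized` are hypothetical names, and the two stand-in functions would need to be replaced with real forward passes of the original and quantized model.

```python
import statistics
import time


def median_latency(fn, warmup=3, repeats=10):
    """Return the median wall-clock time (seconds) of calling fn()."""
    # Warm-up calls absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        # On a GPU, call torch.cuda.synchronize() here before reading the
        # timer, otherwise only the kernel launch is measured.
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical stand-ins for "one forward pass" of each model.
def run_original():
    return sum(i * i for i in range(10_000))


def run_quantized():
    return sum(i * i for i in range(20_000))


if __name__ == "__main__":
    t_orig = median_latency(run_original)
    t_quant = median_latency(run_quantized)
    print(f"original:  {t_orig * 1e3:.3f} ms")
    print(f"quantized: {t_quant * 1e3:.3f} ms")
    print(f"ratio:     {t_quant / t_orig:.2f}x")
```

Using the median rather than the mean keeps a single slow call from skewing the comparison.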

HackHerz commented 2 weeks ago

I have a similar warning: /opt/homebrew/lib/python3.12/site-packages/quanto/library/ops.py:66: UserWarning: An exception was raised while calling the optimized kernel for quanto::dqmm: Unsupported TypeMeta in ATen: Falling back to default implementation.

It started after I upgraded PyTorch this morning, and it increased my script's runtime from 8 s to 2 min.

kechan commented 1 week ago

@HackHerz I was told elsewhere that the results should still be correct because of the fallback, but that the fallback may cause an unacceptable latency hit.
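Since the fallback only shows up as a `UserWarning`, it is easy to miss in a long-running job. A minimal sketch for checking whether a given call hit the fallback path, using only the standard `warnings` module: the helper name `find_fallback_warnings` is hypothetical, the marker string is copied from the warnings quoted in this thread, and `fake_quantized_call` is just a stand-in that emits the same text.

```python
import warnings

# Text taken from the warnings quoted in this thread.
FALLBACK_MARKER = "Falling back to default implementation"


def find_fallback_warnings(fn):
    """Run fn() and return (result, list of fallback warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" makes sure repeated warnings are not suppressed.
        warnings.simplefilter("always")
        result = fn()
    messages = [
        str(w.message) for w in caught if FALLBACK_MARKER in str(w.message)
    ]
    return result, messages


# Hypothetical stand-in for a quantized forward pass that hits the fallback.
def fake_quantized_call():
    warnings.warn(
        "An exception was raised while calling the optimized kernel for "
        "quanto::dqmm: example error. Falling back to default implementation."
    )
    return 42


if __name__ == "__main__":
    result, messages = find_fallback_warnings(fake_quantized_call)
    if messages:
        print(f"{len(messages)} fallback warning(s); first: {messages[0]}")
    else:
        print("no fallback warnings recorded")
```

Wrapping a single real forward pass this way should show whether the slow path is being taken, without changing the result of the call.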

dacorvo commented 4 days ago

This should be fixed in 0.2.2.
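To confirm which release is actually installed in the environment that shows the warning, something along these lines may help. It is a rough sketch using only `importlib.metadata`: `parse_release` and `has_fix` are hypothetical helpers, the version parsing is deliberately simplistic (first three numeric components only), and the distribution name `quanto` is an assumption based on the versions reported above; the fixed release may be published under a different name.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Turn '0.2.2' or '0.2.2.dev0' into a tuple of ints such as (0, 2, 2)."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def has_fix(dist_name, minimum="0.2.2"):
    """Return True/False if dist_name is installed, None if it is not."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return None
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    # "quanto" is an assumed distribution name; adjust it to match pip list.
    status = has_fix("quanto")
    if status is None:
        print("distribution not found under this name")
    elif status:
        print("installed version is 0.2.2 or newer")
    else:
        print("installed version is older than 0.2.2")
```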